This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
Google released Gemini 3.1 Pro today. I just saw the benchmark scores, and it looks like this one is aimed squarely at topping the leaderboards (the model arms race continues, which is good news for the semiconductor industry!)😂
The official positioning is very clear: it is designed for complex tasks such as in-depth research, hard engineering problems, long-chain reasoning, and agentic workflows.
Key highlights:
1M-token context window (unchanged)
Multimodal support (text + images + video + audio + code)
Up to 64k output tokens
Performance comparison with current mainstream models (Claude Opus 4.6, GPT-5.2/5.3, etc.):
ARC-AGI-2 (one of the hardest abstract-reasoning benchmarks): Gemini 3.1 Pro scores 77.1%, about 8.3 percentage points ahead of Claude Opus 4.6 (68.8%) and 20-30+ points ahead of the GPT-5 series. This is the biggest leap of the release and arguably a qualitative breakthrough in core reasoning.
GPQA Diamond (PhD-level scientific reasoning): 94.3%, slightly ahead of Claude Opus 4.6 (91.3%) and GPT-5.2 (92.4%), by 3.0 and 1.9 percentage points respectively; this benchmark is nearing saturation.
SWE-Bench Verified (real-world software engineering tasks): 80.6%, about 3.6-4.6 percentage points ahead of Claude Opus 4.6 (around 76-77%) and roughly 5-15 points ahead of the GPT series.
Others: Top positions on long-horizon agentic benchmarks such as Terminal-Bench and APEX-Agents; currently ranked first on LMArena and the Artificial Analysis index, with strong cost efficiency.
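The margins above are differences in percentage points, not relative percentages. A minimal Python sketch of that arithmetic, using only the scores quoted in this post; the `lead` helper and the dictionary layout are illustrative, not from any official source. A rival reported only as a range ("around 76-77%") yields a range of margins.

```python
from __future__ import annotations

# Benchmark scores as quoted in the post (percent). A rival reported only as a
# range is stored as a (low, high) tuple.
GEMINI = {"ARC-AGI-2": 77.1, "GPQA Diamond": 94.3, "SWE-Bench Verified": 80.6}
RIVALS = {
    "ARC-AGI-2": {"Claude Opus 4.6": 68.8},
    "GPQA Diamond": {"Claude Opus 4.6": 91.3, "GPT-5.2": 92.4},
    "SWE-Bench Verified": {"Claude Opus 4.6": (76.0, 77.0)},
}


def lead(leader: float, rival: float | tuple[float, float]) -> tuple[float, float]:
    """Leader's margin over a rival in percentage points, as (min, max)."""
    low, high = rival if isinstance(rival, tuple) else (rival, rival)
    # The rival's best case gives the smallest margin, and vice versa.
    return round(leader - high, 1), round(leader - low, 1)


for bench, rivals in RIVALS.items():
    for name, score in rivals.items():
        lo, hi = lead(GEMINI[bench], score)
        print(f"{bench}: Gemini 3.1 Pro leads {name} by {lo}-{hi} points")
```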
More importantly, the cost advantage is obvious:
API pricing (USD per 1M tokens, based on the latest Vertex AI / Gemini API pricing; standard rates for ≤200k context):
Gemini 3.1 Pro: input $2.00, output $12.00 (rising to $4.00 / $18.00 for >200k context, i.e. input doubles and output rises by 50%)
Claude Opus 4.6: input $5.00, output $25.00
GPT-5.2 / 5.x: typically $10–15+ for input, $30–75+ for output (higher tiers vary by version)
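The tiered Gemini pricing can be turned into a per-request cost estimate. This is a sketch under stated assumptions: the rates are the ones quoted above, the 200k threshold is assumed to apply to the prompt (input) token count, and the whole request is assumed to be billed at the selected tier. `Pricing`, `request_cost`, and `gemini_cost` are hypothetical helpers, not part of any Google or Anthropic SDK. GPT-5.x is left out because only price ranges are given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in USD per 1M tokens (figures as quoted in the post)."""
    input_usd: float
    output_usd: float


GEMINI_STANDARD = Pricing(2.00, 12.00)  # <=200k-token context
GEMINI_LONG = Pricing(4.00, 18.00)      # >200k-token context
CLAUDE_OPUS = Pricing(5.00, 25.00)
LONG_CONTEXT_THRESHOLD = 200_000        # tokens


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at a flat rate card."""
    return (input_tokens * p.input_usd + output_tokens * p.output_usd) / 1_000_000


def gemini_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one Gemini 3.1 Pro request.

    Assumption: prompt size alone picks the tier, and the whole request
    (input and output) is billed at that tier's rates.
    """
    tier = GEMINI_LONG if input_tokens > LONG_CONTEXT_THRESHOLD else GEMINI_STANDARD
    return request_cost(tier, input_tokens, output_tokens)


print(f"Gemini, 100k in / 10k out: ${gemini_cost(100_000, 10_000):.2f}")
print(f"Claude, 100k in / 10k out: ${request_cost(CLAUDE_OPUS, 100_000, 10_000):.2f}")
print(f"Gemini, 300k in / 10k out: ${gemini_cost(300_000, 10_000):.2f}")
```

For a request with 100k input and 10k output tokens this gives $0.32 for Gemini 3.1 Pro versus $0.75 for Claude Opus 4.6; pushing the prompt to 300k tokens moves Gemini into the long-context tier at $1.38.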
Advantage margin:
Input: Gemini is 60% cheaper than Claude ($2 vs $5) and roughly 80–87% cheaper than the GPT series.
Output: Gemini is 52% cheaper than Claude ($12 vs $25) and roughly 60–84% cheaper than GPT.
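The savings figures are relative differences, (rival price - Gemini price) / rival price. A short Python sketch reproducing them from the list prices quoted above; the helper name `percent_cheaper` is illustrative. Because the GPT-5.x prices are given only as ranges, each GPT saving comes out as a range too.

```python
# List prices quoted in the post (USD per 1M tokens, <=200k-context tier).
GEMINI = {"input": 2.00, "output": 12.00}
CLAUDE_OPUS = {"input": 5.00, "output": 25.00}
# The post gives GPT-5.x only as ranges: (low, high) ends of each range.
GPT_5X = {"input": (10.00, 15.00), "output": (30.00, 75.00)}


def percent_cheaper(ours: float, theirs: float) -> float:
    """Relative saving of `ours` versus `theirs`, as a percent of `theirs`."""
    return round((theirs - ours) / theirs * 100, 1)


for kind in ("input", "output"):
    vs_claude = percent_cheaper(GEMINI[kind], CLAUDE_OPUS[kind])
    low, high = GPT_5X[kind]
    # A cheaper rival price means a smaller saving, so the low end of the
    # GPT range gives the minimum saving.
    min_saving = percent_cheaper(GEMINI[kind], low)
    max_saving = percent_cheaper(GEMINI[kind], high)
    print(f"{kind}: {vs_claude}% cheaper than Claude Opus 4.6, "
          f"{min_saving}-{max_saving}% cheaper than GPT-5.x")
```

This prints 60.0% and 52.0% for Claude Opus 4.6, and 80.0-86.7% (input) and 60.0-84.0% (output) for GPT-5.x at the quoted price ranges.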