Google Gemini 3.0 announced: Here’s how it compares to Gemini 2.5 Pro and GPT-5.1

Google has officially announced Gemini 3.0, the newest flagship model in its generative AI lineup, with a focus on scientific reasoning, mathematics, multimodal understanding and long-horizon agentic tasks. Early benchmark data shows that Gemini 3.0 improves on Gemini 2.5 Pro in every benchmark in the comparison below and leads GPT-5.1 in all but one: SWE-Bench Verified, where it trails by 0.1 percentage points (76.2% vs 76.3%).

Gemini 3.0’s strongest gains are visible in tasks involving scientific knowledge (GPQA Diamond), math competitions, agentic tool use, coding benchmarks, and long-horizon decision-making. The model also delivers major improvements in multimodal reasoning and screen understanding, scoring 81.0% on MMMU-Pro (up from 68.0% for Gemini 2.5 Pro) and 72.7% on ScreenSpot-Pro (up from 11.4%), marking a step-change for interactive and real-time use cases.

Benchmark comparison table: Gemini 3.0 vs Gemini 2.5 Pro vs GPT-5.1

Benchmark	Description	Gemini 3.0	Gemini 2.5 Pro	GPT-5.1
Humanity’s Last Exam	Academic reasoning	37.5% (no tools) / 45.8% (search + execution)	21.6%	26.5%
ARC-AGI-2	Visual reasoning puzzles	31.1%	4.9%	17.6%
GPQA Diamond	Scientific knowledge	91.9%	86.4%	88.1%
AIME 2025	Mathematics	95.0%	88.0%	94.0%
MathArena Apex	Math contest problems	23.4%	0.5%	1.0%
MMMU-Pro	Multimodal reasoning	81.0%	68.0%	76.0%
ScreenSpot-Pro	Screen understanding	72.7%	11.4%	3.5%
CharXiv Reasoning	Chart/complex info synthesis	81.4%	69.6%	69.5%
OmniDocBench 1.5	OCR (lower is better)	0.115	0.145	0.147
Video-MMMU	Video knowledge	87.6%	83.6%	80.4%
LiveCodeBench Pro	Competitive coding (Elo rating)	2,439	1,775	2,243
Terminal-Bench 2.0	Agentic terminal coding	54.2%	32.6%	47.6%
SWE-Bench Verified	Agentic coding	76.2%	59.6%	76.3%
τ2-bench	Agentic tool use	85.4%	54.9%	80.2%
Vending-Bench 2	Long-horizon decision tasks (mean net worth)	$5,478.16	$573.64	$1,473.43
FACTS Benchmark Suite	Grounding + parametric reasoning	70.5%	63.4%	50.8%
SimpleQA Verified	Parametric knowledge	72.1%	54.5%	34.9%
MMMLU	Multilingual Q&A	91.8%	89.5%	91.0%
Global PIQA	Commonsense reasoning	93.4%	91.5%	90.9%
MRCR V2 (8-needle)	Long-context performance	77.0%	58.0%	61.6%
MRCR V2 (1M pointwise)	Long-context	26.3%	16.4%	not supported
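
For readers who want to check the margins themselves, the gaps can be computed directly from the table. The snippet below is a minimal illustrative sketch, not part of Google's release: the `SCORES` dictionary and the `point_gaps` helper are names made up for this example, and only a handful of percentage-based rows are copied in.

```python
# Scores copied from the comparison table above (percent accuracy).
# Tuple order: (Gemini 3.0, Gemini 2.5 Pro, GPT-5.1).
# SCORES and point_gaps are hypothetical names used only for this sketch.
SCORES = {
    "ARC-AGI-2": (31.1, 4.9, 17.6),
    "GPQA Diamond": (91.9, 86.4, 88.1),
    "AIME 2025": (95.0, 88.0, 94.0),
    "ScreenSpot-Pro": (72.7, 11.4, 3.5),
    "SWE-Bench Verified": (76.2, 59.6, 76.3),
    "SimpleQA Verified": (72.1, 54.5, 34.9),
}


def point_gaps(scores):
    """Return {benchmark: (gap vs Gemini 2.5 Pro, gap vs GPT-5.1)} in percentage points."""
    return {
        name: (round(g3 - g25, 1), round(g3 - gpt, 1))
        for name, (g3, g25, gpt) in scores.items()
    }


if __name__ == "__main__":
    for name, (vs_g25, vs_gpt) in point_gaps(SCORES).items():
        print(f"{name:<20} {vs_g25:+6.1f} pts vs Gemini 2.5 Pro  {vs_gpt:+6.1f} pts vs GPT-5.1")
```

A positive number means Gemini 3.0 scored higher. Rows reported in other units (Elo rating, dollars, edit distance) are left out because a percentage-point gap does not apply to them.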

MC Tech Desk

First published: Nov 18, 2025 10:00 pm
