Google Gemini 3.0 announced: Here’s how it compares to Gemini 2.5 Pro and GPT-5.1

Google has introduced Gemini 3.0, its latest AI model designed to push scientific reasoning, math capability, and multimodal understanding forward. Here is how it compares against Gemini 2.5 Pro and GPT-5.1.

MC Tech Desk

November 18, 2025 / 22:10 IST

Gemini 3.0 vs Gemini 2.5 Pro vs GPT-5.1

Google has officially announced Gemini 3.0, the newest flagship model in its generative AI lineup, with a heavy focus on scientific reasoning, mathematics, multimodal understanding and long-horizon agentic tasks. Early benchmark data shows that Gemini 3.0 improves significantly on Gemini 2.5 Pro and outscores GPT-5.1 on every benchmark in the table below except SWE-Bench Verified, where GPT-5.1 leads by 0.1 percentage point (76.3% vs 76.2%).

Gemini 3.0’s strongest gains are visible in tasks involving scientific knowledge (GPQA Diamond), math competitions, agentic tool use, coding benchmarks, and long-horizon decision-making. The model also delivers major improvements in multimodal reasoning benchmarks such as MMMU-Pro and ScreenSpot-Pro, marking a step-change for interactive and real-time use cases.

Benchmark comparison table: Gemini 3.0 vs Gemini 2.5 Pro vs GPT-5.1

Benchmark	Description	Gemini 3.0	Gemini 2.5 Pro	GPT-5.1
Humanity’s Last Exam	Academic reasoning	37.5% (no tools) / 45.8% (search + execution)	21.6%	26.5%
ARC-AGI-2	Visual reasoning puzzles	31.1%	4.9%	17.6%
GPQA Diamond	Scientific knowledge	91.9%	86.4%	88.1%
AIME 2025	Mathematics	95.0%	88.0%	94.0%
MathArena Apex	Math contest problems	23.4%	0.5%	1.0%
MMMU-Pro	Multimodal reasoning	81.0%	68.0%	76.0%
ScreenSpot-Pro	Screen understanding	72.7%	11.4%	3.5%
CharXiv Reasoning	Chart/complex info synthesis	81.4%	69.6%	69.5%
OmniDocBench 1.5	OCR (lower is better)	0.115	0.145	0.147
Video-MMMU	Video knowledge	87.6%	83.6%	80.4%
LiveCodeBench Pro	Competitive coding	2,439	1,775	2,243
Terminal-Bench 2.0	Agentic terminal coding	54.2%	32.6%	47.6%
SWE-Bench Verified	Agentic coding	76.2%	59.6%	76.3%
τ2-bench	Agentic tool use	85.4%	54.9%	80.2%
Vending-Bench 2	Long-horizon decision tasks	$5,478.16	$573.64	$1,473.43
FACTS Benchmark Suite	Grounding + parametric reasoning	70.5%	63.4%	50.8%
SimpleQA Verified	Parametric knowledge	72.1%	54.5%	34.9%
MMMLU	Multilingual Q&A	91.8%	89.5%	91.0%
Global PIQA	Commonsense reasoning	93.4%	91.5%	90.9%
MRCR V2 (8-needle)	Long-context performance	77.0%	58.0%	61.6%
MRCR V2 (1M pointwise)	Long-context	26.3%	16.4%	not supported
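
For readers who want to check the margins themselves, the short Python sketch below recomputes Gemini 3.0's lead, in percentage points, over the other two models. It uses a handful of rows copied from the table above; the names `SCORES` and `margins` are illustrative only and are not part of any Google or OpenAI tooling.

```python
# Scores copied from the comparison table above (percent, higher is better).
# Order in each tuple: (Gemini 3.0, Gemini 2.5 Pro, GPT-5.1).
SCORES = {
    "ARC-AGI-2": (31.1, 4.9, 17.6),
    "GPQA Diamond": (91.9, 86.4, 88.1),
    "AIME 2025": (95.0, 88.0, 94.0),
    "ScreenSpot-Pro": (72.7, 11.4, 3.5),
    "SWE-Bench Verified": (76.2, 59.6, 76.3),
    "SimpleQA Verified": (72.1, 54.5, 34.9),
}


def margins(scores):
    """Map each benchmark to (lead over Gemini 2.5 Pro, lead over GPT-5.1).

    Values are percentage points, rounded to one decimal place.
    A negative value means Gemini 3.0 trails the other model.
    """
    return {
        name: (round(g3 - g25, 1), round(g3 - gpt, 1))
        for name, (g3, g25, gpt) in scores.items()
    }


if __name__ == "__main__":
    for name, (vs_25, vs_gpt) in margins(SCORES).items():
        print(f"{name:<20} {vs_25:+6.1f} pts vs 2.5 Pro   {vs_gpt:+6.1f} pts vs GPT-5.1")
```

In this subset, SWE-Bench Verified is the only row with a negative margin against GPT-5.1 (-0.1 points), while ScreenSpot-Pro shows the widest gap over both rivals.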
