Google expands Gemini 3 lineup with budget-focused 3.1 Flash-Lite model

Google has unveiled Gemini 3.1 Flash-Lite, its fastest and most cost-efficient Gemini 3 model yet, targeting developers who need high-volume AI workloads delivered quickly, reliably and at significantly lower cost.

Sarthak Singh

March 05, 2026 / 13:09 IST


Google unveils Gemini 3.1 Flash-Lite, its fastest, cheapest AI model
Flash-Lite costs $0.25 per million input tokens, $1.50 for output
Model offers 2.5x faster response and adjustable reasoning levels


Google has introduced Gemini 3.1 Flash-Lite, positioning it as the fastest and most affordable model in its Gemini 3 family. The new model is rolling out in preview to developers through the Gemini API in Google AI Studio and to enterprise customers via Vertex AI.

Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens, making it one of the most aggressively priced models in its class. The model is aimed squarely at high-frequency developer workloads such as translation, content moderation and real-time user interactions where latency and cost control are critical.
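
To put those prices in context, the short Python sketch below estimates a bill from token counts. Only the two per-million-token rates come from the pricing quoted above; the function name `estimate_cost` and the example workload are hypothetical, and real invoices may differ (for instance if preview pricing changes or other charges apply).

```python
from decimal import Decimal

# Preview prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.25")
OUTPUT_PRICE_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Assumed workload, for illustration only: one million moderation requests,
# each with roughly 400 input tokens and 50 output tokens.
requests = 1_000_000
total = estimate_cost(400 * requests, 50 * requests)
print(f"Estimated cost: ${total:.2f}")
```

Under those assumed numbers, input tokens account for $100 and output tokens for $75, which shows how the higher output rate can dominate once responses get longer.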

According to Artificial Analysis benchmarks, Flash-Lite delivers a 2.5 times faster Time to First Answer Token and a 45 percent increase in output speed compared to Gemini 2.5 Flash, while maintaining similar or better quality. That performance improvement is particularly important for live chat systems, interactive dashboards and other responsive applications where delays directly affect user experience.
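
A rough way to see what those two figures could mean for a single response is to model total response time as the time to the first token plus the time spent generating the rest. The sketch below does that in Python. It assumes "2.5 times faster" means the time to first token is divided by 2.5 and that the 45 percent gain multiplies tokens per second by 1.45; the baseline values are invented for illustration and are not measurements of any model.

```python
# Relative gains reported in the article (versus Gemini 2.5 Flash).
TTFT_SPEEDUP = 2.5        # assumption: time to first token is divided by this
OUTPUT_SPEED_GAIN = 0.45  # assumption: tokens per second rise by 45 percent


def response_time(ttft_seconds: float, tokens_per_second: float, output_tokens: int) -> float:
    """Simple model: wait for the first token, then stream the output."""
    return ttft_seconds + output_tokens / tokens_per_second


def projected_response_time(baseline_ttft: float, baseline_tps: float, output_tokens: int) -> float:
    """Apply the reported relative gains to a baseline (hypothetical helper)."""
    new_ttft = baseline_ttft / TTFT_SPEEDUP
    new_tps = baseline_tps * (1 + OUTPUT_SPEED_GAIN)
    return response_time(new_ttft, new_tps, output_tokens)


# Invented baseline: 1.0 s to first token, 200 tokens/s, 300-token reply.
before = response_time(1.0, 200.0, 300)
after = projected_response_time(1.0, 200.0, 300)
print(f"baseline: {before:.2f} s, projected: {after:.2f} s")
```

For short replies the faster first token accounts for most of the saving, which is why the metric matters for chat-style interfaces.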

Despite being positioned as a “Lite” tier model, Google says Flash-Lite achieves an Elo score of 1432 on the Arena.ai Leaderboard and posts strong results across reasoning and multimodal benchmarks, including 86.9 percent on GPQA Diamond and 76.8 percent on MMMU Pro. The company claims the model even surpasses some larger Gemini models from prior generations, including Gemini 2.5 Flash, in several benchmark categories.

Beyond raw performance, Gemini 3.1 Flash-Lite includes adjustable “thinking levels” within AI Studio and Vertex AI. Developers can control how much reasoning depth the model applies to a task, allowing them to manage cost and performance depending on workload requirements. For repetitive, high-volume tasks, teams can reduce reasoning intensity to optimise efficiency. For more complex workflows such as generating user interfaces, building dashboards, creating simulations or following detailed multi-step instructions, they can allocate more computational depth.
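
In practice, a team might route each request to a reasoning level based on the kind of task. The Python sketch below illustrates that idea only. The level names `low` and `high`, the task labels and the function `choose_thinking_level` are all hypothetical; the article does not specify the actual setting names, so the real values should be taken from Google's documentation.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # Hypothetical level names; the article does not list the real ones.
    LOW = "low"
    HIGH = "high"


# Task labels are invented for this sketch. They mirror the examples of
# more complex workflows mentioned in the article.
HIGH_DEPTH_TASKS = {
    "ui_generation",
    "dashboard_building",
    "simulation",
    "multi_step_instructions",
}


def choose_thinking_level(task_type: str) -> ThinkingLevel:
    """Pick more reasoning depth for complex tasks, less for everything else."""
    if task_type in HIGH_DEPTH_TASKS:
        return ThinkingLevel.HIGH
    # Default to the cheaper setting for repetitive, high-volume work
    # such as translation or content moderation.
    return ThinkingLevel.LOW


for task in ("translation", "content_moderation", "simulation"):
    print(task, "->", choose_thinking_level(task).value)
```

Defaulting unknown task types to the lower level keeps costs predictable; a team that values quality over cost could flip that default.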

Google says early-access developers using AI Studio and Vertex AI, including companies such as Latitude, Cartwheel and Whering, are already deploying Flash-Lite at scale. Early testers have reportedly highlighted its efficiency and reasoning capabilities, noting that it can handle complex inputs with the precision of a larger-tier model while maintaining strong instruction adherence.


Sarthak Singh is an experienced writer who has covered personal and consumer tech, gadget news, social media trends and more for several years.
