
Google has introduced Gemini 3.1 Flash-Lite, positioning it as the fastest and most affordable model in its Gemini 3 family. The new model is rolling out in preview to developers through the Gemini API in Google AI Studio and to enterprise customers via Vertex AI.
Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens and $1.50 per million output tokens, making it one of the most aggressively priced models in its class. The model is aimed squarely at high-frequency developer workloads such as translation, content moderation and real-time user interactions where latency and cost control are critical.
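To make the pricing concrete, here is a minimal sketch that turns the quoted per-million-token rates into a cost estimate. The function name `estimate_cost` and the workload figures (one million requests a day, 400 input tokens and 100 output tokens each) are illustrative assumptions, not numbers from Google.

```python
from decimal import Decimal

# Preview prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.25")
OUTPUT_PRICE_PER_MILLION = Decimal("1.50")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in US dollars for the given token counts."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload (an assumption for illustration only):
# 1,000,000 requests per day, each with 400 input and 100 output tokens.
REQUESTS_PER_DAY = 1_000_000
daily_cost = estimate_cost(REQUESTS_PER_DAY * 400, REQUESTS_PER_DAY * 100)
print(f"Estimated daily cost: ${daily_cost:.2f}")
```

Under these assumed volumes the output side accounts for more of the bill than the input side, even with a quarter as many tokens, because output tokens cost six times as much.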
According to Artificial Analysis benchmarks, Flash-Lite delivers a Time to First Answer Token that is 2.5 times faster and an output speed that is 45 percent higher than Gemini 2.5 Flash, while maintaining similar or better quality. That improvement matters most for live chat systems, interactive dashboards and other responsive applications where delays directly affect user experience.
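A rough sketch of how those two figures combine into end-to-end response time follows. The baseline latency and throughput values are hypothetical placeholders, not measured numbers for Gemini 2.5 Flash, and the sketch assumes "2.5 times faster" means the time to first token is divided by 2.5.

```python
def response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Time to first token plus the time needed to stream the remaining output."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical baseline figures, chosen only to illustrate the arithmetic.
BASELINE_TTFT_S = 1.0
BASELINE_TOKENS_PER_S = 200.0
OUTPUT_TOKENS = 300

# Ratios quoted in the article.
TTFT_SPEEDUP = 2.5        # assumed to mean: time to first token divided by 2.5
OUTPUT_SPEED_GAIN = 1.45  # 45 percent higher output speed

baseline = response_time(BASELINE_TTFT_S, BASELINE_TOKENS_PER_S, OUTPUT_TOKENS)
improved = response_time(
    BASELINE_TTFT_S / TTFT_SPEEDUP,
    BASELINE_TOKENS_PER_S * OUTPUT_SPEED_GAIN,
    OUTPUT_TOKENS,
)
print(f"baseline: {baseline:.2f}s, improved: {improved:.2f}s")
```

For short replies the time-to-first-token gain dominates, while for long replies the throughput gain contributes more of the saving.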
Despite being positioned as a “Lite” tier model, Google says Flash-Lite achieves an Elo score of 1432 on the Arena.ai Leaderboard and posts strong results across reasoning and multimodal benchmarks, including 86.9 percent on GPQA Diamond and 76.8 percent on MMMU Pro. The company claims the model even surpasses some larger Gemini models from prior generations, including Gemini 2.5 Flash, in several benchmark categories.
Beyond raw performance, Gemini 3.1 Flash-Lite includes adjustable “thinking levels” within AI Studio and Vertex AI. Developers can control how much reasoning depth the model applies to a task, allowing them to manage cost and performance depending on workload requirements. For repetitive, high-volume tasks, teams can reduce reasoning intensity to optimise efficiency. For more complex workflows such as generating user interfaces, building dashboards, creating simulations or following detailed multi-step instructions, they can allocate more computational depth.
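One way a team might apply this is to choose a level per task type before sending a request. The sketch below is purely illustrative: the level names `low` and `high`, the task labels and the `choose_thinking_level` helper are assumptions made for this example, and it does not call the Gemini API or reflect its actual parameters.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # Hypothetical level names; check the official documentation for real values.
    LOW = "low"
    HIGH = "high"


# Task labels are assumptions, loosely based on the workloads the article mentions.
COMPLEX_TASKS = {
    "ui_generation",
    "dashboard",
    "simulation",
    "multi_step_instructions",
}


def choose_thinking_level(task_type: str) -> ThinkingLevel:
    """Use more reasoning depth for complex workflows, less for everything else."""
    if task_type in COMPLEX_TASKS:
        return ThinkingLevel.HIGH
    # Repetitive, high-volume work (and any unknown task) defaults to the cheaper level.
    return ThinkingLevel.LOW


for task in ("translation", "content_moderation", "simulation"):
    print(task, "->", choose_thinking_level(task).value)
```

Defaulting unknown tasks to the lower level keeps costs predictable; a team that values answer quality over cost could reverse that default.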
Google says early-access developers using AI Studio and Vertex AI, including companies such as Latitude, Cartwheel and Whering, are already deploying Flash-Lite at scale. Early testers have reportedly highlighted its efficiency and reasoning capabilities, noting that it can handle complex inputs with the precision of a larger-tier model while maintaining strong instruction adherence.