Google DeepMind's India unit made significant contributions to Gemini 2.5 Flash, an artificial intelligence model designed for fast, low-latency applications that was recently launched in preview, Manish Gupta, senior director at Alphabet Inc's AI subsidiary, told Moneycontrol.
"The Flash model is the low latency, cost- efficient version, it is going to come out soon. And this one I feel very proud of, because my own team has been contributing a lot to this Flash model. A lot of the efficiency improvements have been coming from teams like ours,” the Google DeepMind senior director said in an interview.
Google rolled out an early version of Gemini 2.5 Flash in preview for developers in April, making it available via the Gemini API in Google AI Studio and Vertex AI.
Gemini 2.5 Flash is the company's first fully hybrid reasoning model that gives developers the ability to turn thinking on or off. It also allows them to set “thinking budgets” to control how much the model reasons, helping them optimise across quality, cost, and latency.
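For developers, that control is exposed as a request-level configuration field. The following is a minimal sketch using the google-genai Python SDK; the model identifier and field names reflect the preview release and are assumptions that may change between SDK versions.

```python
# Minimal sketch: capping Gemini 2.5 Flash's reasoning with a "thinking budget".
# Assumes the google-genai SDK (pip install google-genai); model name and
# config fields are as documented for the preview and may change.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # preview model id (assumption)
    contents="Plan a 3-step test strategy for a payments API.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024  # max tokens spent on reasoning; 0 turns thinking off
        )
    ),
)
print(response.text)
```

Setting the budget to zero disables thinking entirely, trading reasoning depth for lower latency and cost.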
The model builds on Gemini 2.0 Flash, which was introduced in December 2024 and made available in February.
Gupta said the Gemini 2.0 Flash model delivers 5X higher performance per dollar than DeepSeek's R1.
"DeepSeek is seen as a company that brought a paradigm shift in cost efficiency; that's not quite true. It was very impressive, no doubt... People think DeepSeek came up with some amazing breakthrough on efficiency. Flash is actually more efficient," Gupta said.
Gupta's comments come after Alphabet CEO Sundar Pichai said during the company's Q4 earnings call in February that the search giant's Gemini 2.0 Flash and 2.0 Flash Thinking models were "some of the most efficient models" available, outperforming even DeepSeek's R1 and V3.
Pichai's remarks came after the Chinese AI lab DeepSeek triggered a Wall Street meltdown in late January with its claims of developing a model rivalling top-tier models from American companies such as OpenAI, Meta and Google at a fraction of the cost.
The claims made investors question the billions of dollars being poured in by tech companies to develop their AI models and products.
Developing LLM benchmarks
Gupta's team in India has also been central to Google's work on language technologies, especially for Gemini 2.0, for which the team evaluated a set of 29 Indian languages through a benchmark.
IndicGenBench, released in 2024, is an open benchmark aimed at bridging the gap between the capabilities of generative AI models in English and in Indian languages.
Google is also developing benchmarks with its partners to assess the quality and performance capabilities of the large language models (LLMs) the company has been building, Gupta said.
Apart from IndicGenBench, the tech giant recently released the QuestBench benchmark, which tests whether LLMs can pinpoint the single crucial question needed to solve logic, planning, or math problems.
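As an illustration of the task (a constructed example, not an actual QuestBench item; the real dataset's format is not described here), an underspecified problem and its candidate clarifying questions might look like this:

```python
# Hypothetical QuestBench-style item, for illustration only.
item = {
    "problem": (
        "A train leaves the station and travels at a constant speed of "
        "80 km/h. How long does the trip take?"
    ),
    "candidate_questions": [
        "What colour is the train?",
        "How far away is the destination?",  # the one fact that unblocks the problem
        "How many passengers are on board?",
    ],
    "crucial_question_index": 1,  # the model must identify this question
}
```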
Google DeepMind is also in talks with many partners for creating similar LLM benchmarks for other use cases, Gupta said.
Google Cloud's AI enhancements
Gupta shared that Google Cloud has released more than 3,000 product advances in the past year and that its infrastructure has expanded to 42 regions, connected by two million miles of cables.
"We've had over 4 million developers now using Gemini. Last year, we've seen a 20X increase in Vertex AI usage driven by core models such as Gemini," he said.
Gupta also cited the example of companies like Kraft Heinz, which accelerated campaign creation from weeks to hours using Gemini.
In April, Google also launched 'Ironwood', its seventh-generation tensor processing unit (TPU), to speed up AI workloads. The newest TPU can deliver 3,600X better performance and is 15X more energy efficient than earlier TPUs.
It delivers 36.7 exaflops of compute per pod, compared with the 1.7 exaflops of performance offered by the world's fastest supercomputer.
An exaflop is a measure of computing performance equal to one quintillion (10^18) floating point operations per second.
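Taking the article's figures at face value, the gap works out to roughly 21.6X; a quick arithmetic sketch:

```python
# Quick check of the comparison above, using the figures as quoted.
EXAFLOPS = 10**18  # one quintillion floating point operations per second

ironwood_pod = 36.7 * EXAFLOPS          # compute per Ironwood TPU pod
fastest_supercomputer = 1.7 * EXAFLOPS  # world's fastest supercomputer, per the article

print(f"{ironwood_pod / fastest_supercomputer:.1f}x")  # -> 21.6x
```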
“The rest of the world talks about GPUs as the only way for speeding up AI workloads. At least in our team, we use GPUs when we can't get hold of TPUs because TPUs are extremely powerful,” Gupta said.
