Google on December 11 unveiled a new version of its flagship artificial intelligence (AI) model, Gemini, aimed at powering the next generation of virtual agents, as the frenzied race among tech giants to dominate the sector escalates.
Called Gemini 2.0, the AI model can natively generate images and audio along with text. It can also natively use tools including Google Search and Maps.
"If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful," Google CEO Sundar Pichai said in a blogpost.
The Google chief said the new advances in multimodality, such as native image and audio output, and in native tool use will enable the company to "build new AI agents that bring us closer to our vision of a universal assistant".
This launch comes a year after Google first unveiled its Gemini family of AI models, which was built from the ground up to be multimodal. It was the tech giant's first AI model after the April 2023 merger of its AI research units, DeepMind and Google Brain, into a single division called Google DeepMind, led by DeepMind CEO Demis Hassabis.
"Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision" Pichai said.
Developers can access the new model starting today through an experimental version of Gemini 2.0 Flash, available via the Gemini API in Google AI Studio and Vertex AI. This variant builds on Gemini 1.5 Flash, the AI model designed for fast, low-latency applications, which was first introduced in May.
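For developers, a text-generation call to the experimental model is a few lines of code. The sketch below assumes the google-genai Python SDK and an API key from Google AI Studio exported as GEMINI_API_KEY (the environment-variable name is illustrative); "gemini-2.0-flash-exp" is the identifier for the experimental release described here.

```python
# Minimal sketch: one text-generation call to the experimental
# Gemini 2.0 Flash model via the Gemini API.
# Assumes the google-genai SDK (pip install google-genai) and an
# API key exported as GEMINI_API_KEY (illustrative variable name).
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental 2.0 Flash identifier
    contents="Summarise Gemini 2.0's new capabilities in two sentences.",
)
print(response.text)
```

The same model is also reachable through Vertex AI, which uses project-based Google Cloud authentication instead of an API key.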
In addition to performance improvements, Gemini 2.0 Flash brings new capabilities, including support for multimodal output such as natively generated images combined with text, and customisable multilingual text-to-speech (TTS) audio. It can also natively call tools like Google Search, code execution, and third-party user-defined functions, the company said.
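Native tool calling is configured per request. The hedged sketch below shows grounding a response with Google Search; the type names (GenerateContentConfig, Tool, GoogleSearch) follow the google-genai Python SDK and may differ in other client libraries.

```python
# Hedged sketch: letting Gemini 2.0 Flash call Google Search natively
# to ground its answer. Type names follow the google-genai Python SDK.
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What were the top tech announcements this week?",
    config=types.GenerateContentConfig(
        # Declare Google Search as a tool; the model decides when to call it.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```

Code execution and user-defined function declarations are registered the same way, as additional entries in the tools list.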
Multimodal input and text output are currently available to all developers, while the text-to-speech and native image-generation capabilities are limited to early-access partners for now. General availability is expected in January, along with additional model sizes.
The firm will also release a new Multimodal Live API with real-time audio and video-streaming input, plus the ability to use multiple tools in combination, to help developers build dynamic and interactive applications, Hassabis and Google DeepMind CTO Koray Kavukcuoglu stated in a blogpost.
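Unlike the request-response calls above, the Live API is session-based: a client opens a bidirectional streaming connection, sends input as it arrives, and receives incremental responses. Below is a hedged text-in, text-out sketch assuming the asynchronous interface from the google-genai SDK's early documentation; method names such as aio.live.connect, send, and receive may have changed in later releases.

```python
# Hedged sketch: a text-in, text-out session over the Multimodal Live
# API. Method names follow early google-genai SDK docs and may differ
# in later releases; audio and video streaming use the same session.
import asyncio
import os

from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


async def main() -> None:
    # "AUDIO" is the other documented response modality.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Hello, Gemini!", end_of_turn=True)
        # Responses stream back incrementally until the turn completes.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")


asyncio.run(main())
```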
Expanding Gemini 2.0 to Google products
Consumers across the world will also be able to access a chat-optimized version of Gemini 2.0 through the Gemini AI chatbot by selecting it from the model dropdown on desktop and mobile web, with mobile app support coming soon.
Google also debuted a new feature called Deep Research, which uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on the user's behalf. The feature will be available to users of Gemini Advanced, the paid tier of the Gemini chatbot.
In addition, Pichai stated that the company is bringing the advanced reasoning capabilities of Gemini 2.0 to AI Overviews, its generative AI search experience, to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding.
"We started limited testing this week and will be rolling it out more broadly early next year. And we’ll continue to bring AI Overviews to more countries and languages over the next year" he said. The model will be expanded to other Google products early next year.
Google is also using Gemini 2.0 in new research prototypes such as its futuristic universal AI assistant Project Astra; Project Mariner, an early prototype capable of taking actions in Chrome as an experimental extension; and Jules, an experimental AI-powered code agent.
"We’re still in the early stages of development, but we’re excited to see how trusted testers use these new capabilities and what lessons we can learn, so we can make them more widely available in products in the future" Hassabis and Kavukcuoglu stated in the blogpost.
The executives also stated the company is expanding its trusted tester programme to more people, including a small group that will soon begin testing Project Astra on prototype glasses.
Pichai mentioned that the firm will continue to prioritize safety and responsibility with these projects. "This is why we’re taking an exploratory and gradual approach to development, including working with trusted testers," he said.