Google’s Veo 3 escalates the AI video race with native audio generation

Google is also launching a new AI filmmaking tool called Flow, and integrating the Veo 3 and Imagen 4 AI models into its Gemini assistant app

Alongside Veo 3, Google has introduced Imagen 4, a new version of its image generation model, which can create images in a range of aspect ratios and at up to 2K resolution.

Google is adding audio generation to Veo, its text-to-video artificial intelligence (AI) model, which competes with OpenAI's Sora, Meta's Movie Gen, and startups such as Runway and Stability AI.

On May 20, Google unveiled Veo 3, the latest version of its video generation model, at its annual developer conference, Google I/O 2025.

The new model, which succeeds Veo 2, can generate audio from a text prompt, including sound effects, ambient noise such as traffic on a city street or birdsong in a park, and dialogue between characters.

"We're emerging from the silent era of video generation...This opens up a whole new world of possibilities," said Google DeepMind CEO Demis Hassabis.

Veo 3 is available today for Google AI Ultra subscribers in the United States in the Gemini app and in Flow. It will also be available for enterprise users on Vertex AI.

Veo 2 is also getting new capabilities, such as the ability to add or remove objects from videos, broaden the frame, convert a video from portrait to landscape, and define precise camera movements.

Gemini 2.5 Flash is the company's first fully hybrid reasoning model, giving developers the ability to turn thinking, or reasoning, on or off. It also allows them to set “thinking budgets” that control how much the model reasons, helping them optimise across quality, cost, and latency. The model builds on Gemini 2.0 Flash, which was introduced in December 2024 and made generally available in February 2025.
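As a rough illustration of what a thinking budget might look like to a developer, the sketch below builds a request body in which a budget of zero turns reasoning off and a larger value permits more of it. The helper `build_request` is hypothetical, and the field names `generationConfig`, `thinkingConfig` and `thinkingBudget` are assumptions based on Google's public Gemini API documentation rather than anything stated in this article, so they should be checked against the current API reference. The code only constructs the payload and does not contact any service.

```python
import json


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a JSON-serialisable request body carrying a thinking budget.

    Hypothetical helper for illustration only. The field names are
    assumed from the public Gemini API docs, not taken from the article.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed meaning: 0 switches reasoning off; a larger value
            # lets the model spend more tokens on reasoning.
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }


# Low latency and cost: reasoning turned off.
fast = build_request("Summarise this article.", thinking_budget=0)

# Higher quality on a harder task: allow some reasoning.
careful = build_request("Plan a database migration.", thinking_budget=1024)

print(json.dumps(fast, indent=2))
```

The trade-off described above shows up as a single number: the same prompt can be sent with a different budget depending on whether speed and cost or answer quality matters more for the task.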

"(Gemini 2.5) Flash has been incredibly popular with developers who love its speed and low cost. The new 2.5 Flash is better in nearly every dimension, improving across key benchmarks for reasoning, code, and long context," Hassabis said.

Meanwhile, Google released an early experimental version of Gemini 2.5 Pro in March 2025, featuring enhanced reasoning and advanced code capabilities. The model was also made available to all users through the Gemini app.

Earlier this month, the company rolled out an updated version, dubbed Gemini 2.5 Pro Preview (I/O edition), with significantly improved coding capabilities. These include better handling of fundamental coding tasks such as transforming and editing code, meaningful improvements for front-end and UI development, and the creation of sophisticated agentic workflows.

Google stated that it is also bringing computer use capabilities from Project Mariner, a research prototype exploring human-agent interactions, into the Gemini API and Vertex AI.

Companies such as Automation Anywhere, UiPath, Browserbase, Autotab, The Interaction Company and Cartwheel are currently evaluating its capabilities, and Google hopes to roll it out more broadly for developers to experiment with later this year.

Copyright © Network18 Media & Investments Limited. All rights reserved. Reproduction of news articles, photos, videos or any other content in whole or in part in any form or medium without express written permission of moneycontrol.com is prohibited.
