Google’s Veo 3 escalates the AI video race with native audio generation

Google is also launching a new AI filmmaking tool called Flow, and integrating the Veo 3 and Imagen 4 AI models into its Gemini assistant app

May 20, 2025 / 23:18 IST

Google is adding audio generation to Veo, its text-to-video artificial intelligence (AI) model, which competes with OpenAI's Sora, Meta's Movie Gen, and startups like Runway and Stability AI.

On May 20, Google unveiled Veo 3, the latest version of its video generation model, at its annual developer conference, Google I/O 2025.

The new model, which succeeds Veo 2, can generate sound effects, ambient audio such as traffic on a city street or birds singing in a park, and dialogue between characters, all from a text prompt.

"We're emerging from the silent era of video generation... This opens up a whole new world of possibilities," said Google DeepMind CEO Demis Hassabis.

Veo 3 is available today for Google AI Ultra subscribers in the United States, in the Gemini app and in Flow. It will also be available for enterprise users on Vertex AI.

Veo 2 is also getting new capabilities, such as the ability to add or remove objects from videos, broaden the frame, turn a video from portrait to landscape, and define precise camera movements.

Alongside Veo 3, Google has introduced a new version of its image generation model Imagen, called Imagen 4, which can create images in a range of aspect ratios and at up to 2K resolution.

Imagen 4 will be available in the Gemini app, Whisk, Vertex AI and across its Workspace apps like Slides, Vids, and Docs. Google stated that it also plans to launch a fast variant of Imagen 4 that will be up to 10x faster than Imagen 3.

The company stated that it has worked closely with stakeholders across the creative industries, including filmmakers, musicians, artists, and YouTube creators, to help shape these models and products responsibly.

Google also announced a new AI filmmaking tool called Flow that helps storytellers create cinematic shots and stitch scenes together into full-length films and short stories.

Targeted at creative professionals, Flow brings together Google's AI models Veo, Imagen and Gemini to create cinematic clips, scenes and stories. Flow is built on the foundation of VideoFX, a Google Labs experiment that launched last year.

Flow is available from today for Google AI Pro and Ultra plan subscribers in the United States, with rollout in more countries expected soon.

Gemini Deep Think mode

Google is also bringing new capabilities to its flagship Gemini AI models. It is testing an enhanced reasoning mode called Deep Think in Gemini 2.5 Pro, which uses new research techniques enabling the model to evaluate multiple hypotheses before responding.

Google released an early experimental version of Gemini 2.5 Pro in March 2025, featuring enhanced reasoning and advanced code capabilities. The model was also made available to all users through the Gemini app.

Earlier this month, the company rolled out an updated version, dubbed Gemini 2.5 Pro Preview (I/O edition), with significantly improved coding capabilities to help developers build richer, interactive web apps.

This includes fundamental coding tasks such as transforming and editing code, meaningful improvements for front-end and UI development, and creating sophisticated agentic workflows.

"We're taking a bit of extra time to conduct more frontier safety evaluations and get further input from safety experts. As part of that, we're going to make it (Deep Think) available to trusted testers via the Gemini API to get their feedback before making it widely available," Hassabis said.

In addition, Google is making Gemini 2.5 Flash, an AI model designed for fast, low-latency applications, available to all developers in early June, with the Pro version expected to be available soon after.

Google rolled out an early version of Gemini 2.5 Flash in preview for developers in April, available via the Gemini API in Google AI Studio and Vertex AI.

Gemini 2.5 Flash is the company's first fully hybrid reasoning model that gives developers the ability to turn thinking or reasoning on or off. It also allows them to set “thinking budgets” to control how much the model reasons, helping them optimise across quality, cost, and latency. The model builds on Gemini 2.0 Flash, which was introduced in December 2024 and made available in February.
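
As a rough illustration of how a thinking budget might be set, the sketch below builds a request body in which the budget is clamped to an allowed range, with zero standing for reasoning turned off. The field names (generationConfig, thinkingConfig, thinkingBudget), the 0 to 24,576 token range, and the helper build_request are assumptions made for this example rather than details from Google's announcement, so developers should check the current Gemini API documentation before relying on them.

```python
import json

# Assumed limits, for illustration only; not taken from the announcement.
MIN_BUDGET = 0        # assumed: 0 switches thinking off
MAX_BUDGET = 24576    # assumed upper bound on thinking tokens


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Return a JSON-serialisable request body with a clamped thinking budget.

    This is a hypothetical helper; it only builds a dictionary and does not
    call any Google service.
    """
    budget = max(MIN_BUDGET, min(MAX_BUDGET, thinking_budget))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": budget},
        },
    }


# Thinking off: favours cost and latency for a simple task.
fast = build_request("Translate 'hello' to French.", 0)

# Larger budget: favours quality on a multi-step task.
careful = build_request("Plan a three-step database migration.", 4096)

print(json.dumps(careful["generationConfig"]))
```

The trade-off the article describes shows up in the single budget value: a small or zero budget keeps responses quick and cheap, while a larger one leaves the model more room to reason.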

"(Gemini 2.5) Flash has been incredibly popular with developers who love its speed and low cost. The new 2.5 Flash is better in nearly every dimension, improving across key benchmarks for reasoning, code, and long context," Hassabis said.

Google stated that it is also bringing computer use capabilities from Project Mariner, a research prototype exploring human-agent interactions, into the Gemini API and Vertex AI.

Companies like Automation Anywhere, UiPath, Browserbase, Autotab, The Interaction Company and Cartwheel are currently evaluating its capabilities, and Google hopes to roll it out more broadly for developers to experiment with later this year.

Vikas SN
Vikas SN covers Big Tech, streaming, social media and the gaming industry
first published: May 20, 2025 11:17 pm
