
Google’s Gemini Live now lets you have human-like voice conversations with the AI assistant

At Google I/O 2024, Gemini also rolled out an immersive trip planner, more extensions and the ability to create personalised AI agents.

May 14, 2024 / 23:35 IST
Gemini Live will allow people to have an in-depth conversation with Gemini using their voice on the mobile app.

Google wants Gemini to go beyond being just an artificial intelligence (AI) chatbot and become a truly personal digital assistant.

"We think a personal AI assistant should be able to solve complicated problems, take actions for you, and also feel natural when you converse with it. My Gemini should be really different from your Gemini," said Sissie Hsiao, Vice President and General Manager, Gemini Experiences and Google Assistant.

To achieve this, Google unveiled a new conversational experience on Gemini, along with tools that allow the AI assistant to take action on behalf of the consumer, at its annual developer conference, Google I/O 2024.

The conversational experience, called Gemini Live, will allow people to have an in-depth conversation with Gemini using their voice on the mobile app. Users can even interrupt Gemini while it is talking, and the assistant will adapt to their speech patterns.

People can also choose from 10 different natural-sounding voices along with the tone and style of their preference.

This announcement comes a day after OpenAI introduced new enhancements to ChatGPT's voice mode as part of the launch of its new flagship AI model, GPT-4o. The model can reason across audio, text, and vision in real time.

These enhancements enable consumers to have real-time human-like voice interactions with the digital assistant. OpenAI's demos indicated a significant improvement from the first wave of voice assistants like Amazon Alexa, Apple Siri and Google Assistant.

Gemini Live's demos, shown at the Google I/O conference on May 14, indicate that Gemini now offers similar capabilities to consumers.

For instance, if a person is getting ready for a job interview or rehearsing for an important speech, they can use Gemini Live to help them prepare. In the demo shown by the company, Gemini suggested skills that the person can highlight while talking to their potential employer, and provided public speaking tips to calm their nerves before the person steps up to the podium.

Hsiao said that Gemini is using Google's latest speech models to understand what the user is saying and answer naturally with voice. The assistant is using a custom AI model that is tuned for conversation so that there is a lively back-and-forth when the person speaks to Gemini with their voice, she said.

"This feature (Gemini Live) is built on long-standing work and investments we have in speech technology combined with the power of the Gemini AI model," Hsiao said.

To be sure, Gemini is an optional assistant for consumers at present. However, Google's positioning of Gemini as the main brand for all its existing and future AI efforts indicates that the app will eventually replace Google Assistant as the primary assistant on Android smartphones. The company already allows users to make Gemini the primary assistant on their smartphones.

Gemini Live will be available to subscribers of Gemini Advanced, the company's paid AI chatbot tier, in the coming months.

Later this year, people will also be able to use the camera when they use the live feature, enabling them to have conversations about what they see around them. Some of these capabilities are powered by Project Astra, the company's ambitious initiative to build universal AI agents that can help people in everyday life.


New extensions and solving complex tasks

Google will also soon add new extensions that help Gemini pull in information from many of the Google apps, starting with YouTube Music. This extension will enable users to search for their favourite music even if they don't know the song title, by mentioning a song verse or a featured artist.

In the coming months, a Google Calendar extension will enable users to take a picture of flyers or a syllabus from their kids' school and add the details to their digital calendar. Google will also introduce similar extensions for its task management service Tasks and note-taking service Keep in the future.

Paid Gemini subscribers will also get access to a new immersive trip planning experience to create custom dynamic itineraries in the coming months.

For instance, one can ask, "My family and I are going to Thailand for a vacation. My son loves amusement parks and my husband really wants fresh seafood. Can you pull my flight and hotel info from Gmail and help me plan the weekend?"

Following this, Gemini grabs the person's flight information using the opt-in Gmail extension, uses Google Maps for restaurant and museum recommendations near the person's hotel, and uses Search to recommend other activities.

The assistant takes into account the user's flight timing, meal preferences and information about local amusement parks, while also understanding where each stop is located and how long it will take to travel between each activity, Hsiao said.

Gemini then explores all the possibilities simultaneously and produces a personalised itinerary in seconds. If one makes changes or adds more details, the itinerary will update automatically.


Gemini Advanced subscribers and the company's business customers will also soon be able to create customised versions of Gemini, called Gems, for help with specific tasks. This could be a gym buddy, yoga coach, sous chef, coding partner or creative writing guide.

"Simply describe what you want your Gem to do and how you want it to respond, like 'you're my running coach, give me a daily running plan and be positive, upbeat and motivating.' Gemini will take those instructions and, with one click, enhance them to create a Gem that meets your specific needs," Hsiao said. Regular Gemini users will also have access to a number of pre-made Gems like Learning Coach.

Yesterday, rival OpenAI made its custom chatbot store, GPT Store, available to all users for free.

Gemini 1.5 Pro and ability to upload documents

Google said that subscribers of Gemini Advanced will also get access to Gemini 1.5 Pro with a context window of a million tokens.

This will be available in more than 150 countries and over 35 languages starting today. The company plans to expand the context window to two million tokens later this year.

To help users make use of this long context window, Google is adding the ability for people to upload multiple Google Docs, PDFs, and Word files from Google Drive or their personal devices to Gemini for summaries, feedback, and insights about their documents. People will also soon be able to upload Google Sheets, CSVs, and Excel files.


Vikas SN
Vikas SN covers Big Tech, streaming, social media and gaming industry
first published: May 14, 2024 11:35 pm
