Researchers from Microsoft have announced a new text-to-speech model that can simulate a person's voice by listening to a three-second audio sample.
VALL-E, as it's called, can even preserve the speaker's emotional tone. The researchers describe it as a "neural codec language model," and it is built on the foundation of Meta's EnCodec compression model, which can compress audio into file sizes ten times smaller than MP3 at 64 kbps with no apparent loss in quality.
VALL-E uses EnCodec to break the audio in a file into small, discrete chunks for analysis. Rather than synthesizing waveforms directly, VALL-E generates codec codes from text and acoustic prompts, then decodes those codes back into audio.
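To make "codec codes" concrete, the sketch below implements a toy residual vector quantization (RVQ), the general scheme EnCodec-style codecs use to turn continuous audio frames into a handful of small integers per frame. All sizes, codebooks, and data here are invented for illustration; this is not the actual EnCodec or VALL-E code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only -- real codecs use far larger values.
n_frames, dim = 50, 8         # 50 audio frames, 8-dim embedding per frame
n_codebooks, n_codes = 4, 16  # 4 quantizer stages, 16 entries per codebook

frames = rng.normal(size=(n_frames, dim))            # stand-in for encoder output
codebooks = rng.normal(size=(n_codebooks, n_codes, dim))

codes = np.zeros((n_frames, n_codebooks), dtype=int)
residual = frames.copy()
for q in range(n_codebooks):
    # Each stage quantizes the residual the previous stages failed to capture.
    dists = np.linalg.norm(residual[:, None, :] - codebooks[q][None, :, :], axis=-1)
    codes[:, q] = dists.argmin(axis=1)
    residual -= codebooks[q][codes[:, q]]

# Each frame is now just 4 small integers -- discrete tokens that a model
# like VALL-E can predict with a language-modeling objective.
print(codes.shape)  # (50, 4)
```

Because the audio is reduced to sequences of discrete tokens, generating speech becomes a next-token prediction problem, which is what lets VALL-E borrow techniques from text language models.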
It matches the three-second sample against different conditions and environments, simulating how the voice would sound in each. To do this, the researchers trained VALL-E on more than 60,000 hours of audio from more than 7,000 speakers in Meta's LibriLight audio library. As a result, VALL-E can also simulate what a voice would sound like in acoustic environments other than the one in the sample.
The researchers are also aware of VALL-E's potential for misuse. In the post announcing the project, they write, "Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker."
To mitigate this, the researchers recommend, "a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model."