Inside the tech behind Modi-Lex podcast: ElevenLabs India head explains the breakneck translation speed

ElevenLabs has a global team of around 150 people, including 10 in India, and plans to expand its go-to-market team further.

March 18, 2025 / 13:23 IST
Prime Minister Narendra Modi on a podcast with Lex Fridman.

The Modi-Fridman podcast made waves globally—not just for the high-profile conversation between Prime Minister Narendra Modi and AI researcher and podcaster Lex Fridman, but for the way the interview was AI-dubbed seamlessly into multiple languages, making it accessible to a much wider audience.

What really stood out was the speed: work that usually takes weeks was done in just a few hours.

The technology that made this possible was built by London-headquartered unicorn ElevenLabs, an AI voice synthesis company.

"The beauty of our model is that it is natively multilingual and understands context. It has emotion built into it," explained Siddharth Srinivasan, go-to-market leader for India at ElevenLabs. This means the AI doesn’t just translate, it feels the conversation, adds pauses and even subtle changes in tone, Srinivasan added.

ElevenLabs is backed by Sequoia Capital and venture capital firm Andreessen Horowitz, along with entrepreneurs Nat Friedman and Daniel Gross, among others.

ElevenLabs’ customer base includes both individual content creators and large enterprises like The Washington Post. It has also partnered with Perplexity.AI for a short-form daily podcast on innovation and with TIME magazine to add automated voiceovers to TIME.com.

The Modi-Fridman episode is available in English, Hindi and Russian, with other languages to follow soon. In the meantime, Indian AI startup Sarvam AI’s co-founder Pratyush Kumar has released snippets of the podcast in nine Indian languages on X (formerly Twitter) and offered to share the full version with ElevenLabs.

How ElevenLabs pulled it off

Speed was a major factor in this project. Dubbing a conversation of this magnitude, with such high-profile figures, would typically take weeks. So how did ElevenLabs deliver?

The secret lies in the company’s proprietary audio models. These aren’t just any speech synthesis models; they’re highly trained multilingual AI models that understand different languages, accents and contexts.

“The way the models are built, either through the voices we provide or the voices that you put into the platform or even the voices you synthesise, you literally have infinite voice possibilities. Because of the core part of the technology, what you're able to do is have these experiences, so that it doesn't seem like it's a translator but it comes across authentically in that person's voice,” Srinivasan said.

Not everything is left to AI, however; human oversight is still needed.

“We run this with a human-in-the-loop process where technology enables things to happen... and then you have a really meticulous, strong editorial process,” he added. This combination of AI speed and human accuracy made sure that the dubbed version sounded authentic.

ElevenLabs employs around 150 people spread across multiple countries, Srinivasan said. In India, it has about 10 employees, and the company plans to expand its go-to-market team further. “We should be aggressively expanding this year to build out the India market… India is definitely a must-win market for us, we are investing locally. We're ensuring that India is part of a lot of the core product programs and research work,” Srinivasan said.

Behind the tech

When asked about the technical foundation behind ElevenLabs’ voice models, the India head clarified that their approach is quite different. “Our models are actually audio models that are our own. So that’s our IP,” he said.

Rather than relying only on text-based models such as large language models (LLMs), ElevenLabs has built specialised audio models designed specifically for speech synthesis and voice cloning.

Srinivasan also highlighted three core models: Multilingual V2, which handles media workflows with high accuracy; Flash 2.5 for conversational use cases with fast processing; and Scribe, their speech-to-text model that delivers extremely precise transcriptions.

“We're not the cheapest product, in fact typically we find ourselves at the higher end of the price spectrum, and that's because we do believe we're delivering a very high-quality product doing a lot of things that did not happen before,” Srinivasan said.

Reshab Shaw Covers IT and AI
first published: Mar 18, 2025 01:19 pm