Vikram Sampath on how to train AI to translate idioms like 'Bhains ke aage been bajana'

NAAV.AI cofounder and author of 10 books Vikram Sampath on training agentic AI for translating books into Indian regional languages, and what the speed and ease of translating with AI could mean for content across Indian languages.

Chanpreet Khurana

June 14, 2025 / 15:08 IST

Author Vikram Sampath (left) cofounded NAAV.AI with Sandeep Chauhan around January 2025, but they only announced its launch in June. (Images via Instagram/Vikram Sampath)

Author Vikram Sampath cofounded a pay-per-word AI translation start-up NAAV.AI with Sandeep Chauhan early this year. Part of what sparked the idea, Sampath says, was that it took him roughly 14 years to get his first book—'Splendours of Royal Mysore, The Untold Story of the Wodeyars' (2008)—translated from English into Kannada. This was surprising to him, given that the subject was Kannada history and he felt that the title would therefore be of interest to the Kannada-reading public. It was surprising also because he grew up in Bengaluru and had talked to Kannada writers and translators for the project who, he says, expressed an interest initially.

To be sure, the number of books that get translated into regional Indian languages is relatively small. Key deterrents include time, money, and the laboriousness of the process of translation. As with most things, there needs to be a justification—business or otherwise—for the expense, work and time it takes. In the case of Sampath's books—on Indian history—they've had to cross a few hurdles as well. In 2017, Sampath faced charges of plagiarism, which he challenged in court. And some reviewers felt that his two-volume biography of Veer Savarkar wasn't sufficiently critical of source materials.

Sampath, who has a PhD in musicology and history from the University of Queensland and is now a fellow of the Royal Historical Society, announced the launch of NAAV.AI in June 2025. (NAAV.AI has received funding from Ola's Bhavish Aggarwal, among others. Sampath's Foundation for Indian Historical and Cultural Research is also working with Krutrim, founded by Aggarwal, on this project and with Ola Foundation on some other projects.)

Over a Zoom call, Sampath explains how NAAV.AI uses a human-plus-AI approach to improve translations in nine languages, including six Indian languages: Marathi, Kannada, Hindi, Tamil, Telugu and Malayalam. He also takes questions on writing history—who can do it, how, and who watches out for accuracy and objectivity in any translated versions. Edited excerpts:

Tell us about the genesis of NAAV.AI; the when, why, how of it. Also, you've said previously that you had some trouble getting a Kannada translator for your first book. Why is that?

(laughs) I didn't mean it that way. My first book came out in 2008, it was called 'Splendours of Royal Mysore, The Untold Story of the Wodeyars'... but it (the translation) didn't happen till 2022 when I met Dr S.L. Bhyrappa. It's not that in the Indian language market there is no hunger or demand for good content. I think it's just the lack of manpower and the laboriousness of the process of translations (which are the reasons why) a lot of content remains in the English-language bubble.

I was talking to my publisher now, Penguin, and they bring out 300-350 books every year. Forget the back lists which are there. I was told that of this, something like 50-60 books is all that makes the transition across all Indian languages, which is so sad in a linguistic haven like ours with 22 official languages, so many dialects and so many stories from every part of India. English, of course, is one medium, but why can't a Kannada story reach someone in Bengal, and a Punjabi story someone in Gujarat, and so on? Even if we start talking among ourselves, I think that will be a huge advantage for Indian stories, for Indian literature and content to move intranationally.

With all of these thoughts - about how do we scale this (regional-language translation) content - I approached my friend Sandeep Singh (Chauhan), who is the cofounder of NAAV.AI. He comes from 20-plus years of a corporate life and technology. He said, 'Yeah, we could try something like this. AI is the thing that people are talking about, so let's see.'

Even the government is doing things like Bhashini and Anuvadini, and so on. But that is more for academic books. For the trade publishing market where we are talking of fiction, non-fiction, how do we do that? That was something that's consumed our attention for the last six to seven months - we've been working in stealth.

Sandeep and the team made that tool possible, where you upload a PDF and two windows come up. The tool translates the text (in the left window) paragraph by paragraph, and the translation in the target language appears on the right-hand side. I put in a chapter from one of my books and we sent the output of just that one chapter to about 10 languages, including German, French and Arabic and seven Indian languages.

The story (outcome) was different with different languages. Obviously the foreign languages did 85 percent accuracy. Hindi, among all the Indian languages, was better, at 70-75 percent. South Indian languages were OK, about 60-65 percent. Some, like Marathi, were very bad, almost 40-45 percent.

In terms of just the accuracy of the translation or also stylistically?

Everything... because the minute we say translation, there's Google Translate, and so many things are there, but it just doesn't work. There's a lot of customization that the technology team needs to do, to provide context to the tool. It should not hallucinate if you give large volumes of data, and we are talking here of 300-500 pages of content. It can just lose context, and out of that hallucination, it actually brings out garbage and garbled content also... A year ago, I would have balked at the idea that a machine will translate my book. There is that apprehension that all creative people rightly have. That's why I made it very clear, even in our communication everywhere and also in the actual process flow, that there has to be a human element in the entire process flow pipeline itself.

We are currently working with six (Indian) languages: Hindi, Marathi, Kannada, Tamil, Telugu and Malayalam. And for each, we have hired one professional translator who actually does book translations regularly as a career. And they are going through and refining this output paragraph by paragraph, line by line. And whatever changes they are making, they give it as feedback. We capture the feedback and that feedback then goes back into the model.

From what I understand, it (training AI) is like teaching a child how to speak. Things like idioms, say, 'Bhains ke aage been bajana', it will translate into English in a very literal manner. So how do you (get to the proper translation,) "falling on deaf ears"? (Feedback to the model says) the next time you get this phrase, do this. How this kind of work that these people are doing helps is that it largely reduces the turnaround time. What probably took them four months to do (earlier), you can probably do in 15 days.
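The "next time you get this phrase, do this" feedback described above can be pictured as a small lookup that reviewers keep adding to. What follows is only a minimal sketch under stated assumptions: the interview does not say how NAAV.AI stores or applies corrections, so the `IdiomGlossary` class and its methods are hypothetical names, and plain text substitution stands in for whatever the real pipeline feeds back to its models.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups are forgiving."""
    return " ".join(text.lower().split())


class IdiomGlossary:
    """Hypothetical store of reviewer corrections: idiom -> preferred rendering."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def add_feedback(self, idiom: str, preferred: str) -> None:
        """Record a translator's correction for an idiom."""
        self._entries[normalize(idiom)] = preferred

    def apply(self, text: str) -> tuple[str, list[str]]:
        """Replace known idioms in text; return the new text and the idioms matched."""
        matched: list[str] = []
        for idiom, preferred in self._entries.items():
            pattern = re.compile(re.escape(idiom), re.IGNORECASE)
            # A function replacement avoids backslash surprises in `preferred`.
            text, count = pattern.subn(lambda _m, p=preferred: p, text)
            if count:
                matched.append(idiom)
        return text, matched


glossary = IdiomGlossary()
glossary.add_feedback("Bhains ke aage been bajana", "falling on deaf ears")
print(glossary.apply("His advice was like bhains ke aage been bajana."))
```

A production system would more likely pass such entries to the model as context than substitute strings directly; the sketch only shows how a single correction can carry over to later occurrences of the same phrase.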

The human element is in it to enrich the process: you get a first draft to start with, and then embellishing it, correcting it, making those changes will be a lot easier. With the turnaround time coming down, maybe productivity is increased, more volumes are produced and the demand-supply mismatch that we have, between people who want to read and are not able to get (translations in their language), (is addressed). As I mentioned, it took me so long to even get my first book on the Mysore Wodeyars translated into the language of the place I live in. And ironically, at a time when we're having language wars... We thought technology should be the enabler which brings people together and not something that divides, which is what is unfortunately happening.

Do regional language translations face more - or different - challenges than, say, regional language-to-English translations; for example, understanding cost, relevance, potential resonance as well as marketability of a title in Bengali vs Marathi vs Hindi vs Malayalam, etc.? In the context of NAAV, do these concerns affect your business model - because the costs and time go up every time you add a language? For example, just the cost of training large language models (LLMs) is a very capital-intensive and time-intensive process, right? How are you approaching it?

We are not even building LLMs; foundational LLMs run into billions of dollars of investment, which I think the government is trying to do with the IndiaAI mission and companies like Sarvam AI who have got the grant to do that as of now.

There's reliance on things like Anthropic and Chat GPT4 and Krutrim; we're using a mixed model... a lot of agentic framework, the agents which are employed within that to maintain context, clarity of thought for the model. This is such an evolving space; as we speak, someone may be doing something, and in one month, the model, suddenly, their accuracies can go up. Considering everyone is putting their heads together—many of the IITs are doing this (work)—about how do you get higher fidelity of language translation, per se. It's a golden opportunity in India, where we have so many languages. Very few use it (AI translators) in the literature space; I understand the kind of inhibitions that creative people, publishers too, might have. But inevitably this is going to be the future and it's better to adapt, upskill now, rather than be left out. Because it is going to come and it is going to disrupt at a very foundational level, not only the publishing industry, but I'm sure several industries, including the media. So, it's better to be part of the game quite early in the time cycle.

The translation market, I fully agree. We (NAAV.AI) now have a pipeline of about 30 books which we are doing with BluOne Ink... there's a matrix of books, even the authors, the subjects, with the languages. So, as you rightly said, not everything will work (across languages) ... some things are very region-specific. For instance, my Mysore Wodeyars book, it may not make much commercial sense for the publisher to do it in Assamese or Gujarati because finally they want to sell it also. Those things do limit the number of translations that can happen, but I think it is a process of building that appetite. When we don't even give that product to an end customer, how do we know whether there is no appetite?

Along with translations, what we are also looking at is audio books. We've started work on it and we'll be getting there. Using AI, you can do text-to-speech, where you can have these books out with special effects added, and it can be a very immersive experience for the end user or listener. That is something we can then explore: does a Gujarati audiobook work for people in Bengal or wherever else? Maybe that will give the publisher some insight (into) how many books we can do.

An Ernst & Young report of 2021 says every year 5-10 lakh books are published in India. And there's a huge market also of self-published authors. There's so much desire to have your word out in print... And they are probably willing to take more risks to explore markets which, traditionally, the big guns may not.

Have you piloted any audio books yet?

We are in the process of doing that currently.

So how are you going about this - where do you get samples for reading voices?

We're trying to clone the voice of a human, maybe the author, if the author is very celebrated. If you get a good sample of that person's voice, you clone it. Then it'll seem like the author is narrating his or her own book in their voice. You could add background scores and it can become a voice synthesis. Then it's an audio engineering problem to solve, in a music studio kind of a thing. Even my own books which are on Audible, I find, honestly, very boring sometimes because it is a very dull, drab (reading)... though it's a human being who's narrating it. Sometimes they get the pronunciation wrong. I got a lot of feedback from people who heard the Savarkar audiobooks, saying the person was mispronouncing Marathi surnames, Marathi words and so on.

When it can happen with a human... here (at NAAV.AI) there is human intervention where someone has to listen to the whole audio, someone who knows that language, and identify which are the wrong areas and then correct those on a sound file. So I think it is a music, audio engineering and literature kind of a joint problem which this technology, we hope, will solve.

You mentioned that you have some 30 books in the pipeline. What is the split of fiction to non-fiction?

The majority are non-fiction. I think BluOne Ink doesn't have too many fiction (titles). I would say maybe 10 fiction and 20 non-fiction is what we have. We are more keen to see how fiction performs. I think nonfiction is still thought about as 'Ho Jayega' (it can be done), but I think fiction is the challenge: to bring that tone, to bring that emotion, in thrillers and all the other genres that are there. Nonfiction is perhaps among the things that are slightly easier.

You preempted my next question: Do you think nonfiction is easier to translate?

They could be. What is happening in the West with the AI models is that with the amount of data they're giving to the model, even style is something that AI can understand. I can give it a command saying, I give you this paragraph, rewrite it in the manner that Shakespeare or Yeats would have written.

I think that can be done here, too. We were actually (trying that) with a Kannada publisher. Some of the greats of Kannada (literature), Kuvempu and Da.Ra. Bendre, each of them has a very distinct style. It obviously cannot be completely extracted, but we are also toying with and researching that front to see whether you can capture a person's style so that, for maybe 70 percent of the output, it sounds like Kuvempu is writing this book. That would be quite fascinating if it happens and, at the same time, a little worrisome, because then the human being, the author, will probably go out of the loop. I may not even be necessary to write the original book, leave alone the translation.

(Our focus right now is on) fixing this LLM to do a good first cut of translation, to about 80 percent (accuracy and quality) across languages: Can we look at creating data sets for the LLM, the Bhasha Kosh, which is there for each language. If you can get some of those out to also train (the AI tool in) the context and sometimes even simple things like spellings. For example, for a word like shrinkhala in Hindi, how will you write the sh? Different things: Chandrabindu, is there an adha 'na'?

The fact that Marathi (NAAV.AI's attempts at Marathi translation) was so bad was because there was no corpus. And what the Marathi expert told us was that the sentence construction there is very different in comparison to Hindi. There's a lot of preponderance of this pu-ling, stri-ling, quite like Sanskrit... It is so tough for a human being to understand; imagine having to teach and give a rule for that (to an AI tool). The linguists are trying to help us now to make these rules and feed them in to the extent possible. It can't take care of all the possible cases and exceptions, but at least some of the major ones; that's what they're trying to do.

What is going to happen is that we are going to get better and faster outputs using both human enterprise as well as the power of AI to make things faster, better, more efficient, etcetera.

The human translators you've hired, are they going to read the original work as well as the AI work or only just the AI work? How does that work?

Both. We have a UI where the PDF of the book is uploaded and, depending on the size, it takes anything from 15 minutes to one hour to translate the whole thing. Then the person who is doing it (the checking and refining of the translated text) gets login access. And just like there are two screens on this Zoom window, on the left would be the original. You click on the original paragraph, the corresponding paragraph in the target language appears, and then you make changes like you do in a Word document with track changes. When you strike off something, there's a detail which asks why you are making this change. Have I got it wrong in terms of spelling... there is a drop-down of some 50-60 possible options, from which the translator will have to say, I'm doing it because of this. Even after all that, we may not have captured the entire universe (of that language).
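The review step described above, where every struck-off change must carry a reason picked from a drop-down, maps naturally onto a small record type. The following is a hedged sketch, not NAAV.AI's actual schema: the `Reason` values are a hypothetical subset (the interview mentions 50-60 options but names only spelling), and `Correction` and `summarize` are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    # Hypothetical subset of the drop-down; only "spelling" is named in the interview.
    SPELLING = "spelling"
    IDIOM = "idiom rendered literally"
    GENDER_AGREEMENT = "grammatical gender"
    OTHER = "other"


@dataclass(frozen=True)
class Correction:
    """One tracked change made by a human translator to a machine draft."""

    paragraph_index: int
    machine_text: str
    revised_text: str
    reason: Reason

    def __post_init__(self) -> None:
        if self.machine_text == self.revised_text:
            raise ValueError("a correction must change the text")
        if not isinstance(self.reason, Reason):
            raise TypeError("a reason from the drop-down is required")


def summarize(corrections: list[Correction]) -> dict[str, int]:
    """Count corrections per reason, e.g. to see which error type dominates."""
    counts = Counter(c.reason for c in corrections)
    return {reason.name: n for reason, n in counts.items()}


log = [
    Correction(0, "shrinkhla", "shrinkhala", Reason.SPELLING),
    Correction(3, "playing a flute to a buffalo", "falling on deaf ears", Reason.IDIOM),
]
print(summarize(log))
```

Forcing a reason at the moment of editing is what turns ordinary proofreading into structured feedback: the same log that fixes one book can show which kinds of error a given language pair produces most often.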

At some point, and this is what we are also trying, once those agentic frameworks come in, it could do small chunks of the book, not necessarily the entire book at one go. You do a small chunk and the feedback you give there will inform the model to do a better translation of the rest of the book. So it self-corrects. It is learning as it goes and you're giving it a satisfaction score saying, yeah, you're doing well. I literally feel it is an analogy of teaching a child how to speak a new language.
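The chunk-by-chunk idea described above, where feedback on an early chunk informs the rest of the book, can be sketched as a simple loop. This is an illustration under assumptions only: Sampath describes it as something still being tried, the `translate` and `review` callables are stand-ins for a model call and a human reviewer, and a term glossary is just one possible way of carrying feedback forward.

```python
from typing import Callable

Glossary = dict[str, str]


def translate_in_chunks(
    paragraphs: list[str],
    chunk_size: int,
    translate: Callable[[str, Glossary], str],
    review: Callable[[list[str], list[str]], tuple[list[str], Glossary]],
) -> tuple[list[str], Glossary]:
    """Translate a book chunk by chunk, carrying reviewer feedback forward."""
    glossary: Glossary = {}
    output: list[str] = []
    for start in range(0, len(paragraphs), chunk_size):
        chunk = paragraphs[start:start + chunk_size]
        # Draft each paragraph using everything learned from earlier chunks.
        drafts = [translate(p, glossary) for p in chunk]
        # The reviewer returns corrected text plus any new term mappings.
        final, learned = review(chunk, drafts)
        glossary.update(learned)
        output.extend(final)
    return output, glossary


# Toy stand-ins (assumptions): a "model" that only applies the glossary,
# and a "reviewer" who always corrects the word "raja" to "king".
def toy_translate(paragraph: str, glossary: Glossary) -> str:
    for source_term, target_term in glossary.items():
        paragraph = paragraph.replace(source_term, target_term)
    return paragraph


def toy_review(sources: list[str], drafts: list[str]) -> tuple[list[str], Glossary]:
    fixed: list[str] = []
    learned: Glossary = {}
    for draft in drafts:
        if "raja" in draft:
            draft = draft.replace("raja", "king")
            learned["raja"] = "king"
        fixed.append(draft)
    return fixed, learned


print(translate_in_chunks(["the raja spoke", "the raja left"], 1, toy_translate, toy_review))
```

In the toy run the reviewer has to fix the first chunk by hand, while the second chunk arrives already corrected because the learned mapping was applied to its draft, which is the self-correcting behaviour the passage describes.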

The current model is that you have a per-word charge for translation, right? What is the charge, and how did you fix on that pricing?

Yes. We looked at what the market rates currently are for each language and are keeping it almost the same across each language, and for the publishing industry too. It will be slightly more economical than going through a human process, but more than that, you're able to get the output back in such a short span. For instance, BluOne Ink gave us three children's books, which we started with. They came to us around March or April, and now we are in June and we already have these three books in six languages, which is 18 books that will be out for a launch in the first week of July. If they had to find a translator, get that person's interest, and have that person start the work, it would have taken the whole of this year to bring out these 18 books. Whereas here, in two-three months, they have created a large repository in other languages. Which is why, even if it is almost the same or slightly higher than market rates, it is something that the publishing industry might think is worth it.

Tell us a bit about the subjects you choose for your books: What gets you excited about a Savarkar, about Tipu Sultan?

I owe my foray into literature to Tipu Sultan in a way, because I was 12 or 13 years old when 'The Sword of Tipu Sultan' came on television. And just to portray Haider and Tipu in a very good light, they had shown the Wodeyars, the Hindu Rajas of Mysore whose throne Haider actually usurped, in a very poor light. The Maharaja was shown as an obese buffoon. The Maharani is always a scheming woman. And these were people who were held in a lot of regard and respect by people in Karnataka, even now, so many years after independence. So there were a lot of protests in different parts of South Karnataka. Then the studio caught fire... (and there was some voicing of) superstitions that it's the curse of the Wodeyars. So as a young boy who was 12 or 13, who really otherwise didn't like history in school, I don't know what turned inside me. My family had no connection to Mysore. I'm half a Maharashtrian, half a Tamilian living in Bangalore. It just became an obsession of sorts and a family project where my poor parents, my maternal grandmother, we all used to go to Mysore every vacation, never with the intention that it would become a book. It was more of an intellectual curiosity, to know the truth only about that one king and queen who were misrepresented in the serial. Then the fascination spread to the entire dynasty of about 600 years of Wodeyar rule.

I don't think I choose my topics. The topics come choosing me. Most of these topics are also either (subjects who) have been forgotten or maligned individuals. That holds for all my books, when I look back retrospectively at the 10 books that I've written, whether it was Gauhar Jaan, who was India's first classical musician and woman to record herself on the gramophone and a superstar of her times, but totally forgotten even by Hindustani musicians today. Or (veena player) S. Balachander, whose biography I wrote; he was a highly talented man, but completely maligned, misunderstood, forgotten because he stood up for the truth when it came to matters of sanctity in classical music. Savarkar also came in that sphere where, despite all his sacrifices and his philosophy being so much talked about in the political circles... there was an alarming lack of scholarship around Savarkar at that time.

There've been so many controversies, there've been so many allegations. There's that. But after my books came, there's been a plethora of books being written on Savarkar, which is what should be the case in a democracy. You may love someone, you may hate someone, but let your love or hatred be informed. Let it not be based on rhetoric, information, politics. It's not my intention to convert you to become someone's fan or hater... A historian's job is to illuminate the archive and to bring documents into the public realm. The common reader may not have the interest or the craft for how to ferret information out of an archive, how to critique sources, how to make an analysis.

There's some criticism around your take on history, as also around who can write history books and which history books are more solid in terms of their foundation in archival material, in research material, in research methodologies and objectivity, etc. Now, you come from a corporate background...

I did a PhD in history from the University of Queensland.

How do you see this criticism?

Someone had said very famously that history is too important to be left only to the historians. It is everybody's bequest. It's an inheritance that all of us have. Someone could actually turn around and say even someone like a William Dalrymple doesn't have a degree in history or a Ramachandra Guha has not done a PhD in history. So is it only someone who's limited to a JNU and a league... places like this, only they have the holy right? I realized within the first two or three books that I wrote that I needed academic training. So I went through a five-year rigorous history PhD in Australia, and 'Savarkar' and all these other books came after that. And I think it's only as a recognition that even the Royal Historical Society, which is the world's largest and most prestigious group of historians, decided to elect me as a fellow, which I continue to be... This dog-in-the-manger attitude towards history writing that some people have, it's like "we are the gatekeepers and we will not let anyone (in) who has a different viewpoint ideologically and interpretatively." Also, I think history is a constant dialogue between a historian and his or her sources and his or her interpretation of that.

So (there will be) some interpretation and analysis; everything will not be available in sources. Some of it is inference where the subjectivity comes. If everything about the past was already very clearly written somewhere, then there's no need to do any further research. Your interpretation may vary from mine and that's why history is a humanity subject and not a science where you can conclusively and decisively prove that 1 + 1 has to be two. Here there can be an element of human subjectivity which comes through interpretation. So it should. The discipline thrives when there's a multiplicity of voices.

The business model of NAAV.AI is pay-per-word. But there's no vigilance around what gets translated, correct? On the one hand, this could democratize who gets translated. But, on the other hand, it also means that experts may not necessarily be able to keep an eye out and make sure that the correct histories, uncovered using the correct tools and training, are getting dispersed. How do you solve for this? Is that something that gives you sleepless nights?

We're not doing only history books. It could be anything. In those 30 books that BluOne Ink has given, some are non-fiction which is very general.

We also live in a world where misinformation sort of spreads really fast. Is that something you're sort of worried about?

That is true. That's why, at all levels (of NAAV.AI), there is human scrutiny. It's not that we just pass something through that (the tool) and the next day the publisher gets it ready to print. Even for these three (children's) books that we did in six languages, it took a month to a month-and-a-half for them to go through this review, and then there were two stages of proofreading. All of that was already done. But yeah, you're right: in today's age of social media, of information explosion, I think (there is) democratizing of content and also voices; everyone has a megaphone on their mouth, expressing opinions directly to the presidents and prime ministers of countries and expecting them to reply also. So, I think the gatekeeping has gone out in every space, whether it is the media, where someone was telling you how to think, what to say, or academia, which is telling you what is the right thing to study. And that is true democratization, where a multiplicity of voices should thrive.

Chanpreet Khurana Features and weekend editor, Moneycontrol

first published: Jun 14, 2025 10:19 am
