Indic Large Language Models will be transformative

There is a huge population out there waiting for LLM-powered applications to change their lives by bringing the right content to them in the right way at the right time. If this goal is to be realized, support for Indic languages is a prerequisite.

May 22, 2025 / 15:37 IST

By Lipika Dey 

Generative AI, especially generative language technology, has made huge strides in recent years. It has enabled a very large set of users, who couldn’t use any technology earlier due to interfacing barriers, to make use of it for a whole array of tasks.

Imagine a farmer asking “how to ensure a good yield of rice crop notwithstanding the vagaries of rainfall” and getting back answers on the right kind of seeds to use, along with other contextually relevant information about micro-irrigation, soil treatment and sustainable, organic farming.

Such an application would undoubtedly be a boon for Indian farmers, whose numbers were estimated at around 177 million in 2024. The catch, however, is that the benefits will start showing only if the applications support question answering in Indian languages.

Add to it the ability to attach visuals taken directly from the field, showing the state of crops or soil, along with the question, and we have “multimodal” interactions at play. It is easy to imagine how such applications can change the face of agriculture, education or healthcare in this country.

Early movers in building LLMs in Indic languages

Sarvam 1, developed by Sarvam AI, is one of the earliest homegrown multilingual large language models handling Indic languages. It is a 2 billion-parameter model, trained on 4 trillion tokens, which were curated by Sarvam. It uses an innovative tokenizer that’s faster than those used for English language models. Along with English, Sarvam 1 supports 10 Indian languages: Bengali, Gujarati, Hindi, Marathi, Malayalam, Kannada, Oriya, Tamil, Telugu and Punjabi.

Starting with relatively small volumes of data, Sarvam focuses on innovative ways of generating synthetic data to train its models, and does so quite successfully. While Sarvam has built an array of generative AI agents powered by its own speech and language models, the models can also be used by application developers. Users can also download the base models from the Hugging Face library and train their own domain-specific custom language models to power innovative generative AI applications for Indic language speakers.

In April 2025, Sarvam was selected by the Government of India, under the IndiaAI mission, to build India’s sovereign Large Language Model. This gives the company access to dedicated computing resources to build a foundational indigenous model from scratch. Designed as a scalable, voice-based model, it will not only be fluent in Indian languages but also be capable of causal reasoning. The model also promises to be secure.

BharatGPT, another Indic Large Language Model (LLM), was developed by CoRover.ai with support from several IITs, Google, the Department of Science and Technology, SML, and Reliance Jio. BharatGPT is a multilingual and multimodal language model that can be adapted for fields such as healthcare, banking, tourism, education, and even government services.

CoRover.ai, which boasts of building the world’s first human-centric conversational AI platform that is being used by 130 crore users, aims to repeat its feat for Indic languages. BharatGPT supports the text modality for 22 Indian languages and the voice modality for more than 14 of them. Application builders can also add their custom knowledge bases to support their applications. The model also has an inbuilt payment gateway to facilitate real-time transactions, dialogue and conversation management tools, and various other features that help application builders build custom bots efficiently. BharatGPT is hosted on Google Cloud Platform and assures data sovereignty, privacy, and security, along with the facility to utilize Google’s AI services.

Opening the door to relevant datasets

A prerequisite for large language models is the availability of large Indic language datasets. AI4Bharat, a research lab at IIT Madras, has released a wide range of datasets and open-source tools to help developers, thereby making a significant impact on researchers and development professionals. While the landscape is still shaping up, there are differences of opinion on whether the focus should be on building high-quality Indic language models or on innovative foundational models. Whichever the case, there is a huge population out there waiting for LLM-powered applications to change their lives by bringing the right content to them in the right way at the right time.

(Lipika Dey is Professor of Computer Science, Ashoka University.)

Views are personal and do not reflect the stand of this publication. 


Moneycontrol Opinion
first published: May 22, 2025 03:36 pm
