IIT Bombay's BharatGen bets on trillion-parameter ‘mother model’ to power India’s AI ecosystem

The consortium says multimodal data collection is underway as it readies for trillion-parameter training.

September 19, 2025 / 14:39 IST
On September 18, the IT ministry announced eight more startups and companies that will work on developing India's foundational AI models.

BharatGen, which has been awarded more than Rs 900 crore under the IndiaAI Mission to build a 1-trillion parameter large language model (LLM), said the goal is to distill it into smaller, domain-specific systems for sectors like law, agriculture, and finance.

“The reason to get to one trillion is to be able to do model distillation… that capability is in the national interest,” Rishi Bal, executive vice president at BharatGen, and IIT Bombay's Ganesh Ramakrishnan said in an interaction with Moneycontrol.

BharatGen is a government-funded consortium led by IIT Bombay, together with top academic institutions including IIT Madras, IIIT Hyderabad, IIT Kanpur, IIT Hyderabad, IIT Mandi, and IIM Indore, to build India’s first multimodal LLM across 22 Indian languages.

Why a trillion parameters?

According to Bal, the trillion-parameter model is not meant for direct consumer use but to serve as a foundation. “A lawyer in a district court or a farmer using a mobile app doesn’t need a trillion-parameter system. They need something leaner, faster, trained on the right data. That’s what model distillation allows,” Bal said.

“Once you’ve trained the large model, you can derive smaller, more efficient ones that are easier to use in specialised areas,” he said.

This means the model will act as a “mother system”, powering lighter applications, from agricultural advisory tools in regional languages to legal assistants trained on Indian case law.
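For readers unfamiliar with the term, the sketch below illustrates one common form of model distillation, in which a small "student" model is trained to match the softened output probabilities of a large "teacher". It is a minimal illustration and not BharatGen's method: the function names, the temperature value and the example logits are all assumptions made for this sketch.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Scaling by temperature squared is a common convention in the
    distillation literature; the value 2.0 is an arbitrary choice here.
    """
    p = softmax(teacher_logits, temperature)  # teacher (large model)
    q = softmax(student_logits, temperature)  # student (small model)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical scores over three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student that agrees with the teacher gets a lower loss than one that disagrees, so minimising this quantity during training pushes the smaller model towards the larger model's behaviour.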

How are they collecting data?

Bal said BharatGen is investing heavily in assembling a sovereign dataset, combining multiple streams of Indian content.

  • Publisher tie-ups: “We are working with publishers to license their archives and create digital corpora,” he said.
  • Free OCR services: “We’re offering OCR tools to digitise regional texts that are currently locked in print form.”
  • Crowdsourced annotation: “We are bringing in distributed annotators to capture nuance in Indian languages and culture.”

“These efforts are meant to ensure that the model reflects Indian contexts, rather than depending on foreign data,” Bal explained.

What about hardware and GPUs?

Training a trillion-parameter model requires thousands of GPUs working in parallel. Bal acknowledged that hardware availability is a bottleneck.

“We know how to use the GPUs, how to scale across them, and how to get training runs right,” he said. “The challenge is that we, like everyone else, have to wait for GPU supply.”
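A rough back-of-the-envelope calculation shows why a model of this size cannot be trained on a handful of machines. The figures in the sketch below are assumptions and not BharatGen's configuration: 2 bytes per parameter for 16-bit weights, about 16 bytes per parameter for mixed-precision training with the Adam optimiser (a commonly cited estimate), and a hypothetical GPU with 80 GB of memory.

```python
def min_gpus(num_params, bytes_per_param, gpu_memory_bytes):
    """Smallest number of GPUs whose combined memory can hold the given state."""
    total_bytes = num_params * bytes_per_param
    return -(-total_bytes // gpu_memory_bytes)  # integer ceiling division

ONE_TRILLION = 10**12
GPU_MEMORY = 80 * 10**9  # assumed 80 GB per GPU (hypothetical)

# Assumption: 16-bit weights only, e.g. for serving the model.
weights_only = min_gpus(ONE_TRILLION, 2, GPU_MEMORY)

# Assumption: weights, gradients and Adam optimiser state during training.
training_state = min_gpus(ONE_TRILLION, 16, GPU_MEMORY)

print(weights_only, training_state)
```

Under these assumptions, the weights alone would fill 25 such GPUs and the training state about 200. That is a floor on memory only: it leaves out activations and says nothing about how many GPUs are needed to finish a training run in a practical amount of time.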

The Rs 900-crore funding for BharatGen, which the government announced on September 18, will take the form of a subsidy for access to GPUs. Under the mission, the government has made available nearly 40,000 GPUs for various activities, including building India's sovereign LLMs.

What does the leadership say about the vision?

Speaking to Moneycontrol, Ramakrishnan said BharatGen’s focus is on reliability and real-world use rather than raw scale.

“Our focus is on creating models rooted in Indian data and languages, which can make them more reliable for real-world applications,” he said.

He added that BharatGen will release distilled models to the developer ecosystem, enabling startups and enterprises to build applications on top of them without having to train massive systems independently.

Where does BharatGen operate from?

The company runs a hub-and-spoke model, with teams spread across multiple locations in India. “We’ve structured ourselves to bring together engineers, data scientists and domain experts while keeping operations lean,” Bal explained. This distributed approach, he said, allows BharatGen to tap into diverse regional expertise.

How is the project funded?

The Rs 900 crore grant from the IndiaAI Mission forms the financial backbone of the initiative. Ramakrishnan said BharatGen is also working on public–private partnerships and exploring revenue models such as licensing smaller distilled models.

“Revenue models will evolve,” he said. “But what’s clear is that this is national infrastructure. The value will come not just from what BharatGen does, but from what the broader ecosystem is able to build on top of it.”

How does this compare to global models?

Global AI companies such as OpenAI, Google and Anthropic have also built trillion-parameter systems. Ramakrishnan argued that India’s effort isn’t about matching them parameter for parameter.

“This is about relevance. A model trained on Indian data, with Indian linguistic and cultural grounding, will behave differently, and more usefully, than one trained elsewhere,” he said.

What comes next?

BharatGen’s immediate priorities are to refine its dataset, prepare hardware infrastructure, train the foundation model, and then spin off distilled versions.

Aihik Sur covers tech policy, drones, space tech among other beats at Moneycontrol
first published: Sep 19, 2025 02:01 pm
