BharatGen, which has been awarded more than Rs 900 crore under the IndiaAI Mission to build a 1-trillion parameter large language model (LLM), said the goal is to distill it into smaller, domain-specific systems for sectors like law, agriculture, and finance.
“The reason to get to one trillion is to be able to do model distillation… that capability is in the national interest,” said Rishi Bal, executive vice president at BharatGen, in an interaction with Moneycontrol alongside IIT Bombay's Ganesh Ramakrishnan.
BharatGen is a government-funded consortium led by IIT Bombay, together with top academic institutions including IIT Madras, IIIT Hyderabad, IIT Kanpur, IIT Hyderabad, IIT Mandi, and IIM Indore, to build India’s first multimodal LLM across 22 Indian languages.
Why a trillion parameters?
According to Bal, the trillion-parameter model is not meant for direct consumer use but to serve as a foundation. “A lawyer in a district court or a farmer using a mobile app doesn’t need a trillion-parameter system. They need something leaner, faster, trained on the right data. That’s what model distillation allows,” Bal said.
“Once you’ve trained the large model, you can derive smaller, more efficient ones that are easier to use in specialised areas,” he said.
This means the model will act as a “mother system”, powering lighter applications, from agricultural advisory tools in regional languages to legal assistants trained on Indian case law.
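BharatGen has not published its distillation recipe, but the idea Bal describes, a large “teacher” model supervising a smaller “student”, can be illustrated with a minimal knowledge-distillation loss. The sketch below is purely illustrative (the logits and temperature are made-up values, not BharatGen's): the student is trained to match the teacher's softened output distribution, so the small model inherits behaviour from the large one.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher temperature gives softer
    # distributions, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between teacher and student soft targets.
    # Training the student to minimise this (usually alongside a normal
    # task loss) is the core of model distillation.
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical logits over a tiny three-token vocabulary:
teacher = [3.0, 1.0, 0.2]
matched = [3.0, 1.0, 0.2]     # student that already imitates the teacher
mismatched = [0.2, 1.0, 3.0]  # student that disagrees with the teacher

print(distillation_loss(teacher, matched))     # ~0.0: nothing left to learn
print(distillation_loss(teacher, mismatched))  # positive: gradient signal to update the student
```

In practice the same principle scales up: the trillion-parameter “mother system” produces the soft targets, and each domain-specific student (legal, agricultural, financial) is a much smaller network trained against them on domain data.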
How are they collecting data?
Bal said BharatGen is investing heavily in assembling a sovereign dataset, combining multiple streams of Indian content.
- Publisher tie-ups: “We are working with publishers to license their archives and create digital corpora,” he said.
- Free OCR services: “We’re offering OCR tools to digitise regional texts that are currently locked in print form.”
- Crowdsourced annotation: “We are bringing in distributed annotators to capture nuance in Indian languages and culture.”
“These efforts are meant to ensure that the model reflects Indian contexts, rather than depending on foreign data,” Bal explained.
What about hardware and GPUs?
Training a trillion-parameter model requires thousands of GPUs working in parallel. Bal acknowledged that hardware availability is a bottleneck.
“We know how to use the GPUs, how to scale across them, and how to get training runs right,” he said. “The challenge is that we, like everyone else, have to wait for GPU supply."
The Rs 900-crore funding for BharatGen, which the government announced on September 18, will be provided as a subsidy for availing GPUs. Under the mission, the government has made available nearly 40,000 GPUs for various activities, including the building of India's sovereign LLMs.
What does the leadership say about the vision?
Speaking to Moneycontrol, Ramakrishnan said BharatGen’s focus is on reliability and real-world use rather than raw scale.
“Our focus is on creating models rooted in Indian data and languages, which can make them more reliable for real-world applications,” he said.
He added that BharatGen will release distilled models to the developer ecosystem, enabling startups and enterprises to build applications on top of them without having to train massive systems independently.
Where does BharatGen operate from?
The company runs a hub-and-spoke model, with teams spread across multiple locations in India. “We’ve structured ourselves to bring together engineers, data scientists and domain experts while keeping operations lean,” Bal explained. This distributed approach, he said, allows BharatGen to tap into diverse regional expertise.
How is the project funded?
The Rs 900 crore grant from the IndiaAI Mission forms the financial backbone of the initiative. Ramakrishnan said BharatGen is also working on public–private partnerships and exploring revenue models such as licensing smaller distilled models.
“Revenue models will evolve,” he said. “But what’s clear is that this is national infrastructure. The value will come not just from what BharatGen does, but from what the broader ecosystem is able to build on top of it.”
How does this compare to global models?
Global AI companies such as OpenAI, Google and Anthropic have also built trillion-parameter systems. Ramakrishnan argued that India’s effort isn’t about matching them parameter for parameter.
“This is about relevance. A model trained on Indian data, with Indian linguistic and cultural grounding, will behave differently, and more usefully, than one trained elsewhere,” he said.
What comes next?
BharatGen’s immediate priorities are to refine its dataset, prepare hardware infrastructure, train the foundation model, and then spin off distilled versions.