
Bhashini seeks partners to annotate Indian language data for AI model training

Government-backed language mission to empanel agencies for large-scale, high-quality data annotation across 22+ Indian languages to strengthen AI translation and speech systems.

August 13, 2025 / 10:10 IST
Bhashini is an AI-powered platform developed by the Ministry of Electronics and Information Technology (MeitY) as part of the National Language Translation Mission under Digital India.

The Digital India Bhashini Division (DIBD) under the Ministry of Electronics and Information Technology has called on agencies to annotate and label datasets in 22 Indian languages necessary for training artificial intelligence (AI) models.

"Data annotation and labelling are crucial for machine learning because they provide the necessary context for algorithms to learn effectively," DIBD CEO Amitabh Nag told Moneycontrol.

"By adding meaningful tags or labels to raw data (like images, text, or audio), these processes enable models to understand patterns, make accurate predictions and ultimately, perform desired tasks," Nag added.

Nag said that without properly labelled data, "machine learning models struggle to learn, leading to poor performance and unreliable results."

In this regard, DIBD has floated a request for empanelment (RFE), inviting companies to annotate and label Indian datasets. "The RFE is providing a huge opportunity to those in the data industry to be part of the AI revolution," Nag added.

According to the RFE, vendors will be expected to cover five core AI/ML language tasks: Automatic Speech Recognition (ASR), Machine Translation (MT), Text-to-Speech (TTS), Optical Character Recognition (OCR), and Transliteration.

The RFE said that vendors will be expected to annotate raw data with domain- and task-specific metadata. For ASR, for instance, selected vendors will have to produce both verbatim and cleaned transcripts, along with timestamps and speaker details such as age and gender.
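As a rough illustration of what such an ASR annotation record might contain, the sketch below models the items named above: verbatim and cleaned transcripts, timestamps, and speaker age and gender. The class names, field names and validation rules are assumptions made for illustration only. They are not taken from the RFE or from Bhashini's DCCF platform, and the overlap check assumes a single-speaker recording.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One timestamped stretch of speech (times in seconds)."""
    start: float
    end: float
    verbatim: str  # transcript exactly as spoken, fillers included
    cleaned: str   # same content with fillers and false starts removed


@dataclass
class AsrAnnotation:
    """Hypothetical ASR annotation record; all field names are illustrative."""
    audio_id: str
    language: str  # e.g. a language code such as "hi"
    speaker_age: Optional[int] = None
    speaker_gender: Optional[str] = None
    segments: List[Segment] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems found; an empty list means none."""
        problems = []
        previous_end = 0.0
        for i, seg in enumerate(self.segments):
            if seg.end <= seg.start:
                problems.append(f"segment {i}: end must be after start")
            if seg.start < previous_end:
                problems.append(f"segment {i}: overlaps previous segment")
            if not seg.verbatim.strip() or not seg.cleaned.strip():
                problems.append(f"segment {i}: missing transcript text")
            previous_end = max(previous_end, seg.end)
        return problems


record = AsrAnnotation(
    audio_id="clip-001",
    language="hi",
    speaker_age=34,
    speaker_gender="female",
    segments=[
        Segment(0.0, 2.5, "um namaste", "namaste"),
        Segment(2.5, 4.0, "aap kaise hain", "aap kaise hain"),
    ],
)
print(record.validate())
```

A simple check of this kind is one way an annotation vendor could catch inconsistent timestamps or empty transcripts before submission. The actual quality benchmarks are defined in the RFE itself.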

For Machine Translation, translations will need to be validated for context, fluency, and alignment.

All annotation work must be performed on Bhashini’s in-house Data Capture and Curation Framework (DCCF) platform, the RFE said.

The RFE sets strict quality benchmarks to validate consistency. Industry experts emphasise how crucial high-quality labelled data is for effective AI systems, particularly in low-resource languages.

Bhashini currently supports more than 22 Indian languages, including the 22 official languages listed under the Eighth Schedule of the Indian Constitution.

The empanelment will be valid for one year, extendable to two years, and the government body is inviting bids until August 28.


Aihik Sur covers tech policy, drones and space tech, among other beats, at Moneycontrol.
first published: Aug 13, 2025 10:10 am
