
IIT-Madras’ lab AI4Bharat launches IndicVoices dataset covering 22 languages

Using IndicVoices, AI4Bharat aims to build IndicASR, the first Automatic Speech Recognition (ASR) model to support all the 22 languages listed in the 8th schedule of the Constitution of India.

March 06, 2024 / 19:30 IST

IIT Madras’ research lab AI4Bharat on March 6 launched IndicVoices, an open-source dataset of natural, spontaneous speech covering 22 Indian languages.

The mission of this dataset was to collect spontaneous speech in Indian languages, AI4Bharat said in a blog post. IndicVoices is funded by Bhashini, which is backed by the Ministry of Electronics and Information Technology, Ekstep Foundation, and Nilekani Philanthropies.


Using IndicVoices, AI4Bharat aims to build IndicASR, the first Automatic Speech Recognition (ASR) model to support all 22 languages listed in the 8th schedule of the Constitution of India. ASR models, as the name implies, transcribe spoken language into text, which other systems can then use to carry out various functions.

The dataset contains a total of 7,348 hours of read (9%), extempore (74%) and conversational (17%) audio from 16,237 speakers covering 145 Indian districts and 22 languages, AI4Bharat said in a release.
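
To put those shares in terms of hours, the breakdown can be worked out with a few lines of Python. This is only a rough sketch based on the figures quoted above: the percentages are rounded, so the per-type hours are approximate, and the per-speaker average assumes the audio is spread evenly across speakers, which the release does not state. The function and variable names are illustrative.

```python
# Figures as reported by AI4Bharat; percentages are rounded,
# so the resulting hours are approximations.
TOTAL_HOURS = 7348
SPEAKERS = 16237
SHARES = {"read": 0.09, "extempore": 0.74, "conversational": 0.17}


def hours_by_type(total_hours, shares):
    """Return approximate hours per speech type, rounded to the nearest hour."""
    return {name: round(total_hours * share) for name, share in shares.items()}


breakdown = hours_by_type(TOTAL_HOURS, SHARES)

# Simple mean; assumes audio is evenly distributed across speakers.
avg_minutes_per_speaker = TOTAL_HOURS * 60 / SPEAKERS

for name, hours in breakdown.items():
    print(f"{name}: ~{hours} hours")
print(f"average per speaker: ~{avg_minutes_per_speaker:.0f} minutes")
```

On these numbers, extempore speech accounts for roughly 5,400 of the 7,348 hours, and the average contribution works out to a little under half an hour per speaker.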