Google, IISc help make local apps in India more inclusive

Google said that IISc is open-sourcing the first set of speech data comprising over 4,000 hours across 38 languages to developers, with more data sets expected to be added in the future.

June 28, 2023 / 12:00 IST
Google has previously announced its ambition to build a single, unified AI model that can handle over 100 Indian languages across speech and text.

India is a challenging market for tech companies and startups looking to make the web more accessible to Indians. Dialects change every few hundred kilometres, and people are more comfortable conversing than writing. This has led to a dearth of natural-language data for building large language models that capture the nuances of every language and dialect.

Google, however, believes it may be a step closer to solving this through its collaboration with the Indian Institute of Science (IISc) and ARTPARK (Artificial Intelligence & Robotics Technology Park) on an initiative called Project Vaani, which was launched in December last year.

On June 28, the internet giant announced that IISc is open-sourcing the first set of speech data, comprising over 4,000 hours across 38 languages, for developers, with more data sets expected to be added in the future. The announcement was made at the company’s developer event held in Bengaluru.

The initiative aims to collect and transcribe open-source, anonymised speech data from all of India's 773 districts, while ensuring linguistic, educational, urban-rural, age, and gender diversity. It is planned in three phases, with the first phase focusing on 80 districts across 10 states.