AI Summit: Language data gaps limit AI progress beyond dominant languages, says Alice Oh

Researcher flags structural and cost barriers in building inclusive multilingual models

February 18, 2026 / 20:16 IST
AI Impact Summit
Snapshot AI
  • AI language models still focus on a few dominant languages.
  • AI inclusivity for many languages hindered by data scarcity, high costs.
  • Address language gaps in governments, companies, and universities.

Despite rapid advances in language models, progress remains limited to a narrow set of dominant languages, said Alice Oh, a professor at the Korea Advanced Institute of Science and Technology. She pointed to structural data constraints, including languages without written forms and undigitised texts, as a persistent bottleneck for AI inclusivity.

“I think definitely as people pay more attention to these languages, the language models do get much better. It’s really great to see those improvements,” Alice Oh told Moneycontrol in an interview, while cautioning that such gains remain uneven. “It’s just that it’s still limited to a small number of languages.”

Data availability

At the centre of the challenge lies the difficulty of obtaining datasets large enough to train modern AI systems. Oh acknowledged that data scarcity is not an occasional obstacle but a defining limitation for many languages.