Meta has developed a new way to train Automatic Speech Recognition (ASR) models by clustering speech at an "utterance level".
ASR models, as the name implies, are used in systems that transcribe spoken language into text, which can then be used to carry out various functions. The most familiar examples of ASR in action are voice assistants such as Apple's Siri, Amazon's Alexa and Google Assistant.
Despite advances in AI technology, these assistants can still have a hard time understanding your speech. Meta aims to improve on this by clustering speech from speakers of different backgrounds together, rather than relying on traditional data sets that train ASR models based on metrics such as age group or gender.
The goal is to group similar utterances from a diverse set of speakers into one data set, and then use that data set to train the ASR model.
Meta says this lets it train the model "using the various clusters and use fairness datasets to measure how the model impacts outcomes across different demographic groups. The clustering is performed using unsupervised learning, leveraging algorithms to analyze and group unlabeled data sets without human intervention".
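To make the idea concrete, here is a minimal sketch of what utterance-level clustering with unsupervised learning could look like. Meta has not published this exact pipeline; the sketch assumes each utterance has already been converted into a fixed-size embedding by some speech encoder, and simply groups those unlabeled embeddings with k-means.

```python
# Illustrative sketch only: not Meta's published pipeline.
# Assumes utterances are already represented as fixed-size embeddings
# (e.g., from a pretrained speech encoder, which is not shown here).
import numpy as np
from sklearn.cluster import KMeans

def cluster_utterances(embeddings: np.ndarray, n_clusters: int = 8) -> np.ndarray:
    """Group unlabeled utterance embeddings into clusters (unsupervised)."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(embeddings)

# Hypothetical data: 10,000 utterances, each a 256-dimensional embedding.
utterance_embeddings = np.random.rand(10_000, 256)
cluster_ids = cluster_utterances(utterance_embeddings)

# Each cluster can then serve as a training subset; demographic labels are
# only brought in afterwards, via fairness datasets, to check how the
# trained model performs across groups.
for c in np.unique(cluster_ids):
    print(f"cluster {c}: {np.sum(cluster_ids == c)} utterances")
```

The key point the sketch illustrates is that the clustering step never sees demographic labels: groups emerge purely from how the utterances themselves sound.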
The company said it observed improved accuracy across various demographic groups and accents in models trained with this method.
