According to a recent investigative report by Wired, “subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels” were used by Apple, Nvidia, and Anthropic, among others. The report claimed that Apple used this data to develop its open-source OpenELM models, which were released in April.
Now, Apple has issued a clarification and said that it didn’t use the OpenELM model to power any of its AI or machine learning features, including Apple Intelligence. Apple confirmed to multiple news outlets that OpenELM was created to contribute to the research community and advance open-source large language model development.
OpenELM, according to Apple, was designed only for research purposes. The model is open source and can be accessed on Apple’s Machine Learning Research website.
In April, Apple published a research paper in which it categorically said, “We do not use our users’ private personal data or user interactions when training our foundation models.”
Apple has made big claims about its Apple Intelligence models being trained on licensed data along with publicly available data collected by its web-crawler. “We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot,” Apple had said.
Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control, Apple added in a research paper published in April 2024.
The report by Wired suggested that the dataset was used to train AI models. The dataset is part of a larger collection called “The Pile,” created by the non-profit EleutherAI.
