OpenAI tapping YouTube? Big Tech's trapped in a glass house of its own making

After exploiting user data for years, Big Tech firms are seeing the tables turn as they grab it from each other

April 09, 2024 / 11:45 IST
Has Google tried grabbing some of Meta’s data in the same way OpenAI scraped YouTube?

A few weeks ago, the chief technology officer of OpenAI was asked if her company had used YouTube videos to train its AI systems. First, she gave a blank stare. Then there was a grimace. Finally, Mira Murati gave an answer that avoided the messy and furtive world she and other tech companies were operating in: “Actually, I’m not sure about that.”

According to a New York Times report, OpenAI had in fact trained its AI on “more than one million hours of YouTube videos,” which it transcribed using a speech recognition tool called Whisper. The conversational text from those transcriptions was used to train GPT-4, the flagship large language model that underpins ChatGPT.

Large tech players racing to build more capable AI models have reached a point where they have fewer and fewer places to look for data on the public web. Taking text from the transcripts of YouTube videos suggests OpenAI has been digging between the proverbial couch cushions, even at the risk of breaking someone’s rules. There’s a decent chance it did break them. YouTube Chief Executive Officer Neal Mohan told Bloomberg News last week that if OpenAI had used YouTube videos to refine its AI, that would be a “clear violation” of YouTube’s terms of use. OpenAI didn’t respond to a request for comment.

Still, it’s hard to see the tension ratcheting up between OpenAI and Google over this. Google, for one, can hardly complain about a data violation when its entire business has been built on collecting the private data of billions of consumers, often at a startling scale. Google has also scraped transcription data from some YouTube videos to train its own AI models, Mohan told Bloomberg.