Moneycontrol

Zuckerberg’s plan for AI hinges on your Facebook, Instagram data

If Zuckerberg wants to make a more powerful chatbot, the pile of data he’s sitting on is especially valuable because so much of it comes from comment threads. Any text that represents human dialogue is critical for training so-called conversational agents, which is why OpenAI heavily mined the internet forum Reddit Inc. to build its own popular chatbot.

February 06, 2024 / 16:37 IST

With all that data, Zuckerberg’s quest looks doable. The problem is what the fallout could be for the rest of us.

For many people, Facebook is the internet, and the number of its users is still growing, according to Meta Platforms Inc’s latest financial results. But Mark Zuckerberg isn’t just celebrating that continuing growth. He wants to take advantage of it by using data from Facebook and Instagram to create powerful, general-purpose artificial intelligence. That sounds great, and Meta is well positioned to do it, but its billions of users may end up paying the price with their privacy and more.

Here’s how Zuckerberg teased his next move in AI on Thursday:

“The next key part of our playbook is learning from unique data and feedback loops in our products… On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”

The point that Zuck makes here about “Common Crawl” startled observers in the tech press, because that archive is already huge: 250 billion web pages spanning 17 years. It’s one of the biggest and most popular repositories of the public internet used for training AI systems today. When OpenAI launched its GPT-3 language model in 2020, close to 60 percent of the text used to train the system came from Common Crawl.