OpenAI, the creator of ChatGPT, will allow websites to block its web crawler, GPTBot, which is used to scrape online data for training artificial intelligence (AI) models.
The move comes after OpenAI was hit with lawsuits alleging that the company illegally mined data from online sources to train its chatbot.
According to reports, the bot is set up to automatically filter out paywalled content, sources that do not comply with OpenAI's policies, and sources that hold personally identifiable information.
Online data is much sought after by technology giants, especially AI firms. Large language models (LLMs) like GPT-4, which underpins ChatGPT, are trained on vast amounts of data collected from the web to improve their accuracy.
While OpenAI maintains that it uses publicly available data to train its models, some lawsuits against the company dispute this. There are also concerns about how licensed data such as images, videos and music is sourced, given the serious copyright implications.
If you don't want your website to be scraped by the bot, OpenAI says you can add GPTBot to your website's robots.txt file using the following lines:
User-agent: GPTBot
Disallow: /

A robots.txt file is used by websites to allow or block access for specific crawlers or for all of them.
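Site owners who want to sanity-check such a rule before publishing it could try something like the sketch below. It uses Python's standard urllib.robotparser module to read the two directives and report whether a given crawler name would be allowed to fetch a page. The domain example.com, the crawler name "SomeOtherBot" and the helper can_crawl are placeholders made up for illustration. The check only shows how a standards-following parser reads the rule; it says nothing about how GPTBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted in the article, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (illustrative helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeOtherBot are placeholders, not real targets.
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = can_crawl(ROBOTS_TXT, agent, "https://example.com/article")
        print(f"{agent}: {'allowed' if verdict else 'blocked'}")
```

Because the rule names only GPTBot, crawlers with other names are unaffected by it.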
In a blog post, OpenAI says that access to websites "can help AI models become more accurate and improve their general capabilities and safety" and the data will be used "to improve future models".
Website owners can also customise the bot's access and allow it into only specific portions of a site using the following lines:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
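For sites with several sections, the per-directory rules could be generated and checked with a short script. The following is a minimal sketch, again relying on Python's standard urllib.robotparser. The helpers build_rules and allowed_paths, the file names and the example.com domain are all hypothetical and not part of OpenAI's guidance.

```python
from urllib.robotparser import RobotFileParser


def build_rules(user_agent, allow=(), disallow=()):
    """Build one robots.txt group for a single crawler (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    return "\n".join(lines) + "\n"


def allowed_paths(robots_txt, user_agent, paths, base="https://example.com"):
    """Map each path to whether user_agent may fetch it under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, base + path) for path in paths}


if __name__ == "__main__":
    # Directory names mirror the article's example; the page names are invented.
    rules = build_rules("GPTBot", allow=["/directory-1/"], disallow=["/directory-2/"])
    print(rules)
    print(allowed_paths(rules, "GPTBot", ["/directory-1/a.html", "/directory-2/b.html"]))
```

Paths that match neither rule are treated as allowed by this parser, so a site that wants everything else blocked would need a broader Disallow line as well.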