Under fire, OpenAI allows websites to block bot that mines data

OpenAI is facing lawsuits which claim that the ChatGPT maker has been illegally mining data from online sources to train its bot

August 09, 2023 / 13:56 IST

OpenAI, the creator of ChatGPT, will allow websites to block its web crawler, GPTBot, which is used to scrape online data for training artificial intelligence (AI) models.

The move comes as OpenAI faces lawsuits claiming that the company has been illegally mining data from online sources to train its bot.

According to reports, the bot is set up to automatically filter out paywalled content, sources that are not in line with OpenAI's policies, and sources that house personally identifiable information.

Online data is much sought after by technology giants, especially by AI firms. Large language models (LLMs) like GPT-4, which underpins ChatGPT, are trained on vast amounts of data collected from the web to improve their accuracy.

While OpenAI maintains that it uses publicly available data to train its models, some lawsuits against the company disagree. There are also concerns about how licensed data such as images, videos and music is sourced, as this has serious copyright implications.


If you don't want your website to be scraped by the bot, OpenAI says you can add GPTBot to your website's robots.txt file using the following directives:

User-agent: GPTBot
Disallow: /

Websites use a robots.txt file to allow or block access for specific crawlers or for all of them.
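To see what the blocking rule does in practice, the short Python sketch below feeds the two directives to the standard library's robots.txt parser and asks whether a given crawler may fetch a page. It only illustrates how a compliant crawler would read the rule; it is not a description of how GPTBot itself works. The example.com URL and the "SomeOtherBot" name are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# The directives quoted in the article, one per line.
RULES = """\
User-agent: GPTBot
Disallow: /
"""

def can_fetch(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical page and crawler names, used only for illustration.
page = "https://example.com/any/page.html"
gptbot_allowed = can_fetch(RULES, "GPTBot", page)
other_allowed = can_fetch(RULES, "SomeOtherBot", page)

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Because the rule names only GPTBot, crawlers with other user-agent names are unaffected by it.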

In a blog post, OpenAI says that access to websites "can help AI models become more accurate and improve their general capabilities and safety" and the data will be used "to improve future models".


Website owners can also grant the bot access to only specific portions of a website using the following directives:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
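The partial-access rules can be checked the same way. The sketch below is a minimal illustration using Python's standard robots.txt parser, with a hypothetical example.com site and made-up page names; it shows which paths a compliant crawler identifying as GPTBot would be permitted to fetch.

```python
from urllib.robotparser import RobotFileParser

# The partial-access directives from the article, one per line.
RULES = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URLs on a placeholder site, used only for illustration.
urls = [
    "https://example.com/directory-1/page.html",
    "https://example.com/directory-2/page.html",
    "https://example.com/elsewhere/page.html",
]

# Map each URL to whether GPTBot may fetch it under these rules.
results = {url: parser.can_fetch("GPTBot", url) for url in urls}

for url, allowed in results.items():
    print(url, "->", "allowed" if allowed else "blocked")
```

Paths that match neither directive, such as the third URL, remain accessible, since robots.txt permits anything that is not explicitly disallowed.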

Moneycontrol News
first published: Aug 9, 2023 01:34 pm
