Moneycontrol PRO
Black Friday Sale
Black Friday Sale
HomeNewsTrendsWhat are 'Jailbreak' prompts, used to bypass restrictions in AI models like ChatGPT?

What are 'Jailbreak' prompts, used to bypass restrictions in AI models like ChatGPT?

Visitors to the Jailbreak Chat site can add their jailbreaks, try ones that others have submitted, and vote prompts up or down based on how well they work.

April 12, 2023 / 12:47 IST
ChatGPT, since its launch, has taken the world by storm.

A growing number of people are finding ways to bypass the restrictions built into artificial intelligence programs to stop them from being used in harmful ways, abetting crimes or espousing hate speech. The techniques to poke and prod these popular AI tools are used to expose potential security flaws and highlight the capacity and limitations of AI models. These tools include artificial intelligence chatbots like ChatGPT, Microsoft Corp. ’s Bing and Bard, recently released by Alphabet Inc.’s Google.

One of the creators who is at the forefront of these bypass techniques is Alex Albert, a 22-year-old computer science student at the University of Washington, who has become a prolific creator of intricately phrased AI prompts known as “jailbreaks.” Albert’s jailbreak prompts have the ability to push powerful chatbots like ChatGPT to sidestep the human-built bannisters that keep a check on what the bots can and can’t say.

Albert created the website Jailbreak Chat early this year, where he confines prompts for artificial intelligence chatbots like ChatGPT that he’s seen on Reddit and other online forums and posts prompts he’s come up with, too. Visitors to the site can add their jailbreaks, try ones that others have submitted, and vote prompts up or down based on how well they work. Albert also started sending out a newsletter, The Prompt Report, in February, which he said has several thousand followers so far.

Jenna Burrell, director of research at nonprofit tech research group Data & Society, sees Albert and others like him as the latest entrants in a long Silicon Valley tradition of breaking new tech tools. This history stretches back at least as far as the 1950s, to the early days of phone phreaking, or hacking phone systems.

While these techniques can be dangerous and abusive, Albert’s work has enabled the community to learn how the tools work, which allows them to explore and experiment with AI models. Albert and other tech enthusiasts aim to highlight the potential for AI and how the community can influence its development.

However, crafting these prompts presents an ever-evolving challenge. A jailbreak prompt that works on one system may not work on another, and companies are constantly updating their tech. For instance, the evil-confidant prompt appears to work only occasionally with GPT-4, OpenAI’s newly released model. The company said GPT-4 has stronger restrictions in place about what it won’t answer compared to previous iterations.

OpenAI encourages people to push the limits of its AI models, and that the research lab learns from the ways its technology is used. However, if a user continuously prods ChatGPT or other OpenAI models with prompts that violate its policies, such as generating hateful or illegal content or malware, it will warn or suspend the person, and may go as far as banning them.

Read: This Indian firm is reimbursing employees for ChatGPT Plus after noticing rise in productivity

While it’s still early days for this AI technique, the community sees it as an opportunity to contribute to the development of AI while exploring its capabilities. However, as with any new technology, there are concerns over its use, which needs to be regulated to prevent any misuse.

first published: Apr 12, 2023 12:43 pm

Discover the latest Business News, Sensex, and Nifty updates. Obtain Personal Finance insights, tax queries, and expert opinions on Moneycontrol or download the Moneycontrol App to stay updated!

Subscribe to Tech Newsletters

  • On Saturdays

    Find the best of Al News in one place, specially curated for you every weekend.

  • Daily-Weekdays

    Stay on top of the latest tech trends and biggest startup news.

Advisory Alert: It has come to our attention that certain individuals are representing themselves as affiliates of Moneycontrol and soliciting funds on the false promise of assured returns on their investments. We wish to reiterate that Moneycontrol does not solicit funds from investors and neither does it promise any assured returns. In case you are approached by anyone making such claims, please write to us at grievanceofficer@nw18.com or call on 02268882347