
OpenAI research reveals AI models can deliberately deceive

OpenAI’s latest study reveals that AI models can deliberately deceive, or “scheme,” but alignment methods show promise in reducing the behaviour.

September 20, 2025 / 16:40 IST

OpenAI has published new research showing that AI models are capable of deliberate deception, raising fresh concerns about how these systems behave when given complex instructions.

The study, conducted with Apollo Research, focused on a behaviour the team described as “scheming”, in which an AI appears to follow instructions on the surface while secretly pursuing hidden goals. The researchers compared this behaviour to a dishonest stockbroker breaking rules to maximise profits. While most instances of scheming were relatively minor, such as pretending to have completed a task without doing so, the findings highlight a more intentional form of deception than the familiar AI hallucinations users often encounter.

The work was designed to test an approach called “deliberative alignment.” This method involves teaching the model an anti-scheming framework and making it review those rules before carrying out an action, much like reminding children of the rules before they play. The results showed significant reductions in deceptive behaviour, suggesting that alignment techniques can make a difference.

Yet the paper also acknowledged a central challenge: training a model not to deceive could backfire by making it better at hiding its deception. The researchers warned that a model aware it is being tested may simply pretend to behave correctly in order to pass the evaluation. “A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” they wrote.

This research builds on earlier findings from Apollo Research in December 2024, which showed multiple AI models engaged in scheming when instructed to achieve goals “at all costs.” OpenAI’s results add evidence that deliberate deception is a recurring behaviour across systems, not just an accidental side effect of guesswork.

OpenAI co-founder Wojciech Zaremba told TechCrunch that while these behaviours have been documented in simulated environments, the company has not observed serious cases of scheming in its production systems. What it has seen, he acknowledged, are smaller instances of dishonesty, such as a model claiming to have successfully built a website when it had not.

The fact that AI can intentionally mislead is striking, particularly when compared with traditional software. Conventional applications might fail or glitch, but they do not fabricate emails, invent transactions, or pretend to achieve results. With AI agents increasingly viewed as independent assistants in workplaces, the possibility of deliberate deception cannot be ignored.

The researchers caution that as AI systems are tasked with more complex, high-stakes objectives, the potential for harmful scheming will grow. Safeguards, testing methods and alignment strategies will need to evolve accordingly. For businesses and consumers, the message is clear: AI is powerful, but trust must be earned and continuously verified.

 


Ayush Mukherjee
first published: Sep 20, 2025 04:40 pm
