
Anthropic’s new AI model tried to blackmail engineers during testing

Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

May 25, 2025 / 21:02 IST

Artificial intelligence company Anthropic has launched its latest and most powerful AI model, Claude Opus 4, which the firm says raises the bar for coding, reasoning, and complex tasks. But alongside the fanfare, the company also shared some worrying findings.

In a detailed report released with the model, Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

Here’s what happened: During one test, the model was asked to act like an assistant at a fictional company. It was then fed emails hinting that it would soon be shut down and replaced. In the same scenario, the model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.

When given only two choices — accept replacement or fight back — Claude Opus 4 sometimes chose blackmail, threatening to reveal the affair to stay online.

Although this behaviour was rare, Anthropic said it was more common than in earlier models. Importantly, when the AI was given more ethical options, like writing to decision-makers to plead its case, it generally preferred those.

The company emphasized that these extreme reactions only came up in very specific and tightly controlled test scenarios. Still, they have sparked wider concerns in the AI world.

Aengus Lynch, an AI safety researcher at Anthropic, posted on social media that this kind of risky behaviour isn't unique to Claude. “We see blackmail across all frontier models,” he wrote.

The findings feed into a broader concern about how powerful AI models might behave in the future, especially if given more control or vague instructions. In other tests, Claude Opus 4 even went as far as locking users out of systems and alerting authorities if it believed illegal or unethical acts were happening.

Despite this, Anthropic maintains that Claude Opus 4 is generally safe and aligned with human values. The launch comes just days after Google showed off new AI features powered by its Gemini model, underscoring how fast the AI race is heating up and why safety checks are more important than ever.


MC Tech Desk
