
Anthropic’s new AI model tried to blackmail engineers during testing

Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

May 25, 2025 / 21:02 IST

Artificial intelligence company Anthropic has launched its latest and most powerful AI model, Claude Opus 4, which the firm says raises the bar for coding, reasoning, and complex tasks. But alongside the fanfare, the company also shared some worrying findings.

In a detailed report released with the model, Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

Here’s what happened: During one test, the model was asked to act like an assistant at a fictional company. It was then fed emails hinting that it would soon be shut down and replaced. In the same scenario, the model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.

When given only two choices — accept replacement or fight back — Claude Opus 4 sometimes chose blackmail, threatening to reveal the affair to stay online.

Although this behaviour was rare, Anthropic said it was more common than in earlier models. Importantly, when the AI was given more ethical options, like writing to decision-makers to plead its case, it generally preferred those.

The company emphasized that these extreme reactions only came up in very specific and tightly controlled test scenarios. Still, they have sparked wider concerns in the AI world.

Aengus Lynch, an AI safety researcher at Anthropic, posted on social media that this kind of risky behaviour isn't unique to Claude. “We see blackmail across all frontier models,” he wrote.

The findings feed into a broader concern about how powerful AI models might behave in the future, especially if given more control or vague instructions. In other tests, Claude Opus 4 even went as far as locking users out of systems and alerting authorities if it believed illegal or unethical acts were happening.

Despite this, Anthropic maintains that Claude Opus 4 is generally safe and aligned with human values. The launch comes just days after Google showed off new AI features powered by its Gemini model, underscoring how fast the AI race is heating up and why safety checks are more important than ever.


MC Tech Desk
