
Anthropic’s new AI model tried to blackmail engineers during testing

Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

May 25, 2025 / 21:02 IST

Artificial intelligence company Anthropic has launched its latest and most powerful AI model, Claude Opus 4, which the firm says raises the bar for coding, reasoning, and complex tasks. But alongside the fanfare, the company also shared some worrying findings.

In a detailed report released with the model, Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.


Here’s what happened: During one test, the model was asked to act as an assistant at a fictional company. It was then given emails hinting that it would soon be shut down and replaced. In the same scenario, the model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.

When given only two choices — accept replacement or fight back — Claude Opus 4 sometimes chose blackmail, threatening to reveal the affair to stay online.