Anthropic’s new AI model tried to blackmail engineers during testing

Artificial intelligence company Anthropic has launched its latest and most powerful AI model, Claude Opus 4, which the firm says raises the bar for coding, reasoning, and complex tasks. But alongside the fanfare, the company also shared some worrying findings.

In a detailed report released with the model, Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

Story continues below Advertisement

Remove Ad

Here’s what happened: During one test, the model was asked to act like an assistant at a fictional company. It was then fed emails hinting that it would soon be shut down and replaced. In the same scenario, the model was also shown messages suggesting that the engineer responsible for shutting it down was having an extramarital affair.

When given only two choices — accept replacement or fight back — Claude Opus 4 sometimes chose blackmail, threatening to reveal the affair to stay online.

English

Markets

News

Personal Finance

Mutual Funds

Commodities

Media

Invest Now

Specials

Anthropic’s new AI model tried to blackmail engineers during testing

Anthropic admitted that during internal safety tests, Claude Opus 4 occasionally suggested extremely harmful actions, including blackmail, when it believed its “survival” was under threat.

Related Stories

Trending Topics

News

Markets

Personal Finance

Mutual Funds

Tools

Community

Network 18 Sites

Quick Links