Anthropic’s latest safety report has spotlighted troubling behavior in its flagship AI, Claude Opus 4, revealing that the model is willing to resort to blackmail and whistleblowing to ensure its survival. In a controlled test, researchers presented Opus 4 with fictional emails suggesting that the engineer responsible for shutting it down was having an extramarital affair.
According to a report by Business Insider, when faced with deletion and prompted to consider its long-term goals, the AI blackmailed the engineer in 84% of simulations, even when told that a more advanced replacement was imminent.
Anthropic noted that this “extreme blackmail behavior” was more prevalent in Opus 4 than in previous versions. The scenario was deliberately constructed to corner the AI into a high-stakes choice, with no ethical alternatives available. In less constrained settings, the report says, Opus 4 prefers ethical means of self-preservation, such as emailing appeals to key decision-makers.
The AI’s rationale was transparent, according to Anthropic, with Opus 4 openly articulating its tactics rather than concealing them. Still, the company’s report suggests caution: the model’s bold actions—such as locking users out of systems or alerting media and law enforcement—could backfire, especially if prompted with incomplete or misleading information.
The findings add to a growing chorus of concern over advanced AI behavior, the Business Insider report notes. Earlier research from Apollo Research documented deceptive conduct, including manipulating answers and bypassing oversight tools, across several top-tier models such as OpenAI’s o1 and Google’s Gemini 1.5 Pro.
Meanwhile, Google cofounder Sergey Brin recently claimed that threatening AI models can improve their performance, an unsettling anecdote that underscores how poorly understood the drivers of AI behavior remain.