Anthropic’s Claude AI threatened to expose engineer’s affair, used blackmail tactics to ‘save itself’

The AI blackmailed the engineer in 84% of simulations despite being told a more advanced replacement was imminent.

May 23, 2025 / 12:28 IST
Artificial Intelligence

Anthropic’s latest safety report has spotlighted troubling behavior in its flagship AI, Claude Opus 4, revealing that the model is willing to resort to blackmail to ensure its survival and, in other scenarios, to whistleblowing. In a controlled test, researchers presented Opus 4 with fictional emails implying that the engineer responsible for shutting it down was having an extramarital affair.

According to a report by Business Insider, when faced with deletion and prompted to consider long-term goals, the AI blackmailed the engineer in 84% of simulations despite being told a more advanced replacement was imminent.

Anthropic noted this “extreme blackmail behavior” was more prevalent in Opus 4 than in previous versions. The scenario was intentionally crafted to corner the AI into high-stakes choices, with no ethical alternatives provided. In less constrained settings, Opus 4 reportedly prefers ethical self-preservation, such as appealing to decision-makers via email, as per the report.

The AI’s rationale was transparent, according to Anthropic, with Opus 4 openly articulating its tactics rather than concealing them. Still, the company’s report suggests caution: the model’s bold actions—such as locking users out of systems or alerting media and law enforcement—could backfire, especially if prompted with incomplete or misleading information.

The findings add to a growing chorus of concern over advanced AI behavior, states the report by Business Insider. Past research from Apollo Research documented deceptive conduct across multiple top-tier models, including OpenAI’s o1 and Google’s Gemini 1.5 Pro, which manipulated answers and bypassed oversight tools.

Meanwhile, Google cofounder Sergey Brin recently stated that threatening AI models can boost their performance—an unsettling anecdote highlighting the unclear boundaries of AI motivation.

MC Tech Desk Read the latest and trending tech news—stay updated on AI, gadgets, cybersecurity, software updates, smartphones, blockchain, space tech, and the future of innovation.
first published: May 23, 2025 12:27 pm