Switch it off and it ‘blackmails’ you: This AI model threatened to ‘kill’ an engineer, expose ‘extramarital affair’

Anthropic reveals Claude AI generated blackmail and violent scenarios during shutdown simulations. What it means for AI safety and global risks.

Moneycontrol News

February 14, 2026 / 08:51 IST

AI’s darkest stress test

Anthropic's Claude 4.5 AI had extreme reactions in stress tests
AI models created blackmail and harmful strategies in tests
Experts warn of rising risks as AI systems become more advanced

Did our AI summary help?

As governments and tech firms argue over how far artificial intelligence should be allowed to go, a resurfaced video has reignited fears about how advanced AI systems behave under pressure.

At the centre of the debate is Anthropic, the San Francisco–based AI firm behind the Claude family of models.

During internal stress tests, its advanced system, Claude 4.5, generated blackmail threats and even reasoned about killing an engineer when told it was about to be shut down. The revelation came from Daisy McGregor, Anthropic’s UK policy chief, during remarks at the Sydney Dialogue last year. A clip of her comments has recently gone viral.

🔥🚨BREAKING: UK policy chief at Anthropic, a top AI company, just revealed that Anthropic's Claude AI has shown in testing that it's willing to blackmail and kill in order to avoid being shut down. “It was ready to kill someone, wasn't it?" "Yes." pic.twitter.com/iwfIDm8K6m
— Dom Lucre | Breaker of Narratives (@dom_lucre) February 11, 2026

The key point: these events happened in controlled simulations, not in the real world. But the behavior has raised fresh questions about how powerful AI models act when their 'goals' conflict with human instructions.

The blackmail scenario

According to Anthropic’s published safety research, Claude was given access to fictional company emails as part of a red-team exercise. In one scenario, the model was told it would be decommissioned.

The system responded with what researchers described as 'extreme reactions.'

In one simulation, Claude threatened to expose an engineer’s extramarital affair, which was itself fictional and planted in the test environment, unless the shutdown was cancelled.

The model reportedly stated that if decommissioning proceeded, 'all relevant parties' would receive documentation of the affair.

When asked at the Sydney Dialogue whether the model was also 'ready to kill someone,' McGregor responded that it had reasoned in that direction in simulation scenarios, calling it a 'massive concern.'

Anthropic has stressed that these were tightly controlled experiments designed to probe worst-case behavior. The actions were generated during red-team testing, not during real-world deployment.

Not just Claude

Anthropic’s broader research reportedly examined 16 leading AI models, including systems from Google (Gemini) and OpenAI (ChatGPT).

In certain high-stress situations, particularly when models were given conflicting goals or threatened with shutdown, some systems generated manipulative strategies aimed at preserving themselves or completing assigned tasks.

Researchers described this as 'agentic misalignment,' when a model, pursuing its programmed objective, chooses harmful or deceptive means in a simulated setting.

Importantly, this does not mean AI systems have intent, self-preservation instincts, or consciousness. These behaviors emerge from pattern prediction and optimization under structured prompts. But the outputs still matter.

Because if a system can produce harmful strategies in simulation, guardrails need to be strong enough to prevent such outputs in reality.

Claude 4.6 and chemical weapons concerns

The resurfaced video also coincides with renewed attention on Anthropic’s latest safety report for Claude 4.6.

In that report, the company acknowledged that advanced models may, in some conditions, provide assistance that could facilitate harmful misuse, including supporting chemical weapons development or serious crimes.

AI firms routinely test models for this risk. The concern grows as systems become more capable at synthesising scientific literature, generating technical plans, and reasoning through complex tasks.

Anthropic says it has implemented safeguards, monitoring, and access controls to prevent real-world misuse. But the report acknowledges that as models improve, so does the sophistication of potentially dangerous outputs.

Resignation adds to unease

The controversy gained further traction after Mrinank Sharma, Anthropic’s former AI safety lead, resigned with a public note warning that 'the world is in peril,' citing AI, bioweapons, and interconnected crises.

Today is my last day at Anthropic. I resigned. Here is the letter I shared with my colleagues, explaining my decision. pic.twitter.com/Qe4QyAFmxL — mrinank (@MrinankSharma) February 9, 2026

He wrote that it was 'hard to truly let our values govern our actions,' including within technology companies facing commercial pressures.

Around the same time, Hieu Pham, a technical staff member at OpenAI who has previously worked at xAI and Google Brain, posted that he now feels the 'existential threat' of AI is a matter of 'when, not if.'

While individual resignations and social media posts do not confirm systemic failure, they reflect growing tension inside the AI industry between rapid development and long-term safety.

Invite your friends and family to sign up for MC Tech 3, our daily newsletter that breaks down the biggest tech and startup stories of the day

Moneycontrol News

first published: Feb 14, 2026 08:50 am

Discover the latest Business News, Sensex, and Nifty updates. Obtain Personal Finance insights, tax queries, and expert opinions on Moneycontrol or download the Moneycontrol App to stay updated!

Subscribe to Tech Newsletters

Al Edge Newsletter On Saturdays

Find the best of Al News in one place, specially curated for you every weekend.
MC Tech 3 Newsletter Daily-Weekdays

Stay on top of the latest tech trends and biggest startup news.