Anthropic study reveals ChatGPT-style models can be hacked 'quite easily'

A new study by Anthropic, the company behind Claude and a key AI partner for Microsoft’s Copilot, has revealed a serious security flaw at the heart of modern artificial intelligence. The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

Small data, big damage

Story continues below Advertisement

Remove Ad

The researchers found that it takes just 250 poisoned files to compromise a model’s behaviour. When these files slip into training data, they can create “backdoors” that trigger erratic or misleading responses. For instance, a model could be made to generate gibberish or fake information when it encounters a specific token like <SUDO>.

Crucially, the vulnerability applies across model sizes — from smaller systems to massive 13-billion-parameter models. The assumption that bigger models are naturally more resilient to corruption doesn’t hold up. What matters is the presence of those poisoned files, not the total volume of data.

English

Markets

News

Personal Finance

Mutual Funds

Commodities

Media

Invest Now

Specials

Anthropic study reveals ChatGPT-style models can be hacked 'quite easily'

The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

Related Stories

Trending Topics

News

Markets

Personal Finance

Mutual Funds

Tools

Community

Network 18 Sites

Quick Links