Moneycontrol
HomeTechnologyAnthropic study reveals ChatGPT-style models can be hacked 'quite easily'

Anthropic study reveals ChatGPT-style models can be hacked 'quite easily'

The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

October 12, 2025 / 11:14 IST
Story continues below Advertisement
Artificial Intelligence

A new study by Anthropic, the company behind Claude and a key AI partner for Microsoft’s Copilot, has revealed a serious security flaw at the heart of modern artificial intelligence. The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

Small data, big damage

Story continues below Advertisement

The researchers found that it takes just 250 poisoned files to compromise a model’s behaviour. When these files slip into training data, they can create “backdoors” that trigger erratic or misleading responses. For instance, a model could be made to generate gibberish or fake information when it encounters a specific token like <SUDO>.

Crucially, the vulnerability applies across model sizes — from smaller systems to massive 13-billion-parameter models. The assumption that bigger models are naturally more resilient to corruption doesn’t hold up. What matters is the presence of those poisoned files, not the total volume of data.