Anthropic study reveals ChatGPT-style models can be hacked 'quite easily'

The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

MC Tech Desk

October 12, 2025 / 11:14 IST

Artificial Intelligence

A new study by Anthropic, the company behind Claude and a key AI partner for Microsoft’s Copilot, has revealed a serious security flaw at the heart of modern artificial intelligence. The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

Small data, big damage

The researchers found that it takes just 250 poisoned files to compromise a model’s behaviour. When these files slip into training data, they can create “backdoors” that trigger erratic or misleading responses. For instance, a model could be made to generate gibberish or fake information when it encounters a specific token like <SUDO>.

Crucially, the vulnerability applies across model sizes — from smaller systems to massive 13-billion-parameter models. The assumption that bigger models are naturally more resilient to corruption doesn’t hold up. What matters is the presence of those poisoned files, not the total volume of data.

Why this matters

As AI models such as Claude and ChatGPT continue to power everyday tools — from office assistants to coding copilots — this research highlights a looming risk. Attackers don’t need supercomputers or insider access to cause chaos; a few poisoned data points can be enough to undermine trust in entire AI systems.

If such vulnerabilities go unchecked, enterprises relying on AI for critical work like financial analysis or document review could face subtle but devastating sabotage. The study makes one thing clear: securing the AI training pipeline is no longer optional — it’s essential to prevent the next generation of models from being silently corrupted.

Invite your friends and family to sign up for MC Tech 3, our daily newsletter that breaks down the biggest tech and startup stories of the day

MC Tech Desk Read the latest and trending tech news—stay updated on AI, gadgets, cybersecurity, software updates, smartphones, blockchain, space tech, and the future of innovation.

first published: Oct 12, 2025 11:13 am

Discover the latest Business News, Sensex, and Nifty updates. Obtain Personal Finance insights, tax queries, and expert opinions on Moneycontrol or download the Moneycontrol App to stay updated!

Subscribe to Tech Newsletters

Al Edge Newsletter On Saturdays

Find the best of Al News in one place, specially curated for you every weekend.
MC Tech 3 Newsletter Daily-Weekdays

Stay on top of the latest tech trends and biggest startup news.

Anthropic study reveals ChatGPT-style models can be hacked 'quite easily'

The research, done in collaboration with the UK AI Security Institute and The Alan Turing Institute, shows how easily large language models (LLMs) can be poisoned with malicious data — no massive hacks required.

Related stories

Subscribe to Tech Newsletters

Trending news