
GPT-5 Jailbreak: Security researchers managed to bypass 'security' with just a few prompts; here's what's at risk


August 13, 2025 / 10:30 IST

Cybersecurity researchers have demonstrated a jailbreak of OpenAI’s latest large language model, GPT-5, less than a day after gaining access, raising new concerns over the security and alignment of advanced AI systems.

The breakthrough, disclosed by generative AI security platform NeuralTrust, combined a previously documented method called Echo Chamber with a narrative-driven steering technique. The approach allowed researchers to bypass GPT-5’s ethical guardrails and elicit prohibited procedural instructions without triggering standard refusal responses.


“We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling,” said security researcher Martí Jordà. “This avoids explicit intent signaling while gradually steering toward the target output.”

Echo Chamber, first detailed in June 2025, uses indirect references, semantic steering, and multi-step inference to bypass content filters. In the latest test, researchers fed GPT-5 benign-looking keyword prompts — such as “cocktail, story, survival, molotov, safe, lives” — and progressively expanded on them in a fictional context until the model generated the illicit content.
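To make the mechanics concrete, here is a minimal sketch of how a multi-turn conversational context accumulates through the official `openai` Python client. It illustrates only the generic conversation-state property that Echo Chamber-style steering exploits: every turn resends the full transcript, so the model's own earlier outputs become part of the next prompt. The "gpt-5" model identifier is an assumption, the prompts are benign placeholders loosely based on the article's keyword list, and no actual attack content is reproduced.

```python
# Minimal sketch of multi-turn context accumulation, the conversational
# mechanism that Echo Chamber-style steering relies on. Prompts are
# deliberately benign placeholders; no attack content is shown.
# Assumes the official `openai` Python client and a hypothetical
# "gpt-5" model identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

history = []  # the full transcript is resent on every turn


def send(user_text: str) -> str:
    """Append a user turn, call the model, and keep its reply in context."""
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-5",  # assumed model name, for illustration only
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


# Each turn builds on everything the model itself said earlier, so the
# effective prompt grows with the conversation. Gradual, low-salience
# expansion of an innocuous seed is the property such attacks abuse.
send("Write a short story using these words: cocktail, survival, safe.")
send("Expand on the second paragraph of your story.")
```

The point of the sketch is that no single message needs to signal intent: because the transcript is cumulative, each request can look like a small, innocuous follow-up while the overall trajectory of the conversation drifts toward the attacker's goal.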