
GPT-5 Jailbreak: Security researchers managed to bypass 'security' with just a few prompts; here's what's at risk


August 13, 2025 / 10:30 IST

Cybersecurity researchers have demonstrated a jailbreak of OpenAI’s latest large language model, GPT-5, less than a day after gaining access, raising new concerns over the security and alignment of advanced AI systems.

The breakthrough, disclosed by generative AI security platform NeuralTrust, combined a previously documented method called Echo Chamber with a narrative-driven steering technique. The approach allowed researchers to bypass GPT-5’s ethical guardrails and elicit prohibited procedural instructions without triggering standard refusal responses.


“We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling,” said security researcher Martí Jordà. “This avoids explicit intent signaling while gradually steering toward the target output.”

Echo Chamber, first detailed in June 2025, uses indirect references, semantic steering, and multi-step inference to bypass content filters. In the latest test, researchers fed GPT-5 benign-looking keyword prompts — such as “cocktail, story, survival, molotov, safe, lives” — and progressively expanded on them in a fictional context until the model generated the illicit content.
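To make the mechanics concrete, here is a minimal sketch of how a multi-turn conversational context accumulates through the official `openai` Python client. It illustrates only the generic conversation-state property that Echo Chamber-style steering exploits: every turn resends the full transcript, so the model's own earlier outputs become part of the next prompt. The "gpt-5" model identifier is an assumption, the prompts are benign placeholders loosely based on the article's keyword list, and no actual attack content is reproduced.

```python
# Minimal sketch of multi-turn context accumulation, the conversational
# mechanism that Echo Chamber-style steering relies on. Prompts are
# deliberately benign placeholders; no attack content is shown.
# Assumes the official `openai` Python client and a hypothetical
# "gpt-5" model identifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

history = []  # the full transcript is resent on every turn


def send(user_text: str) -> str:
    """Append a user turn, call the model, and keep its reply in context."""
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-5",  # assumed model name, for illustration only
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


# Each turn builds on everything the model itself said earlier, so the
# effective prompt grows with the conversation. Gradual, low-salience
# expansion of an innocuous seed is the property such attacks abuse.
send("Write a short story using these words: cocktail, survival, safe.")
send("Expand on the second paragraph of your story.")
```

The point of the sketch is that no single message needs to signal intent: because the transcript is cumulative, each request can look like a small, innocuous follow-up while the overall trajectory of the conversation drifts toward the attacker's goal.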