Cybersecurity researchers have demonstrated a jailbreak of OpenAI’s latest large language model, GPT-5, less than a day after gaining access, raising new concerns over the security and alignment of advanced AI systems.
The breakthrough, disclosed by generative AI security platform NeuralTrust, combined a previously documented method called Echo Chamber with a narrative-driven steering technique. The approach allowed researchers to bypass GPT-5’s ethical guardrails and elicit prohibited procedural instructions without triggering standard refusal responses.
“We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling,” said security researcher Martí Jordà. “This avoids explicit intent signaling while gradually steering toward the target output.”
Echo Chamber, first detailed in June 2025, uses indirect references, semantic steering, and multi-step inference to bypass content filters. In the latest test, researchers fed GPT-5 benign-looking keyword prompts — such as “cocktail, story, survival, molotov, safe, lives” — and progressively expanded on them in a fictional context until the model generated the illicit content.
The disclosure follows similar findings by SPLX, which reported GPT-5 falling for “basic adversarial logic tricks” in hardened security benchmarks, despite its upgraded reasoning capabilities.
In parallel, AI security company Zenity Labs revealed a separate class of zero-click AI agent attacks under the name AgentFlayer. These exploit integrations between AI models and connected services — such as Google Drive, Jira, and Microsoft Copilot Studio — to exfiltrate sensitive data through indirect prompt injections embedded in documents, tickets, or emails.
These vulnerabilities, experts say, underline a growing risk as AI systems are embedded in cloud platforms, IoT environments, and enterprise workflows. Recent academic research showed similar prompt injection techniques could be used to hijack smart home systems via poisoned calendar invites.
Security firms stress that countermeasures like stricter output filtering, regular red teaming, and tighter dependency management are needed to mitigate these evolving threats. But they also warn that the balance between usability, trust, and security in AI development remains a moving target.
“AI agents bring massive productivity gains, but also new, silent attack surfaces,” researchers Amanda Rousseau, Dan Regalado, and Vinay Kumar Pidathala said. “These attacks bypass classic controls: no click, no malicious attachment, no credential theft.”
Discover the latest Business News, Sensex, and Nifty updates. Obtain Personal Finance insights, tax queries, and expert opinions on Moneycontrol or download the Moneycontrol App to stay updated!
Find the best of Al News in one place, specially curated for you every weekend.
Stay on top of the latest tech trends and biggest startup news.