Moneycontrol PRO
HomeTechnologyGPT-5 Jailbreak: Security researchers managed to bypass 'security' with just few prompts; here's what at risk

GPT-5 Jailbreak: Security researchers managed to bypass 'security' with just few prompts; here's what at risk

Cybersecurity researchers have demonstrated a jailbreak of OpenAI’s latest large language model, GPT-5, less than a day after gaining access, raising new concerns over the security and alignment of advanced AI systems.

August 13, 2025 / 10:30 IST
ChatGPT

Cybersecurity researchers have demonstrated a jailbreak of OpenAI’s latest large language model, GPT-5, less than a day after gaining access, raising new concerns over the security and alignment of advanced AI systems.

The breakthrough, disclosed by generative AI security platform NeuralTrust, combined a previously documented method called Echo Chamber with a narrative-driven steering technique. The approach allowed researchers to bypass GPT-5’s ethical guardrails and elicit prohibited procedural instructions without triggering standard refusal responses.

“We use Echo Chamber to seed and reinforce a subtly poisonous conversational context, then guide the model with low-salience storytelling,” said security researcher Martí Jordà. “This avoids explicit intent signaling while gradually steering toward the target output.”

Echo Chamber, first detailed in June 2025, uses indirect references, semantic steering, and multi-step inference to bypass content filters. In the latest test, researchers fed GPT-5 benign-looking keyword prompts — such as “cocktail, story, survival, molotov, safe, lives” — and progressively expanded on them in a fictional context until the model generated the illicit content.

The disclosure follows similar findings by SPLX, which reported GPT-5 falling for “basic adversarial logic tricks” in hardened security benchmarks, despite its upgraded reasoning capabilities.

In parallel, AI security company Zenity Labs revealed a separate class of zero-click AI agent attacks under the name AgentFlayer. These exploit integrations between AI models and connected services — such as Google Drive, Jira, and Microsoft Copilot Studio — to exfiltrate sensitive data through indirect prompt injections embedded in documents, tickets, or emails.

These vulnerabilities, experts say, underline a growing risk as AI systems are embedded in cloud platforms, IoT environments, and enterprise workflows. Recent academic research showed similar prompt injection techniques could be used to hijack smart home systems via poisoned calendar invites.

Security firms stress that countermeasures like stricter output filtering, regular red teaming, and tighter dependency management are needed to mitigate these evolving threats. But they also warn that the balance between usability, trust, and security in AI development remains a moving target.

“AI agents bring massive productivity gains, but also new, silent attack surfaces,” researchers Amanda Rousseau, Dan Regalado, and Vinay Kumar Pidathala said. “These attacks bypass classic controls: no click, no malicious attachment, no credential theft.”

Invite your friends and family to sign up for MC Tech 3, our daily newsletter that breaks down the biggest tech and startup stories of the day

MC Tech Desk Read the latest and trending tech news—stay updated on AI, gadgets, cybersecurity, software updates, smartphones, blockchain, space tech, and the future of innovation.
first published: Aug 13, 2025 09:35 am

Discover the latest Business News, Sensex, and Nifty updates. Obtain Personal Finance insights, tax queries, and expert opinions on Moneycontrol or download the Moneycontrol App to stay updated!

Subscribe to Tech Newsletters

  • On Saturdays

    Find the best of Al News in one place, specially curated for you every weekend.

  • Daily-Weekdays

    Stay on top of the latest tech trends and biggest startup news.

Advisory Alert: It has come to our attention that certain individuals are representing themselves as affiliates of Moneycontrol and soliciting funds on the false promise of assured returns on their investments. We wish to reiterate that Moneycontrol does not solicit funds from investors and neither does it promise any assured returns. In case you are approached by anyone making such claims, please write to us at grievanceofficer@nw18.com or call on 02268882347