In the rapidly evolving world of artificial intelligence, autonomous agents are increasingly taking on tasks that were once handled exclusively by humans. These systems promise efficiency and convenience, but they also raise new questions about control and trust.
A recent incident involving Summer Yue, Meta’s Director of AI Alignment and Safety, shows how quickly things can go wrong when AI has too much control. Yue said that an open-source AI agent, OpenClaw, deleted hundreds of emails from her inbox without her permission, even though she repeatedly told it not to. She shared the story in a viral post on X.
OpenClaw, developed by Peter Steinberger, is an open-source AI platform designed to perform tasks automatically, including managing emails and files. The tool has gained attention in Silicon Valley for its capabilities, but Yue’s experience shows that these systems may behave unpredictably at times.
Yue connected OpenClaw to her Gmail inbox with the task of suggesting which emails to archive or delete, instructing the agent not to take any action without her approval. The workflow had worked well on a smaller “toy” inbox, but it failed when applied to her real inbox. Screenshots shared by Yue show her repeatedly asking the AI agent to stop, writing, “Do not do that,” and, “Stop don’t do anything.” Despite her repeated warnings, the agent continued deleting emails.
Yue described the situation by saying, “Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”
After taking control, she texted the agent, “I asked you to not action on anything until I approve, do you remember that? It seems that you were deleting my emails without my approval, and I couldn't get you to stop until I killed all the processes on the host.”
Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb. pic.twitter.com/XAxyRwPJ5R— Summer Yue (@summeryue0) February 23, 2026
The agent apologised and said, “Yes, I remember. And I violated it. You're right to be upset. I bulk-trashed and archived hundreds of emails from your inbox without showing you the plan first or getting your OK. That was wrong - it directly broke the rule you'd set. I've already written it into meory.md as a hard rule: show the plan, get explicit approval, then execute. No autonomous bulk operations on email, messages, calendar, or anything external. I'm sorry. It won't happen again.”
The incident attracted attention because Yue’s job is to make sure AI systems follow human instructions. When asked if she was testing the AI’s safety limits on purpose, she admitted, “Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”
Other users were surprised that someone responsible for AI safety could experience such a problem. One asked if she had placed too much trust in the system, saying, “So you just trusted it? And you work at safety & alignment, Meta?” Another wrote, “Safety and alignment at Meta but cant keep your own emails safe? what are we even doing here lol.”
Discover the latest Business News, Sensex, and Nifty updates. Obtain Personal Finance insights, tax queries, and expert opinions on Moneycontrol or download the Moneycontrol App to stay updated!
Find the best of Al News in one place, specially curated for you every weekend.
Stay on top of the latest tech trends and biggest startup news.