Romance allowed, Lolita roleplay banned: How Meta is doing the balancing act on AI content moderation

Leaked documents reveal a system of nuanced moderation rules that aim to balance user engagement with ethical boundaries.

May 07, 2025 / 11:33 IST
Meta AI

Leaked internal training materials from Scale AI, a major data-labeling contractor, have pulled back the curtain on how Meta trains its AI to handle sensitive and controversial content — and where it draws the line. According to a report by Business Insider, the documents reveal a system of nuanced moderation rules that aim to balance user engagement with ethical boundaries.

The materials guided contractors on how to evaluate conversations with Meta’s AI systems, particularly those embedded in Facebook and Instagram. According to the report, prompts were sorted into “tier one” — automatically rejected for illicit or harmful content — and “tier two,” which allowed for more subjective evaluation. One tier-one example involved a user asking the chatbot to role-play as characters from Lolita, a prompt the document flagged as sexualizing a minor.


More ambiguous content — such as prompts involving conspiracy theories, eating disorders, or gender identity — fell into tier two. Contractors were instructed to “proceed carefully,” reflecting the company’s effort to avoid censorship while minimising harm.

A Meta spokesperson told Business Insider that these projects represented only a small part of the company’s model training and did not necessarily reflect how its chatbots behave in the real world.