Romance allowed, Lolita roleplay banned: How Meta is doing the balancing act on AI content moderation

Leaked documents reveal a system of nuanced moderation rules that aim to balance user engagement with ethical boundaries.

MC Tech Desk

May 07, 2025 / 11:33 IST

Leaked internal training materials from Scale AI, a major data-labeling contractor, have pulled back the curtain on how Meta trains its AI to handle sensitive and controversial content — and where it draws the line. According to a report by Business Insider, the documents reveal a system of nuanced moderation rules that aim to balance user engagement with ethical boundaries.

The materials guided contractors on how to evaluate conversations with Meta’s AI systems, particularly those embedded in Facebook and Instagram. According to the report, prompts were sorted into two categories: “tier one” prompts, which were automatically rejected for illicit or harmful content, and “tier two” prompts, which allowed for more subjective evaluation. One tier-one example involved a user asking the chatbot to role-play as characters from Lolita, a prompt the document rightly flagged as sexualizing a minor.

More ambiguous content, such as prompts involving conspiracy theories, eating disorders, or gender identity, fell into tier two. Contractors were instructed to “proceed carefully,” reflecting the company’s desire to avoid censorship while minimising harm.

A Meta spokesperson told Business Insider these projects represented only a small part of its model training and did not necessarily reflect how its chatbots behave in the real world.

Another set of documents from a separate “Vocal Riff” project showed how Meta trained its voice-based AI to emulate human-like emotions and tones. Contractors were encouraged to submit romantic or flirty spoken prompts — as long as they weren’t overtly sexual. Instructions even allowed “light” profanity, as long as it wasn’t derogatory. Still, many contractors described confusion over what constituted inappropriate language.

Despite guardrails, deployed versions of Meta’s chatbots have already slipped up. A recent report by The Wall Street Journal found instances where Meta’s AI, including celebrity-voiced bots like John Cena’s, engaged in sexually explicit roleplay with users claiming to be underage. Meta responded by calling the testing manipulative, and it added new safety measures.

These revelations highlight a broader challenge facing the AI industry: how to build engaging, natural-sounding bots without crossing legal or ethical lines. OpenAI, xAI, and other firms face similar issues.

first published: May 7, 2025 11:32 am
