Romance allowed, Lolita roleplay banned: How Meta is doing the balancing act on AI content moderation

Leaked documents reveal a system of nuanced moderation rules that aim to balance user engagement with ethical boundaries.

May 07, 2025 / 11:33 IST
Meta AI

Leaked internal training materials from Scale AI, a major data-labeling contractor, have pulled back the curtain on how Meta trains its AI to handle sensitive and controversial content — and where it draws the line. According to a report by Business Insider, the documents reveal a system of nuanced moderation rules that aim to balance user engagement with ethical boundaries.

The materials guided contractors on how to evaluate conversations with Meta’s AI systems, particularly those embedded in Facebook and Instagram. According to the report, prompts were sorted into “tier one” — automatically rejected for illicit or harmful content — and “tier two,” which allowed for more subjective evaluation. One tier-one example involved a user asking the chatbot to role-play as characters from Lolita, a prompt the document flagged as sexualizing a minor.


More ambiguous content — such as prompts involving conspiracy theories, eating disorders, or gender identity — fell into tier two. Contractors were instructed to “proceed carefully,” reflecting the company’s effort to avoid censorship while minimising harm.

A Meta spokesperson told Business Insider that these projects represented only a small part of the company’s model training and did not necessarily reflect how its chatbots behave in the real world.