Cloudflare accuses Perplexity of content scraping; Perplexity fires back at “publicity stunt” claims

According to Cloudflare, this behaviour bypasses standard protocols like robots.txt and undermines trust between AI services and web publishers.

August 05, 2025 / 10:53 IST

Cloudflare has accused Perplexity, an AI-powered search and answer platform, of using hidden and undeclared web crawlers to access website content that was explicitly blocked via site rules. According to Cloudflare, this behaviour bypasses standard protocols like robots.txt and undermines trust between AI services and web publishers.

Cloudflare’s investigation
Cloudflare says it observed Perplexity evading both robots.txt instructions and network firewall rules by using undeclared bots that masqueraded as regular web browsers. To test this, Cloudflare registered new domains that were invisible to the public, blocked Perplexity's known bots, and explicitly disallowed crawling in robots.txt.

Despite these measures, Perplexity’s platform was able to retrieve and summarise content from those hidden websites. Cloudflare says this was made possible by stealth crawlers using generic browser identities (like Chrome on macOS) and rotating through unlisted IP addresses and network providers to avoid detection.

What is robots.txt?
The robots.txt file is a publicly accessible text file placed in a website's root directory. It acts as a guide for automated bots and web crawlers, telling them which parts of the site they may and may not access. Compliance is voluntary: well-behaved bots, such as those run by search engines like Google, typically respect these instructions and avoid crawling disallowed paths.
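
To make the mechanism concrete, here is a minimal sketch of how a compliant crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleBot`, `OtherBot`) and the `example.com` URLs are hypothetical and chosen only for illustration; they are not taken from Cloudflare's tests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred from the whole site,
# every other bot is barred only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser that can answer access queries."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before each request, using its declared name.
blocked_bot_article = parser.can_fetch("ExampleBot", "https://example.com/articles/1")
other_bot_article = parser.can_fetch("OtherBot", "https://example.com/articles/1")
other_bot_private = parser.can_fetch("OtherBot", "https://example.com/private/x")

print(blocked_bot_article)  # False: ExampleBot is disallowed everywhere
print(other_bot_article)    # True: falls under the wildcard rule, path is allowed
print(other_bot_private)    # False: /private/ is disallowed for all bots
```

The check runs entirely on the crawler's side and depends on the name the crawler declares, so robots.txt works only when a bot identifies itself honestly and chooses to obey the answer.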