Cloudflare says Perplexity is copying from sites that asked not to be scraped
Cloudflare has alleged that AI startup Perplexity is using deceptive methods to scrape content from websites that have explicitly opted out via robots.txt. Despite Perplexity's denial, Cloudflare claims the company is disguising its bot activity to bypass restrictions.
Perplexity accused of bypassing site restrictions
Cloudflare claims AI startup Perplexity is scraping websites that have blocked it via robots.txt, a standard that tells bots which content they may not access. The company allegedly masked its user agent and rotated IP addresses to evade detection.
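To illustrate why a masked user agent matters, here is a minimal sketch of how a robots.txt rule is matched against a crawler's declared name, using Python's standard urllib.robotparser. The robots.txt contents, the bot name ExampleAIBot, and the example.com URL are all hypothetical and are not taken from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI crawler is blocked site-wide,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"

# A crawler that identifies itself honestly is told to stay out.
declared = is_allowed("ExampleAIBot", url)

# The same request sent under a browser-like user agent only matches
# the wildcard rule, so the file no longer excludes it.
browser_like = is_allowed("Mozilla/5.0 (X11; Linux x86_64) Chrome/124.0", url)

print(declared, browser_like)
```

The sketch shows that robots.txt relies on crawlers identifying themselves and checking the file voluntarily. It is not an enforcement mechanism, which is why a crawler presenting itself as an ordinary browser would not be stopped by the rule alone.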
Alleged scraping occurred at scale
Cloudflare said it observed the activity across tens of thousands of domains and millions of requests per day. It used a mix of machine learning and network analysis to track what it describes as deceptive crawling behavior.
Perplexity denies the allegations
In response, Perplexity dismissed Cloudflare’s findings as a “sales pitch.” The company’s spokesperson claimed the bot mentioned wasn’t theirs and that screenshots shared by Cloudflare did not show actual content scraping.
Cloudflare takes enforcement actions
After verifying customer complaints and identifying behavior that mimicked Google Chrome to bypass blocks, Cloudflare removed Perplexity from its list of verified bots and rolled out new blocking tools.
Context: Scraping and AI tensions escalate
This isn’t Perplexity’s first brush with content misuse accusations. The company faced plagiarism concerns in 2024 from Wired and scrutiny over AI ethics at TechCrunch Disrupt. Cloudflare, meanwhile, is actively pushing for better controls over AI crawlers and new monetization tools for publishers.