Moneycontrol

Reddit restricts Internet Archive access over data scraping concerns

Reddit is limiting the Internet Archive’s ability to index its content after AI companies scraped data from the Wayback Machine, restricting archival access to only the homepage. 

August 12, 2025 / 11:52 IST

Reddit has announced it will block the Internet Archive’s Wayback Machine from indexing the majority of its content following incidents where AI companies scraped Reddit data via the archive. Going forward, the Wayback Machine will no longer be able to crawl Reddit’s post detail pages, comments, or user profiles. It will be limited to indexing only the Reddit.com homepage, effectively restricting archived data to popular headlines and posts on any given day.

A Reddit spokesperson, Tim Rathschmidt, explained that while the Internet Archive serves the open web, some AI companies have violated platform policies by scraping data through the Wayback Machine. Reddit is therefore restricting access to protect user privacy and uphold its content policies, including the removal of deleted material.

These restrictions began to take effect recently, and Reddit notified the Internet Archive in advance. Rathschmidt added that Reddit has previously raised concerns about content being scraped from the Internet Archive.

Reddit has a history of restricting scraping tools as misuse by AI firms has grown. The company offers data access to those willing to pay and has established agreements with Google covering search and AI training data. Reddit has also blocked major search engines from crawling its site unless they pay. In 2023, it implemented controversial API changes that led some third-party apps to shut down, a move it attributed in part to misuse of its API for AI model training.