Reddit has restricted the Internet Archive's Wayback Machine from archiving its site content, except for the homepage, citing concerns that AI companies are scraping Reddit data from the archive without adhering to its terms of service. Reddit says the move is meant to protect user privacy and to keep content that users have removed from persisting in archived copies. It is part of a broader effort to control data access, which includes modifying APIs, negotiating paid licenses, and pursuing legal action against unauthorized data collectors. The long-term implications for internet preservation are not yet clear.
Reddit has placed significant restrictions on the Internet Archive's Wayback Machine, limiting it to archiving only the platform's homepage. The move responds to concerns that AI companies are scraping Reddit data without adhering to its terms of service, particularly its policies on user privacy and on deleting content that users have removed [1].
The Internet Archive, a non-profit organization dedicated to preserving the open web, will no longer be able to index Reddit's post detail pages, comments, or user profiles. Instead, it will only be able to archive the Reddit homepage, providing insights into popular news headlines and posts of the day [1].
Tim Rathschmidt, a spokesperson for Reddit, explained that while the Internet Archive serves an important role in preserving the open web, instances of AI companies violating platform policies by scraping data from the Wayback Machine have necessitated these restrictions. He stated, "Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content), we’re limiting some of their access to Reddit data to protect redditors" [1].
These restrictions are part of a broader trend of Reddit tightening its controls over data access. In 2024, Reddit reached an agreement with Google for the use of its data for search and AI purposes, and it subsequently implemented a paywall for major search engines that wish to crawl its content [3]. Reddit has also taken legal action against unauthorized data collectors, including suing Anthropic for allegedly continuing to scrape content even after giving assurances that it would stop [3].
The impact of these restrictions on the Internet Archive and the broader digital landscape remains to be seen. While Reddit frames the move as protecting user privacy and ensuring that removed content stays removed, it also raises questions about the long-term implications for internet preservation [1].
References:
[1] https://www.theverge.com/news/757538/reddit-internet-archive-wayback-machine-block-limit
[2] https://www.morningstar.com/news/pr-newswire/20250811dc47709/investor-alert-pomerantz-law-firm-reminds-investors-with-losses-on-their-investment-in-reddit-inc-of-class-action-lawsuit-and-upcoming-deadlines-rddt
[3] https://news.ssbcrack.com/reddit-to-block-internet-archive-from-indexing-most-of-its-content-amid-ai-scraping-concerns/
[4] https://www.globenewswire.com/news-release/2025/08/09/3130537/673/en/RDDT-FINAL-DEADLINE-ROSEN-A-LONGSTANDING-LAW-FIRM-Encourages-Reddit-Inc-Investors-with-Losses-in-Excess-of-100K-to-Secure-Counsel-Before-Important-August-18-Deadline-in-Securities-.html