Cloudflare accused AI search engine Perplexity of stealthily scraping websites, but many defended Perplexity, arguing that accessing a website on behalf of a user should not be treated as bot activity. According to Cloudflare, Perplexity used a generic browser identity to access a website when its declared crawler was blocked. A Perplexity spokesperson denied the bots were the company's and claimed the behavior was from a third-party service.
Cloudflare, a prominent internet infrastructure provider, has accused AI search engine Perplexity of using stealthy tactics to bypass website restrictions designed to block its bots. The allegations, which have sparked a debate about data ethics and internet norms, highlight the growing tensions between AI companies' data needs and content creators' rights.
According to Cloudflare's research, Perplexity's crawlers initially identify themselves using declared user agents like "PerplexityBot" or "Perplexity-User." However, when faced with restrictions, the company allegedly employs tactics such as impersonating generic browsers, utilizing multiple IP addresses outside their official range, and rotating through different Autonomous System Numbers (ASNs) [1].
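To make the distinction concrete, here is a minimal sketch of how a site operator might sort incoming requests using only the two signals named above: the declared user agent and the source IP address. It is not Cloudflare's detection method. The function name `classify_request`, the labels it returns, and the IP range (a reserved documentation range standing in for a crawler's published range) are all assumptions made for illustration.

```python
import ipaddress

# User-agent tokens mentioned in the article.
DECLARED_AGENTS = ("PerplexityBot", "Perplexity-User")

# ASSUMPTION: placeholder range from RFC 5737 (documentation addresses),
# standing in for whatever IP ranges a crawler operator officially publishes.
OFFICIAL_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def classify_request(user_agent: str, ip: str) -> str:
    """Label a request using only its user agent and source IP (hypothetical helper)."""
    ua = user_agent.lower()
    declared = any(token.lower() in ua for token in DECLARED_AGENTS)
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in net for net in OFFICIAL_RANGES)

    if declared and in_range:
        # Identifies itself and comes from the published range.
        return "declared-crawler"
    if declared:
        # Claims the crawler name but comes from an unexpected address.
        return "possible-spoof"
    # Generic browser user agent: these two signals alone cannot tell
    # a human visitor from an undeclared crawler.
    return "undeclared"


if __name__ == "__main__":
    print(classify_request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", "192.0.2.10"))
    print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0.0.0", "198.51.100.7"))
```

The last branch shows why the alleged tactic is hard to block with simple rules: a request that presents a generic browser user agent from an unlisted address looks, on these signals, the same as an ordinary visitor.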
Perplexity's spokesperson, Jesse Dwyer, dismissed the allegations as a "sales pitch" and claimed that the screenshots in Cloudflare's post "show that no content was accessed." Cloudflare's CEO Matthew Prince, for his part, strongly condemned Perplexity's actions, likening the company's behavior to that of "North Korean hackers" [2].
The accusations against Perplexity are part of a larger trend of AI companies facing scrutiny for their data collection practices. Last year, Anthropic faced similar accusations, and Reddit has sued the company for alleged content scraping violations [3].
The economic implications for web publishers are significant. When AI bots gather content and summarize it through their own interfaces, they impose compute costs on the source site without compensating it for the harvested content. A TollBit report shows steep ratios of scrapes to referred human site visits for AI companies, with Perplexity at 369:1, meaning 369 scrapes for every human visitor it sends back to a site [3].
In response to growing concerns, some AI companies have started programs to compensate content creators. However, most websites remain excluded from these arrangements. This has led to the emergence of intermediaries like Cloudflare and TollBit, which offer technical solutions such as paywalls to help publishers protect their content [2].
As the AI industry continues to evolve, the balance between data access for AI development and content creators' rights remains a critical issue that will shape the future of the internet and AI technologies.
References:
[1] https://techcrunch.com/2025/08/04/perplexity-accused-of-scraping-websites-that-explicitly-blocked-ai-scraping/
[2] https://theoutpost.ai/news-story/perplexity-ai-accused-of-stealth-tactics-to-bypass-web-scraping-restrictions-18627/
[3] https://www.techedt.com/cloudflare-accuses-perplexity-of-bypassing-website-restrictions-using-stealth-tactics