Reddit accuses Perplexity of stealing content to train AI

Reddit claims it caught Perplexity doing something it shouldn’t have.

The popular message board website filed a lawsuit against Perplexity, a notable AI firm, alleging that Perplexity engaged in improper data scraping to feed its AI program. The complaint (courtesy of The Verge) lists Perplexity alongside three data scraping firms: AWMProxy, Oxylabs, and SerpApi. According to Reddit, Perplexity does business with at least one of these companies, allegedly using them to get data from Reddit without the site’s permission.

While Reddit has signed agreements with other AI companies in the recent past, it has not done so with Perplexity. Reddit claims that it once sent a cease-and-desist letter to Perplexity for scraping Reddit content. Per Reddit’s complaint, after the letter was sent, Perplexity started citing Reddit even more than before, not less. Where this really gets juicy is how Reddit claims it caught Perplexity in the alleged act of stealing data. In Reddit’s words:

“To confirm this hypothesis, Reddit created a ‘test post’ – the equivalent of a digital ‘marked bill’ – that could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet. Within hours, queries to Perplexity’s ‘answer engine’ produced the contents of that test post. The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine.”

Perplexity provided a statement defending itself to The Verge.

“Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” the company told The Verge. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

We’ll have to wait and see how the lawsuit pans out, but Reddit’s tactic for allegedly catching Perplexity in the act is funny, if nothing else.