AI Data Wars: Reddit vs. Perplexity Exposes the New Digital Battleground
What Reddit vs. Perplexity reveals about ownership, openness, and the price of “free” data.

BEYOND VIRTUAL
Not long ago, the internet was built on the idea that information should flow freely. Anyone could read, share, remix, and build upon what others had created. But that was before AI entered the chat and started quietly feeding on everything we’ve ever posted.
Now, the question of who owns the web’s knowledge has turned into a courtroom drama. Reddit has sued Perplexity AI, claiming the startup scraped millions of posts to train its AI systems without permission. Perplexity fired back, calling Reddit’s move a corporate power play: a way to control the same “public data” that made the platform valuable in the first place.
What counts as public in a world where every post and comment can be harvested for machine learning? When does open data become exploited content?
Somewhere between Reddit’s demand for consent and Perplexity’s defense of curiosity lies a balance that has yet to be found.
For business owners, creators, and anyone building with AI, this conflict raises a deeper question: Can innovation still be ethical when it relies on the voices of others?
Feature Story
When Sharing Turns Into Stealing

What is really happening with Reddit and Perplexity?
Reddit claims that Perplexity scraped user-generated posts without permission or payment, using them to power its AI responses. The company says this isn’t just a data issue; it’s about fairness and respect for the millions of people who make Reddit what it is. Every comment, review, and thread represents hours of unpaid human thought.
Perplexity, on the other hand, insists it has done nothing wrong. It argues that it isn’t training massive AI models on the data but simply helping people discover information that’s already public: summarizing existing conversations and linking back to the source. To Perplexity, it’s no different from a journalist quoting a Reddit post or a blogger linking to a discussion thread.
However, the argument extends beyond copyright law. It’s a clash of values. On one side is the belief that the internet’s greatest power lies in sharing knowledge freely. On the other is the idea that creativity, even in the form of casual online posts, has worth and shouldn’t be mined without consent.
The business side adds another layer. Reddit has begun licensing its data to major AI firms like Google and OpenAI, turning once-free community discussions into a new source of revenue. Smaller AI companies argue that this new “data economy” locks them out, and that if every piece of knowledge comes with a price tag, innovation itself will slow down.
And somewhere in the middle are the users, the real authors of the internet’s collective wisdom. The people who wrote thoughtful replies, asked hard questions, and shared personal experiences that made platforms like Reddit valuable in the first place.
Now, their words are being negotiated over, repackaged, and sold. The irony is that the internet was built on openness and sharing, but the very platforms that championed those ideals are now putting up walls.
Visionary Voices
Jaron Lanier, Scientist at Microsoft, Calls for Data Dignity

When music went digital two decades ago, platforms like Napster promised freedom for everyone to listen anytime. But that freedom came with a cost. Artists saw their work shared across the internet faster than ever before while earning almost nothing from it. It wasn’t just about piracy; it was about fairness.
Today, that same tension has found a new stage in artificial intelligence. Instead of songs, it is people’s words, conversations, and ideas being shared, indexed, and summarized by machines.
Jaron Lanier, a scientist at Microsoft and one of the early pioneers of virtual reality, has spent years arguing for what he calls data dignity. His idea is simple: people deserve recognition and respect for the information they create, especially when it fuels the technology others profit from. Lanier does not reject AI; he warns against building it on invisible labor.
Researchers like Timnit Gebru and Fei-Fei Li echo that concern. They remind us that AI systems are not self-born geniuses. They learn from us, from human photos, text, code, and emotion, all of which reflect real human effort. Gebru has spoken about the need for transparency around data sources, while Li advocates for what she calls human-centered AI, technology that serves people rather than quietly consuming their work.
Seen through this lens, Reddit’s lawsuit is not just a legal dispute, and Perplexity’s defense is not just a tech argument. It reflects a bigger struggle over value and voice. The internet was built to be open, but that openness now powers billion-dollar algorithms.
The Trend
The Cost of Scraping

For years, the internet has been described as an open frontier, a place where information is free. But as AI gets hungrier for data, that freedom now comes with a price tag.
Data scraping used to sound harmless, almost invisible. A few lines of code quietly collect what’s already public. But when billions of posts, photos, and reviews are scraped at once, something changes. The digital commons turns into an unmarked mine of human thought.
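Those “few lines of code” are not an exaggeration. As a minimal sketch (using Python’s standard library and a hypothetical robots.txt, not Reddit’s actual policy), here is how little it takes to collect public pages, and how a scraper can at least consult a site’s stated rules before doing so:

```python
# A minimal sketch of a "polite" scraper check, assuming a hypothetical
# robots.txt. A real crawler would fetch robots.txt from the target site;
# here we parse an inline example so the snippet is self-contained.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def may_fetch(url: str, agent: str = "*") -> bool:
    """Return True if the (hypothetical) robots.txt permits fetching this URL."""
    return rp.can_fetch(agent, url)

print(may_fetch("https://example.com/r/discussion"))  # allowed path -> True
print(may_fetch("https://example.com/private/data"))  # disallowed path -> False
```

The catch, of course, is that robots.txt is a convention, not an enforcement mechanism: nothing in the code above stops a scraper from ignoring the answer, which is precisely why these disputes are landing in court.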
Reddit’s lawsuit against Perplexity is part of a growing wave of resistance. News outlets, creative platforms, and even individual artists are beginning to ask: if AI systems learn from our work, shouldn’t there be some acknowledgment, or at least permission?
At the same time, smaller AI companies worry that if every data source becomes paywalled, innovation will slow to a crawl. The internet that once allowed anyone to build could turn into a gated community reserved for the biggest players, the ones who can afford the licenses.
How do we strike the delicate balance between fostering innovation and walling off the very ingredient needed to advance it?
What This Means for the Future
The Reddit–Perplexity case may be one lawsuit, but it points to a crossroads for the entire digital world. The internet we built was never designed for AI-scale scraping. It was built for people: messy, creative, curious people who shared because they believed knowledge should spread.
Now, that same generosity has become a resource to be priced and licensed. Companies are rewriting their terms of service, governments are drafting data laws, and AI developers are learning that “public” no longer means “free.”
In the short term, we may see a more fragmented web: parts of the internet locked behind data deals, others still open and chaotic. In the long run, the challenge will be balance: how to protect the human contributors who make online knowledge possible without choking the innovation that keeps it alive.
A Final Note
As we watch the drama between Reddit and Perplexity unfold, we have to ask what kind of precedent we’re setting.
How do we keep the doors open for AI innovation while protecting the very intellectual property that fuels it?
The answer won’t come from courtrooms alone; it will come from the choices we make about how technology learns, who it learns from, and what we believe creativity is worth.
Until next time,
