By David Manion

Reddit Wants to Get Paid for Third-Party Data Access as AI Training Plows Ahead

What do AI training and the internet’s front page have in common? It turns out that Reddit, the popular online platform for discussions on a wide variety of topics, has become a crucial resource for artificial intelligence development.

But as AI developers continue to tap into the rich data mine offered by the site, Reddit has decided it’s time to cash in on its importance to the industry.

From Open API to Paywall: Reddit AI Monetization

Since 2008, Reddit’s API has been a vital resource for developers, allowing them to create tools for managing subreddits, browsing the site, and enhancing search functionality. The API has been fairly open, permitting a range of uses, from educational and research purposes to building moderation tools.

However, Reddit has now opted to introduce a premium access point for third parties requiring more extensive capabilities, higher usage limits, and broader usage rights.

The decision is motivated by a desire to capitalize on the value of its data, especially considering the potential competition that could arise from AI systems replicating Reddit’s conversations.

Reddit’s data comes from the platform’s diverse conversations covering topics as varied as makeup, video games, and power-washing driveways. This data has proved invaluable for training large language models (LLMs), enabling them to generate increasingly coherent and relevant responses to user prompts.

Reddit’s CEO, Steve Huffman, explained in an interview, “The Reddit corpus of data is really valuable, but we don’t need to give all of that value to some of the largest companies in the world for free.”

As the company also gears up for an initial public offering, it is still determining the exact pricing structure for API access and plans to announce the details in the coming weeks.

A Conversational Goldmine for AI Training

AI training has led to the creation of a range of cutting-edge systems, many of which have relied on Reddit’s conversational data. OpenAI’s ChatGPT, Google’s Bard, and Microsoft’s Bing AI have all incorporated Reddit data in their development, showcasing the platform’s undeniable value in the AI landscape.

Other companies, like stock image provider Shutterstock, have also recognized the worth of their data, selling image data to OpenAI for use in its DALL-E project, which generates visual content based on text prompts.

The connection between Reddit and search engines like Google and Microsoft’s Bing goes beyond AI training. Reddit’s data is crawled and indexed by these engines, boosting the platform’s visibility in search results.

The continuous updates and real-time relevance of Reddit’s data make it a particularly valuable asset for AI training. LLMs, such as those behind advanced chatbots, rely on fresh and authentic conversation data to generate the best possible results.

Huffman emphasizes this point, saying, “More than any other place on the internet, Reddit is a home for authentic conversation. There’s a lot of stuff on the site that you’d only ever say in therapy, or AA, or never at all.”

While Reddit’s decision to charge for API access will impact some, the company has reassured developers that it will continue to offer free access for certain use cases.

Developers building applications to help users engage with Reddit, as well as researchers studying the platform’s data for academic or non-commercial purposes, will still be able to access the API at no cost.

Embracing AI: Reddit’s Machine Learning Ambitions

As Reddit continues to grow in importance, the company is exploring ways to incorporate machine learning into its own operations. Potential applications include identifying the use of AI text generators on the platform and notifying users when a comment originates from a bot.

Additionally, Reddit aims to enhance software tools for moderators and support third-party bots that assist in forum monitoring.

As Reddit prepares to implement its new API access fees, the company is sending a clear message to AI developers: it’s time to pay up for the valuable data they’ve been freely using.

Huffman reiterates this sentiment, stating, “Crawling Reddit, generating value, and not returning any of that value to our users is something we have a problem with. It’s a good time for us to tighten things up.”

The new Reddit terms are set to take effect 60 days after the official email notification is sent to developers and third parties.

In short, the decision by Reddit to charge for API access marks a significant turning point in the relationship between the platform and the AI training community. Only time will tell how this move will impact the development of future AI systems and the balance of value between Reddit and the companies that rely on its data.

Disclaimer: This article is provided for informational purposes only. It is not offered or intended to be used as legal, tax, investment, financial, or other advice.

