A new class action lawsuit accuses ChatGPT creator OpenAI of unlawfully scraping data from across the internet, then using that stolen data to build its popular automated products. The lawsuit, filed this week by the Clarkson Law Firm in a Northern California court, is only the latest in a slew of legal challenges that strike at the very heart of the influential startup’s business model.
Since it pivoted from a humble research organization to a for-profit business in 2019, OpenAI has been on a meteoric ascent to the very top of the tech industry. When it launched ChatGPT last November, the company became a household name.
But as OpenAI attempts to stand up its business and lay the groundwork for future expansion, the controversial nature of the technology it’s selling may sabotage its own ambitions. Given how new and disruptive the AI industry is, it’s no surprise that legal and regulatory challenges are emerging. And if challenges like the one filed this week succeed, they could undermine the very existence of OpenAI’s most popular products and, in turn, threaten the nascent AI industry that revolves around them.
The Clarkson lawsuit’s allegations, explained
The central claim in the Clarkson lawsuit is that OpenAI’s entire business model is based on theft. The lawsuit specifically accuses the company of creating its products using “stolen private information, including personally identifiable information, from hundreds of millions of internet users, including children of all ages, without their informed consent or knowledge.”
It’s well known that OpenAI’s models—which power products like ChatGPT and DALL-E—are trained on massive amounts of data. Much of this data, the startup has openly admitted, was scraped from the open internet. By and large, most web scraping is legal, though there are some wrinkles to that basic formula. While OpenAI has claimed that everything it does is above board, it has also been repeatedly criticized for a lack of transparency about the sources of some of its data. According to this week’s lawsuit, the startup’s hoovering practices are blatantly illegal; specifically, the suit accuses the company of violating multiple platforms’ terms of service agreements while also running afoul of various state and federal regulations—including privacy laws.
Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft. They systematically scraped 300 billion words from the internet, “books, articles, websites and posts – including personal information obtained without consent.” OpenAI did so in secret, and without registering as a data broker as it was required to do under applicable law.
The lawsuit also highlights the fact that, after OpenAI freely exploited everybody’s web content, it then proceeded to use that data to build commercial products that it is now attempting to sell back to the public for exorbitant sums of money:
Without this unprecedented theft of private and copyrighted information belonging to real people, communicated to unique communities, for specific purposes, targeting specific audiences, the [OpenAI] Products would not be the multi-billion-dollar business they are today.
Whether the U.S. justice system ends up agreeing with the lawsuit’s definition of theft is yet to be determined. Gizmodo reached out to OpenAI for comment on the new lawsuit but did not hear back.
OpenAI’s legal troubles are piling up
The Clarkson lawsuit isn’t the only one that OpenAI is currently dealing with. In fact, OpenAI has been subjected to an ever-growing list of legal attacks, many of which make similar arguments.
Just this week, another lawsuit was filed in California on behalf of numerous authors who say their copyrighted works were scraped by OpenAI in its effort to gobble up data to train its algorithms. The suit, again, basically accuses the company of stealing data to fuel its business—and says it created its products by “harvesting mass quantities” of copyrighted works without “consent, without credit, and without compensation.” It goes on to characterize platforms like ChatGPT as being “infringing derivative works”—essentially implying that they wouldn’t exist without the copyrighted material—“made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.”
At the same time, both the Clarkson suit and the authors’ suit bear some resemblance to another lawsuit that was filed shortly after ChatGPT’s release last November. This one, filed as a class action lawsuit by the offices of Joseph Saveri in San Francisco, accuses OpenAI and its funder and partner Microsoft of having ripped off coders in an effort to train GitHub Copilot—an AI-driven virtual assistant. The lawsuit specifically accuses the companies of failing to adhere to the open source licensing agreements that undergird much of the development world, claiming that they instead lifted and ingested the code without attribution, while also failing to meet other legal requirements. In May, a federal judge in California denied OpenAI’s motion to dismiss the case, allowing the legal challenge to move forward.
In Europe, meanwhile, OpenAI has faced similar legal inquiries from government regulators over its lack of privacy protections for users’ data.
All of this legal turmoil takes place against the backdrop of OpenAI’s meteoric ascent to Silicon Valley stardom—a precarious new position that the company is fighting hard to maintain. As it fends off legal assaults, OpenAI’s CEO, Sam Altman, has been attempting to influence how new laws will be built around his paradigm-shifting technology. Indeed, Altman has been courting governments all over the globe in an effort to lay the groundwork for a friendly regulatory environment. The company is clearly positioned to be the de facto leader in the AI industry—if it can fend off the ongoing challenges to its very existence, that is.