Former OpenAI researcher says the company violated copyright law and is harming the internet
In November 2023, Sam Altman was fired as CEO of OpenAI after the board lost confidence in his leadership. Just days later, reports emerged suggesting the decision was tied to concerns about a potential AI breakthrough that some OpenAI researchers warned could pose a threat to humanity. Almost a year later, another former OpenAI employee is stepping forward with fresh criticism, this time focused on copyright violations and their impact on the internet.
Suchir Balaji, a former OpenAI researcher, has voiced concerns about the company’s business practices. In a post on his personal website, Balaji claimed that OpenAI is not adhering to U.S. copyright law. He joins a growing number of voices questioning the legality of the company’s approach to data collection and its broader business model.
“If you believe what I believe, you have to just leave the company,” Balaji told The New York Times.
Balaji, 25, joined OpenAI in 2020 after graduating from UC Berkeley and was part of the team that worked on GPT-4. Initially drawn to AI for its potential to tackle problems like curing diseases and stopping aging, he spent four years at the company before leaving this summer.
Now, he says the technology is being used in ways he no longer supports, arguing that AI companies are “destroying the commercial viability of the individuals, businesses and internet services that created the digital data used to train these A.I. systems.”
“But after the release of ChatGPT in late 2022, he thought harder about what the company was doing. He came to the conclusion that OpenAI’s use of copyrighted data violated the law and that technologies like ChatGPT were damaging the internet,” the Times reported.
Ex-Researcher Raises Concerns About OpenAI
Earlier this week, Balaji published an essay on his personal website detailing how much copyrighted material from training datasets ends up in the outputs of AI models. His analysis led him to conclude that ChatGPT’s outputs fail to meet the standard of “fair use,” the legal doctrine that permits limited use of copyrighted content without permission.
“The only way out of all this is regulation,” Balaji later told the Times, pointing to the legal complexities created by the AI industry’s business model. In response to the Times article, OpenAI defended its practices, stating:
“We build our A.I. models using publicly available data, in a manner protected by fair use and related principles, and supported by longstanding and widely accepted legal precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness.”
Balaji’s claims come roughly a year after The New York Times sued OpenAI and Microsoft for using its content without permission to train AI models. The lawsuit alleges that millions of articles were used to build the AI systems, which now compete in the same market as the newspaper.
The Times isn’t alone. In January 2024, a group of authors, including Nicholas Basbanes and Nicholas Gage, filed a lawsuit in federal court accusing Microsoft and OpenAI of using their works without consent for training purposes.
A long list of lawsuits has followed, with celebrities, artists, writers, and coders accusing OpenAI of using their content without permission. High-profile plaintiffs include Sarah Silverman, Ta-Nehisi Coates, George R. R. Martin, Jonathan Franzen, John Grisham, the Center for Investigative Reporting, The Intercept, and newspapers such as The Denver Post and the Chicago Tribune, as well as several YouTubers.
Although public reaction has been mixed, ranging from confusion to apathy, the number of critics raising concerns about the AI industry’s business practices continues to grow. Celebrities, tech ethicists, and legal experts are increasingly scrutinizing an industry that is expanding rapidly while creating new and difficult legal and ethical challenges.