Encyclopaedia Britannica sues OpenAI, claims ChatGPT copied nearly 100,000 articles to train its AI models
The long fight over AI training data just pulled in one of the most recognizable names in publishing. Encyclopaedia Britannica and its dictionary arm, Merriam-Webster, filed a lawsuit against OpenAI in federal court in Manhattan, accusing the company of using their reference works to train ChatGPT without permission.
The complaint, filed Friday, claims OpenAI copied large volumes of Britannica’s material—including encyclopedia articles and dictionary entries—to train its large language models. Britannica says the practice has allowed ChatGPT to generate answers that closely mirror its own reference content.
“Encyclopedia Britannica and its Merriam-Webster subsidiary have sued OpenAI in Manhattan federal court for allegedly misusing their reference materials to train its artificial intelligence models,” Reuters reported.
At the center of the case is a claim that OpenAI copied nearly 100,000 Britannica articles during the development of its AI systems.
Britannica argues that material from those articles now appears in ChatGPT responses, sometimes nearly word for word. According to the filing, ChatGPT's AI-generated summaries draw readers away from Britannica's own websites, cutting into the company's traffic and revenue.
“OpenAI used its online articles and encyclopedia and dictionary entries to teach its flagship chatbot ChatGPT to respond to human prompts and ‘cannibalized’ Britannica’s web traffic with AI-generated summaries of its content,” the complaint states.
OpenAI pushed back against the allegations.
“Our models empower innovation, and are trained on publicly available data and grounded in fair use,” an OpenAI spokesperson said on Monday in response to the lawsuit.
AI copyright battle grows as Britannica sues OpenAI over ChatGPT training data
Britannica has not commented publicly beyond the filing itself; representatives and attorneys for the company did not respond to requests for comment on Monday.
The lawsuit lands in the middle of a widening legal battle between publishers and AI developers. Authors, news organizations, and media companies have filed similar claims across the United States, arguing that their copyrighted works were copied without consent to train generative AI systems.
OpenAI and other AI firms maintain that training models on large datasets qualifies as fair use under U.S. copyright law. Their argument rests on the idea that AI models transform the source material into statistical patterns rather than reproducing original works.
Britannica disputes that interpretation. The complaint claims ChatGPT can produce “near-verbatim” passages that resemble its encyclopedia entries and dictionary definitions. Britannica says those outputs reduce the incentive for readers to visit its platforms.
The lawsuit raises trademark issues as well. Britannica accuses OpenAI of suggesting that OpenAI has permission to use Britannica's material, and of citing Britannica in AI responses that contain errors, which the company describes as false "hallucinations."
Britannica is asking the court for monetary damages and an injunction blocking further use of its material in OpenAI’s systems.
This is not Britannica's first AI lawsuit. The company already has a related case underway against AI startup Perplexity AI. That lawsuit, filed last year, raises similar claims about AI tools summarizing reference material and redirecting readers away from the original source.
For OpenAI, the case adds another legal front to an industry-wide debate over the boundaries of AI training data. Courts have yet to settle the central question: whether using large bodies of copyrighted material to train generative models crosses the line under copyright law.
For Britannica, the issue cuts deeper than legal theory. The company built its reputation over centuries as a trusted reference source. The lawsuit suggests that the same knowledge base that powered that reputation may now sit at the heart of the generative AI boom.
The courts will now decide where the line falls between training data and intellectual property. The outcome could shape how future AI systems learn—and who gets paid when they do.
The legal clash arrives as media companies increasingly push back against the use of their content in AI training. In June 2024, the Center for Investigative Reporting (CIR), the country’s oldest nonprofit newsroom, filed a lawsuit against OpenAI and its primary backer, Microsoft, in federal court. The case joined a growing list of actions brought by major publishers, including The New York Times, the Chicago Tribune, and the New York Daily News.

