Did OpenAI steal DeepSeek’s code? O3-Mini reasoning in Chinese sparks AI theft controversy
OpenAI’s latest model, O3-mini, has raised eyebrows after users reported that it was producing its reasoning in Chinese, even when prompted in English. This unexpected behavior, highlighted by X (formerly Twitter) user Vikhyat Rana, has fueled speculation that OpenAI may have borrowed from DeepSeek, a Chinese AI developer that releases its models openly.
The accusation surfaced less than a week after OpenAI claimed that DeepSeek had copied its proprietary AI models to train its open-source system using a technique known as ‘distillation,’ which enables a smaller model to mimic the performance of a larger one while using fewer computing resources.
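For readers unfamiliar with the term, the following is a minimal sketch of the textbook form of distillation, in which a "student" model is trained to match the softened output distribution of a "teacher." It is an illustration only: the logits, the temperature value, and the function names are assumptions made for this example, and nothing here describes how OpenAI or DeepSeek actually trained their models.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions.

    The student is trained to drive this value toward zero.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate next tokens.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # already imitates the teacher well
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The first printed loss is smaller than the second, reflecting that the closer student already mimics the teacher. In practice, distilling from a model that is reachable only through an API usually means training on its generated text rather than on raw scores like these.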
OpenAI Accused of Copying DeepSeek’s Code
In response to Rana’s post, an X user named DaiWW claimed that OpenAI’s latest O3-mini model is reasoning in Chinese, suggesting that OpenAI may have used DeepSeek’s open-source code or data without properly refining it before release.
O3-Mini Responded to English Prompt in Chinese
The issue surfaced when Rana shared a screenshot of a query in English that resulted in a response in Chinese. The response contained detailed technical explanations about PyScript and JavaScript handling of audio APIs, leading some to question whether OpenAI integrated or used data from existing Chinese-language datasets without fully reviewing them.
In a post on X, Rana asked, ‘Why O3-mini is reasoning in Chinese 🥹,’ and shared the model’s reasoning summary, which appeared in Chinese:
“Thought about converting to Python code for 9 seconds
启动 PyScript 探索
使用 PyScript 创建画布类的边框
过滤评估中
在考虑使用 JavaScript 处理音频输入后,我决定写一个 PyScript 假子,借助 HTML canvas 和 Web Audio API,通过 Python 控制声音,然后这可能比较复杂。
融合音频输入和 Python 感觉不难。放大了。通过 Pyodide 的 js 模块,我可以接入 Web 音频 API,定期计算音量并更新播放。
更新播放和处理音频输入是不相干的,可以利用 PyScript 的 js 模块和滚动动画来计算音频偏移并动
Why O3-mini is reasoning in Chinese 🥹 pic.twitter.com/2Z8mNCPrBY
— Vikhyat Rana (@The_Vikhyat) February 1, 2025
Below is DaiWW’s response to Rana’s post:
“The latest OpenAI O3-mini is reasoning in Chinese. It seems OpenAI copied Deepseek’s open-source code and data, releasing O3-mini without proper editing.”
The latest OpenAI O3-mini is reasoning in Chinese,it seems OpenAI copied Deepseek’s open-source code/data and released the O3-mini without carefully edit. pic.twitter.com/tKoOTanhOQ
— DaiWW (@BeijingDai) February 3, 2025
OpenAI’s Silence and the Implications
Meanwhile, OpenAI has yet to address the allegations. The situation highlights broader concerns about transparency in AI model training, ethical use of open-source data, and the risk of unexpected behaviors in AI outputs. If OpenAI sourced its data from DeepSeek, it could change how proprietary AI companies interact with open-source communities and raise questions about dataset auditing.
Accusations of Open-Source Code Misuse
Some critics believe OpenAI may have incorporated DeepSeek’s publicly available code or data without crediting the source. DeepSeek is a Chinese AI developer that releases its models openly, and O3-mini’s apparent tendency to reason in Chinese has fueled speculation about the model’s origins. If OpenAI did use DeepSeek’s work, it could raise ethical and intellectual property concerns within the AI research community.
Bigger Questions at Play
This controversy adds to ongoing debates about AI ethics, data sourcing, and corporate responsibility in AI development. Companies like OpenAI continue pushing boundaries in generative AI, but questions about accountability and transparency remain.
As the AI community watches for OpenAI’s response, the key question is whether O3-mini’s Chinese reasoning was an unintended result of its training or a sign of a deeper issue with AI data ethics and intellectual property. The coming days may bring more scrutiny, and OpenAI might need to clarify its data sources to maintain trust. For now, the situation remains unresolved.
Despite the ongoing allegations, OpenAI CEO Sam Altman acknowledged DeepSeek’s model as a strong competitor and pointed to the need for greater computing power to maintain an edge. In a post on X, he described DeepSeek’s R1 model as “impressive, particularly around what they’re able to deliver for the price.”
The launch of DeepSeek’s cost-efficient V3 model has rattled industry players worldwide in what some have dubbed an “AI Sputnik” moment. Reports suggest that the model was developed at a cost of under $6 million, an unsettling figure for U.S. tech firms that have invested billions in similar technologies.
According to multiple reports, DeepSeek V3 outperformed leading models like Llama 3.1 and GPT-4o on key benchmarks, including competitive coding challenges on Codeforces. The project was reportedly completed on a budget of just $5.5 million, a stark contrast to the hundreds of millions spent by its rivals. The result challenges the notion that cutting-edge AI development requires an enormous financial investment.