


OpenAI believes outputs from its artificial intelligence models may have been used by Chinese startup DeepSeek to train its new open-source model that impressed many observers and shook U.S. financial markets on Monday, according to a Financial Times report.
OpenAI believes DeepSeek used some of its data to build its latest model.
The ChatGPT maker told the Financial Times that it had seen some evidence that suggests DeepSeek may have tapped into its data through “distillation”—a technique where outputs from a larger and more advanced AI model are used to train and improve a smaller model.
Bloomberg reported that OpenAI and its key backer Microsoft were investigating whether DeepSeek used OpenAI’s application programming interface (API)—which allows other businesses and platforms to tap into the company’s AI model—to carry out the “distillation.”
According to the FT report, the two companies had investigated and blocked accounts using the API last year over suspected distillation—a violation of OpenAI’s terms and conditions—which they believed belonged to DeepSeek.
The issue was first flagged by President Donald Trump’s “AI Czar” appointee David Sacks, who told Fox News there was “substantial evidence” that DeepSeek distilled outputs from OpenAI models and added, “I don’t think OpenAI is very happy about this.”
Sacks added that, over the next few months, America’s leading AI companies will start taking steps to “try and prevent distillation,” which he said would “definitely slow down some of these copycat models.”
The app has also raised national security concerns at the White House and its impact is being reviewed by the National Security Council, White House Press Secretary Karoline Leavitt said, adding: “This is a wake-up call to the American AI industry.”
Distillation is a technique in which the outputs generated by an advanced AI model are used to train a smaller model. The larger, more complex model is usually referred to as the teacher model, while the smaller one is called the student model. The goal of distillation is for the student model to reach a level of performance comparable to the teacher’s while using fewer computing resources. According to Stratechery author and tech analyst Ben Thompson, distillation is likely already being used by companies like OpenAI, Anthropic and Google to optimize their own models, and the process is straightforward when a company owns and operates both the teacher and the student. However, a third-party-owned student model tapping into a company’s advanced proprietary models for distillation is usually a violation of terms and conditions. OpenAI’s terms of service bar users from copying its services or using its output to “develop models that compete with OpenAI.”
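The teacher–student training described above centers on a “soft target” loss: the student is pushed toward the teacher’s full output distribution, not just its top answer. The following is a minimal illustrative sketch of that loss in plain Python; the logits and temperature value are hypothetical examples, not details of any company’s actual training setup.

```python
import math

def softmax(logits, temperature=1.0):
    # Higher temperature "softens" the distribution, exposing the
    # teacher's relative confidence across all classes.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's softened outputs (the targets)
    # and the student's softened predictions. Minimizing this pulls the
    # student's distribution toward the teacher's.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for a single 3-class prediction.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
loss = distillation_loss(teacher, student)
```

In a real training loop this loss (often combined with a standard loss on ground-truth labels) would be minimized over many examples by gradient descent; the sketch only shows the quantity being optimized.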
Companies like OpenAI can prevent third parties from using distillation by restricting their access. This could involve banning their accounts, blocking their IP addresses, or placing rate limits on the number of queries third parties can make. However, it is unclear if these methods would be effective in completely shutting out distillation.
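One of the measures mentioned, rate limiting, can be sketched as a simple fixed-window counter keyed by API account. This is an illustrative toy, not OpenAI’s actual enforcement code; the class and parameter names are invented for the example.

```python
import time

class RateLimiter:
    """Allow at most `limit` queries per `window` seconds for each API key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}  # api_key -> (window_start_time, query_count)

    def allow(self, api_key, now=None):
        now = time.time() if now is None else now
        start, count = self.counts.get(api_key, (now, 0))
        if now - start >= self.window:
            # Window expired: reset the counter for this key.
            start, count = now, 0
        if count >= self.limit:
            return False  # over the cap; reject the query
        self.counts[api_key] = (start, count + 1)
        return True

limiter = RateLimiter(limit=2, window=60)
limiter.allow("key-123")  # allowed
limiter.allow("key-123")  # allowed
limiter.allow("key-123")  # rejected: third query inside the window
```

As the article notes, such caps raise the cost of harvesting outputs at scale but cannot by themselves prove or prevent distillation, since a determined actor can spread queries across many accounts and addresses.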
In response to Sacks’ remarks, OpenAI told Fox News that it uses “countermeasures” to protect its intellectual property and added, “as we go forward…it is critically important that we are working closely with the U.S. government to best protect the most capable models from efforts by adversaries and competitors to take US technology.”
Mike Masnick, the founder of the tech news outlet TechDirt, commented on OpenAI’s reaction to distillation: “So, look. I'm sure I'm in the minority here on Bluesky in believing that training AI systems isn't copyright infringement. But, also. Dude. There's no way OpenAI can make this argument without looking very, very silly.”