The recent release of DeepSeek’s new AI model, DeepSeek R1, has sent shockwaves through the equity markets, with NVIDIA (NVDA) and other U.S. technology leaders experiencing a significant sell-off. At first glance, DeepSeek’s claim to have developed a cutting-edge AI model at a fraction of the cost and with inferior hardware seems like a seismic shift in the global AI landscape. But let’s take a step back and critically examine the broader implications—not just for NVIDIA and U.S. equities, but for the future of AI innovation. Spoiler alert: this is not the end of the world for American AI companies. It might just be the kick they need to reach new heights.
The Claims: Too Good to Be True?
DeepSeek R1 is an advanced reasoning model built by DeepSeek, an AI lab spun out of the Chinese hedge fund High-Flyer, originally as something of a side project. The model represents a major technical achievement, combining innovations in architecture and training efficiency to deliver performance competitive with leading models like OpenAI’s o1. Its underlying base model was reportedly trained at a cost of just $5.5 million, sparking global attention and debate about its implications for the AI industry.
DeepSeek’s breakthrough revolves around its ability to train DeepSeek-V3, the 671-billion-parameter base model underlying R1, on H800 GPUs, a deliberately bandwidth-limited export variant of NVIDIA’s cutting-edge H100s. The $5.5 million figure is a fraction of what companies like OpenAI or Anthropic spend on training their leading models. On the surface, this achievement signals that the Chinese AI ecosystem is outpacing the West in terms of efficiency.
This all hinges on whether you trust this announcement. China’s track record on transparency is abysmal. This is a country that *maybe* engaged in gain-of-function research leading to the COVID-19 pandemic, only to obfuscate its origins and delay global investigations. It routinely falsifies economic statistics and inflates its achievements to project strength. Against this backdrop, skepticism is warranted. Could DeepSeek’s achievements have been massively subsidized by the Chinese government? Could the model have been trained on smuggled H100s, with the claims of using H800s merely a smokescreen? Absolutely.
Further nuance emerges when examining the cost claims. Even if we take for granted that the DeepSeek-V3 training run cost roughly $5.5 million, this figure omits the vast prior investments in research, architecture experimentation, and ablation studies. DeepSeek reportedly operates massive clusters (previously referenced as including 10,000 A100 GPUs), meaning its true hardware resources far exceed the 2,048 H800s claimed. In this light, DeepSeek’s purported efficiency becomes a carefully curated narrative rather than a reproducible blueprint for success.
Another key consideration is DeepSeek’s breakthroughs in architecture. The DeepSeek-V2 model introduced two major innovations: DeepSeekMoE (Mixture of Experts) and DeepSeekMLA (Multi-head Latent Attention). Mixture of Experts is a technique OpenAI has reportedly used as well. The idea: rather than activating all of a model’s parameters to answer every query, the network is divided into specialized “experts,” and a lightweight router activates only the few experts relevant to each input. DeepSeek-V3, for example, activates only about 37 billion of its 671 billion parameters for any given token. This minimizes redundant computation and makes both training and inference far cheaper. Multi-head Latent Attention, for its part, compresses the memory-hungry key-value cache used during inference, dramatically reducing memory requirements. Such optimizations explain how DeepSeek achieved its shockingly low training costs and highlight the potential for broader industry adoption of these methods.
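To make the routing idea concrete, here is a minimal sketch of top-k expert routing in PyTorch. Everything in it (the expert sizes, the simple linear router, the absence of load balancing) is my illustrative simplification, not DeepSeek’s actual implementation:

```python
# Minimal sketch of top-k Mixture-of-Experts routing (illustrative only;
# real MoE layers add load balancing, shared experts, and expert parallelism).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only top-k
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run, so compute per token stays small
        # even as the total parameter count grows.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out
```

The key property: adding more experts grows the model’s total capacity while the per-token cost stays fixed at top_k expert evaluations.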
China’s strategic incentives to misrepresent the costs and methods behind DeepSeek-V3 are clear. By sowing doubt about the sustainability of U.S.-led AI development, it could discourage investment in Western AI firms, weaken the perceived impact of export controls, and position itself as the global leader in AI. Until independent verification of DeepSeek’s claims is available, treating them as gospel is not just naive—it’s irresponsible.
Implications for NVIDIA and U.S. AI Companies
The market’s reaction—a historic drop in NVIDIA’s value—reflects a knee-jerk response rather than a nuanced understanding of the situation. NVIDIA remains the backbone of the global AI ecosystem. Its GPUs are not just hardware; they are part of a broader ecosystem of software, developer tools, and research partnerships that companies like DeepSeek can only dream of replicating.
Even if DeepSeek’s claims hold water, they represent a specific achievement—an efficient training pipeline for a single model. NVIDIA’s dominance extends far beyond that. As companies like OpenAI and Anthropic adopt increasingly complex techniques, the demand for advanced GPUs like the H100 and beyond will only grow. DeepSeek’s cost-saving methods may reduce training expenses, but they do not render state-of-the-art hardware obsolete. On the contrary, they highlight how much more could be achieved when these techniques are paired with superior resources.
U.S. Export Controls: A Double-Edged Sword
One of the reasons DeepSeek’s achievements are so significant is the backdrop of U.S. export restrictions on advanced AI chips. The U.S. government has placed strict controls on the sale of high-performance GPUs like the H100 to China, aiming to limit its ability to develop cutting-edge AI models. These restrictions were meant to maintain a strategic advantage for the U.S. while curbing China’s access to critical technologies. But H100s can still be sold legally to Singapore, and indeed roughly 20% of NVIDIA’s revenue is billed to Singapore. So the question is whether those are genuinely Singaporean sales or just a funnel through which embargoed products reach China.
If DeepSeek truly achieved its breakthrough using only H800s, it suggests that China can innovate around these restrictions, leveraging efficiency and optimization to sidestep hardware limitations. Alternatively, if the claims are exaggerated or if smuggled H100s were used, it underscores the need for even tighter enforcement of export controls.
Furthermore, the rise of distillation techniques, in which a smaller “student” model is trained to mimic the outputs of an advanced “teacher” model like GPT-4o, renders hardware restrictions less effective. DeepSeek can submit queries to GPT-4o, collect the responses, and use them as training data, which is far more efficient than trying to reproduce the trillions of training tokens that went into GPT-4o itself. This smells like a form of intellectual property theft, but blame OpenAI for leaving the door open to this and allowing others to free-ride on its investment.
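For intuition, here is a minimal sketch of the distillation principle in PyTorch. It matches a tiny student to a tiny teacher’s output distribution; a real black-box pipeline like the one described above would instead fine-tune on text responses harvested from an API, and every size and hyperparameter here is an arbitrary assumption:

```python
# Minimal knowledge-distillation sketch: a small "student" learns to match
# a larger "teacher" model's (softened) output distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature softens the teacher's distribution

for step in range(1000):
    x = torch.randn(64, 32)  # stand-in for real inputs
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    # Train the student to imitate the teacher's outputs, never its data.
    loss = F.kl_div(F.log_softmax(student(x) / T, dim=-1),
                    soft_targets, reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The student never sees the teacher’s original training corpus; it only needs the teacher’s answers, which is exactly why a public API is such a leaky boundary.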
Learning from DeepSeek: The OpenAI Opportunity
If anything, DeepSeek’s claims should be seen as both a challenge and an opportunity for U.S. AI companies to innovate. Techniques like sparse computation, quantization, multi-token prediction, and model compression are not exclusive to DeepSeek. OpenAI, Anthropic, and others can adopt and refine these methods, applying them to far more advanced hardware to produce even more powerful models. Imagine combining DeepSeek’s efficiency hacks with NVIDIA’s H100s or custom silicon like Microsoft’s Azure Maia accelerators. The result would be a significant leap in AI performance.
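As one concrete instance of this toolbox, here is post-training dynamic quantization using PyTorch’s built-in tooling. The layer sizes are placeholders, and production systems use more sophisticated schemes, but the basic trade is the same: smaller weights, cheaper inference, a modest accuracy cost:

```python
# Post-training dynamic quantization: store Linear weights as int8 instead
# of float32, shrinking the model roughly 4x and speeding up CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize only the Linear layers
)

x = torch.randn(1, 1024)
print(quantized(x).shape)  # same interface, smaller and faster model
```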
DeepSeek’s approach also illustrates the potential of inference-optimized architectures. By focusing on efficiency during both training and inference, models can be deployed in environments with limited hardware—or at the edge, such as on smartphones. OpenAI could adapt these principles to deliver models that run seamlessly across platforms, expanding their accessibility and use cases.
You can train an AI model with explicit supervision, telling it what is right and wrong, or you can give it an objective and let it discover what works on its own through reinforcement learning. We have seen that models that learn on their own develop more powerful reasoning capabilities. The canonical example is Go: DeepMind’s AlphaGo Zero, which learned purely by playing against itself with nothing but the rules and a win/loss signal, ultimately surpassed the earlier version that had been bootstrapped from human expert games. DeepSeek applied the same insight, using large-scale reinforcement learning to develop R1’s reasoning abilities.
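Here is a toy sketch of that reward-driven loop (REINFORCE, in PyTorch). It is not DeepSeek’s actual method (the R1 report describes a far more elaborate algorithm at vastly larger scale), but it shows the core mechanic: the model samples actions and is never told the right answer, only how well it did:

```python
# Toy REINFORCE loop: a policy tries actions, gets only a reward signal,
# and reinforces whatever worked. Task: learn to pick action 3.
import torch
import torch.nn.functional as F

logits = torch.zeros(5, requires_grad=True)      # policy over 5 actions
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = F.softmax(logits, dim=-1)
    action = torch.multinomial(probs, 1).item()  # sample, don't imitate
    reward = 1.0 if action == 3 else 0.0         # the only feedback given
    # Increase log-probability of actions in proportion to their reward.
    loss = -reward * torch.log(probs[action])
    opt.zero_grad()
    loss.backward()
    opt.step()

print(F.softmax(logits, dim=-1))  # probability mass concentrates on action 3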
Jevons Paradox and Apple’s Edge AI Potential
The advancements demonstrated by DeepSeek also bring Jevons Paradox into play: when a resource becomes cheaper to use, total consumption of it often rises rather than falls, just as more efficient steam engines drove up coal consumption in Jevons’ original 19th-century observation. As AI models become more efficient, overall usage could increase dramatically; Satya Nadella tweeted about exactly this yesterday. This has profound implications for companies like Apple, which already leads in hardware optimized for on-device AI.
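A back-of-the-envelope illustration of how that math can play out (every number here is a hypothetical assumption, not a measurement):

```python
# Jevons paradox in miniature: efficiency up 10x, total spend still rises.
cost_per_query_before = 1.00   # dollars per query (assumed)
cost_per_query_after  = 0.10   # 10x efficiency gain (assumed)
queries_before        = 1_000_000

# If cheaper queries unlock 30x more usage (assumed demand elasticity)...
queries_after = queries_before * 30

spend_before = cost_per_query_before * queries_before   # $1.0M
spend_after  = cost_per_query_after * queries_after     # $3.0M

print(spend_before, spend_after)  # total spend *rises* despite efficiency
```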
Apple’s A-series and M-series chips, with their Neural Engine capabilities, are uniquely positioned to take advantage of compressed, efficient AI models. If DeepSeek’s techniques can make high-performance AI models small enough to run on local hardware, the iPhone and other Apple devices could handle increasingly complex AI tasks at the edge. This would not only improve privacy and reduce reliance on cloud infrastructure but also create entirely new categories of applications—from advanced personal assistants to real-time health monitoring and beyond.
In this scenario, Apple’s ability to integrate AI deeply into its ecosystem becomes a massive competitive advantage. The possibility of “superphones” capable of running state-of-the-art AI locally would usher in one of the largest PC and smartphone upgrade cycles in history. If inference shifts to the edge, we could see a fundamental reshaping of the AI economy.
The Bigger Picture: AI’s Global Race
The DeepSeek episode is a reminder that the AI race is not just about technology—it’s about geopolitics, economics, and strategy. China’s ambition to lead in AI is undeniable, but its approach—which often involves state-driven subsidies, lack of transparency, and intellectual property theft—is fundamentally different from the innovation-driven, market-led model of the West.
Yes, DeepSeek’s claims are unsettling, but they also underscore the need for vigilance and accelerated innovation in the U.S. AI sector. Policymakers should tighten export controls, incentivize domestic semiconductor production, and foster collaboration between AI companies and academia. Meanwhile, companies like OpenAI should view this as an opportunity to push the boundaries of what’s possible, rather than a reason to retreat.
Conclusion: A Wake-Up Call, Not a Death Knell
DeepSeek’s claims, whether fully credible or partially exaggerated, are a wake-up call for U.S. AI companies and investors. But let’s be clear: this is not the end of the world for American AI dominance. The combination of efficient techniques and advanced hardware remains a potent formula, one that companies like OpenAI and NVIDIA are uniquely positioned to capitalize on. The global AI race is far from over, and if history is any guide, the U.S. thrives under pressure. DeepSeek’s supposed breakthrough is not a fatal blow—it’s the spark that will ignite the next wave of innovation in American AI.