Commentary: Answering the Nine Crucial Questions Raised by DeepSeek

14 Feb 2025

By Huang Leping and Chen Xudong

DeepSeek has shaken the global AI industry, leaving people asking whether it will reshape investment, AI applications and the vast chip market.

The sudden rise of DeepSeek has sparked heated discussions about the resilience of Chinese artificial intelligence (AI) companies under resource constraints and a series of key questions. These include whether the Hangzhou-based startup behind the model will need more computational power as its AI applications expand, what innovations it has made in its research and development and whether its lower-cost model training methods will lead to stricter U.S. export restrictions.

Will DeepSeek need additional computational power to stay competitive?

Trained on a cluster of 2,048 Nvidia H800 graphics processing units (GPUs), the 671-billion-parameter DeepSeek-V3 model required only 2.788 million H800 GPU hours for its full training, with training costs totaling $5.576 million, according to the firm’s technical report published in December. In comparison, Meta Platforms Inc.’s Llama 3 required 39.3 million H100 GPU hours for training. That means DeepSeek achieved training costs at least an order of magnitude lower than those of its foreign peers.
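
For readers who want to verify the comparison, here is a back-of-the-envelope check in Python using only the figures cited above. The implied rental rate of $2 per GPU hour and the roughly 14-fold gap in GPU hours are our own derivations, not figures from either company.

```python
# Back-of-the-envelope check of the training-cost comparison above.
# All inputs come from the cited reports; the outputs are derived.
deepseek_hours = 2.788e6   # H800 GPU hours for DeepSeek-V3 training
deepseek_cost = 5.576e6    # reported DeepSeek-V3 training cost, in USD
llama3_hours = 39.3e6      # H100 GPU hours reported for Llama 3

print(deepseek_cost / deepseek_hours)  # 2.0 -> implied $2 per H800 GPU hour
print(llama3_hours / deepseek_hours)   # ~14.1 -> Llama 3 used ~14x more GPU hours
```

Note that the 14-fold gap is measured in GPU hours alone. If Llama 3’s per-hour cost was comparable to or higher than the $2 implied by DeepSeek’s figures, the gap in dollar terms would be at least 14-fold, consistent with the order-of-magnitude claim.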

We believe that DeepSeek’s demand for computational power will grow over time, though such demand is not substantial at the current stage, as its cost-efficient model training methods work well even with fewer resources. In the long term, as its models become more popular and their application scenarios expand, DeepSeek’s demand for AI inference will inevitably increase, which will in turn lead to a need for additional computational power.

Will DeepSeek change the growth paradigm of AI computational power?

AI computational power can generally be categorized into two types. The first is exploratory computational power needed to develop frontier-grade models, which represent the future of artificial general intelligence (AGI) applications. The second is applied computational power that supports consumer-facing model applications like inference. Over the past two years, demand for exploratory computational power has driven a dramatic surge in demand for GPUs. As long as the exploration of AGI applications continues to yield positive returns, the growth paradigm of AI computational power will remain unchanged in the short term.

Driven by the goal of developing AGI, global tech giants have increased their bets on AI. For example, on Jan. 24, Mark Zuckerberg said that Meta Platforms Inc. expects its capital expenditure in 2025 to come in at an estimated $60 billion to $65 billion, much of which will go toward building its AI infrastructure. On Jan. 21, OpenAI, in partnership with Oracle Corp. and SoftBank Group Corp., launched the Stargate Project, which intends to invest $500 billion over the next four years to build new AI infrastructure in the U.S. These investment plans indicate that capital is still flowing into the exploration of cutting-edge AI applications that require massive computational power.

According to an estimate by market research firm FactSet, the combined capital expenditures of the five major U.S. tech companies — Microsoft Corp., Google LLC, Amazon.com Inc., Apple Inc. and Meta — are expected to grow by 19.6% this year. A large portion of the amount will be earmarked for developing next-generation models like GPT-5 and Llama 4. Meanwhile, the development of AI applications such as intelligent agents is still in an exploratory stage, and the timetable for their large-scale commercialization remains unclear. Therefore, we believe that the current growth paradigm of AI computational power remains unchanged.

Will DeepSeek change the market’s investment logic?

Focus is likely to shift away from the “computational power arms race,” as “algorithm efficiency” may matter more in the future competition between large model companies. The key to success will be optimizing the algorithms that train models and increasing ecosystem vitality.

Open-source license agreements will enable small and medium-sized developers to build on the knowledge generated by bigger developers, shifting large model innovation from being driven by tech giants toward distributed communities. From an investment perspective, 2025 will be the year when AI development begins large-scale commercialization. Enterprise-scale software like AI agents is expected to be quickly deployed to improve work efficiency.

Will DeepSeek change the chip market landscape?

According to Jon Peddie Research, Nvidia Corp.’s market share in the global GPU market reached 90% in the third quarter of 2024, with the higher-end H100 GPU being one of its main products. DeepSeek’s achievements show that in the large model market for ordinary consumers, Chinese companies can use lower-end chips like the Nvidia A100 and H800 to train models that can go toe-to-toe with Western-developed models. This could impact demand for Nvidia’s most advanced GPUs, such as the B200, in fields like cloud computing and sovereign AI.

Was it really that cheap to develop DeepSeek’s model?

A January report by semiconductor research and consulting firm SemiAnalysis said that DeepSeek’s reported $5.576 million cost mainly refers to the money spent on GPUs for V3’s pre-training and does not include other important expenditures on research and development and hardware. Its GPU investments alone exceeded $500 million, according to the report. Counting the money spent on servers and operations, DeepSeek’s total cost of ownership could reach $2.57 billion over the next four years.

DeepSeek’s cost advantage primarily lies in its efficient training methods and innovative model architectures. For instance, its inference costs have been reduced to one-fiftieth of OpenAI’s, which can lead to significant cost savings in practical applications. However, this cost advantage does not imply a dramatic reduction in overall AI development and operational costs.

What innovations has DeepSeek made?

DeepSeek has made technological innovations in multiple areas, including model architecture, training methods, distillation optimization and inference efficiency. Its use of Mixture of Experts (MoE) and Multi-Head Latent Attention (MLA) technologies has significantly improved its models’ performance and efficiency. In addition, DeepSeek used pure reinforcement learning (RL) to train its R1-Zero model, skipping supervised fine-tuning in a way that validated the effectiveness of RL on its own in AI training. These innovations have helped DeepSeek make significant progress in performance, efficiency and cost control, offering a new direction for the development of AI technology. Particularly in solving complex mathematical, physical and reasoning problems, R1-Zero’s speed is twice that of ChatGPT, and it provides quick and comprehensive answers to programming problems.

Specifically, the MoE technique breaks a large model down into smaller, specialized networks known as “experts,” enabling the model to handle specific tasks more efficiently and accurately with fewer computational resources. MLA compresses the attention memory (the key-value cache) to support long-text processing. Pure RL enables large models to skip supervised fine-tuning and still achieve the desired results in training. Distillation optimization helps smaller models improve by learning from the output of bigger, more complex ones, giving them faster inference speeds, especially in mathematical and coding tasks.
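
To make the MoE idea concrete, below is a minimal, illustrative sketch in Python (using PyTorch) of top-k expert routing. This is not DeepSeek’s actual architecture; the class name, layer sizes and expert count are hypothetical, chosen only to show how a gating network activates a small subset of “expert” networks for each token, so that only a fraction of the model’s parameters do work on any given input.

```python
# Minimal sketch of Mixture-of-Experts (MoE) routing, for illustration only.
# A gating network scores the experts for each token; each token is then
# processed by only its top-k experts, keeping per-token compute low even
# when the total parameter count is large.
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)  # router: one score per expert
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.gate(x)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)        # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(TinyMoELayer()(tokens).shape)              # torch.Size([16, 64])
```

With eight experts and top-2 routing, each token activates roughly a quarter of the layer’s parameters, which is the basic mechanism behind the efficiency gains described above; production systems add refinements such as load-balancing losses that this sketch omits.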

Will DeepSeek drive the development of end-device AI?

DeepSeek is very likely to play a big role in accelerating the development of end-device AI, as its cost-effective models may prompt more businesses to deploy AI applications on end devices. End devices equipped with AI applications will have stronger computing capabilities, better user interaction and stronger privacy protection. For example, Microsoft, which backs OpenAI, recently made distilled DeepSeek-R1 models available on its Copilot+ PCs.

However, from the development trajectory of Apple Intelligence over the past year, we see that the iteration of smart hardware is a gradual process, not something that will happen overnight. Improving model capabilities is just one part of the process, and there are many challenges, including ecosystem coordination. Therefore, we should not have overly high expectations for the development of AI-powered end devices like smartphones in 2025.

Will DeepSeek’s shock ascent prompt the U.S. to tighten its tech export restrictions? 

After DeepSeek took the world by storm, U.S. media voices advocating for stricter curbs on China’s AI ambitions have gained momentum. We see the following risks:

1) The U.S. may strengthen controls on exports of higher-end AI chips.

2) The U.S. may limit its tech companies from open-sourcing large models to prevent the diffusion of the technology.

3) According to an export control policy released at the end of 2024, the U.S. has restricted the transfer of models trained in third countries, such as Singapore, to China.

4) The U.S. may block China’s access to vast datasets used for AI training.

5) The U.S. may restrict American cloud providers’ high-performance computational resources from being used by Chinese companies.

Will DeepSeek change the open-source software ecosystem?

While major global large model companies — including OpenAI, Google, ByteDance Ltd. and Baidu Inc. — focus on developing closed-source models, Meta and Alibaba Group Holding Ltd. have concentrated on open-source ones. In general, closed-source models outperform open-source ones in capabilities.

But the open-source nature of DeepSeek’s models, which can basically match the most advanced closed-source models, may help lower the threshold for using AI technology. Open-source models bring the technological dividend of continuously decreasing marginal costs, laying the foundation for truly widespread use of AI technology. Furthermore, DeepSeek’s approach could prompt other AI companies to rethink their business models.

Huang Leping is the lead analyst for technology and electronics at Huatai Securities Co. Ltd. Chen Xudong is a Huatai analyst.

The commentary has been edited for length and clarity.

Contact translator Ding Yi (yiding@caixin.com) and editor Joshua Dummer (joshuadummer@caixin.com)

caixinglobal.com is the English-language online news portal of Chinese financial and business news media group Caixin. Global Neighbours is authorized to reprint this article.