
[Bilingual Reading] Cover Story: DeepSeek's Explosive Rise Shakes the Underlying Logic of AI Investment and the Compute Race (Part 1)

CaixinGlobal财新国际 · WeChat official account · 2025-02-11 18:12



DeepSeek, a Chinese artificial intelligence (AI) startup, has sent shockwaves through the global tech industry, triggering a massive sell-off in chip stocks and igniting political tensions in Washington. DeepSeek’s stunning efficiency—achieving OpenAI-tier performance with a fraction of the compute power—has Silicon Valley on edge. As its app rockets to No. 1 on the Apple and Google app stores, Wall Street and Washington scramble to respond, marking what could be the biggest challenge yet to America’s AI supremacy.

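A Chinese artificial intelligence (AI) company has burst onto the scene and jolted U.S. capital markets. Financial institutions rushed to dump chip stocks, and the market value of Nvidia, the world's largest listed company, shrank by nearly a fifth overnight, a drop rarely seen in history.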


Backed by Hangzhou-based quantitative fund manager High-Flyer, DeepSeek released its new-generation open-source AI model, DeepSeek-V3, on Dec. 26, 2024, followed by a consumer-facing app and its advanced reasoning model, R1, in January 2025. These models directly rival OpenAI’s most sophisticated AI systems but require far fewer resources.

Developers were stunned by DeepSeek’s technical report, which shows capabilities on par with OpenAI’s leading models through highly efficient algorithms. Ordinary users flocked to the platform—while OpenAI’s latest ChatGPT model costs $20 per month, DeepSeek is free.

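DeepSeek (深度求索), the large-model company under High-Flyer (幻方量化), a private fund manager headquartered in Hangzhou, China, released its new-generation open-source pre-trained large model, DeepSeek-V3, on Dec. 26, 2024. It launched an app for ordinary users on Jan. 15, 2025, and then released the open-source model R1 on Jan. 20, positioned against o1, OpenAI's most advanced reasoning model.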

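Developers marveled at the innovations DeepSeek presented in its technical report, which used sophisticated algorithms to achieve capabilities that compete with OpenAI's most advanced models. Ordinary users also "defected" in droves: using ChatGPT's latest model requires a $20 monthly fee, while DeepSeek is free.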


DeepSeek’s meteoric rise was fueled by its groundbreaking AI capabilities and disruptive efficiency. Training DeepSeek-V3 took just 2,048 Nvidia GPUs, a fraction of the resources OpenAI required. DeepSeek reported a GPU training cost of $5.58 million and fewer than 3 million GPU hours, compared with more than 30 million GPU hours for Meta’s Llama-3.1-405B.

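Several professionals told Caixin that DeepSeek-V3's capabilities are close to GPT-4o's, and that R1 is a match for o1. Yet training DeepSeek-V3 required only 2,048 advanced Nvidia GPUs (AI chips), whereas OpenAI needs upwards of 10,000 advanced GPUs to train its models. DeepSeek-V3's technical paper puts its GPU training cost at just $5.576 million and its usage at just 2.788 million GPU hours, far below the 30.84 million GPU hours Meta used to train its open-source model Llama-3.1-405B, and DeepSeek's model is the more capable of the two.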
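The training figures quoted above can be cross-checked with a little arithmetic. The sketch below uses only the numbers reported in the article; the function names are hypothetical, and the wall-clock estimate rests on the assumption (not stated in the article) that all 2,048 GPUs ran concurrently for the whole run.

```python
# Figures as quoted in the article (attributed to the DeepSeek-V3 technical paper).
GPU_COUNT = 2_048
DEEPSEEK_GPU_HOURS = 2_788_000      # 2.788 million GPU hours
DEEPSEEK_COST_USD = 5_576_000       # $5.576 million
LLAMA_GPU_HOURS = 30_840_000        # Llama-3.1-405B: 30.84 million GPU hours


def implied_rate_per_gpu_hour(cost_usd: float, gpu_hours: float) -> float:
    """Dollar cost per GPU hour implied by the two reported totals."""
    return cost_usd / gpu_hours


def gpu_hour_ratio(baseline_hours: float, hours: float) -> float:
    """How many times more GPU hours the baseline model consumed."""
    return baseline_hours / hours


def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Elapsed days, ASSUMING every GPU runs for the entire training run."""
    return gpu_hours / gpu_count / 24


rate = implied_rate_per_gpu_hour(DEEPSEEK_COST_USD, DEEPSEEK_GPU_HOURS)
ratio = gpu_hour_ratio(LLAMA_GPU_HOURS, DEEPSEEK_GPU_HOURS)
days = wall_clock_days(DEEPSEEK_GPU_HOURS, GPU_COUNT)

print(f"Implied rate: ${rate:.2f} per GPU hour")
print(f"Llama-3.1-405B used {ratio:.1f}x the GPU hours")
print(f"Estimated wall-clock time: {days:.1f} days")
```

On these numbers the reported cost works out to $2 per GPU hour, and Llama-3.1-405B used roughly 11 times as many GPU hours. The cost covers GPU time only, as the paper's figure is described in the article.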


The shockwaves from DeepSeek’s emergence erupted across Wall Street. On Jan. 27, Nvidia shares plunged nearly 17%, wiping out $589 billion in market value—the biggest single-day loss in history for a publicly traded company. Other chipmakers, including TSMC, Micron, Broadcom, ARM and ASML, also plummeted as investors questioned whether AI’s insatiable demand for high-end chips might slow.

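DeepSeek's impact quickly spread to Wall Street. On Jan. 27, Nvidia (NASDAQ: NVDA) shares tumbled 16.86% to close at $118.58, erasing $589 billion in market value in a single day, a record single-day loss for any listed company worldwide. It was also Nvidia's worst performance since March 16, 2020, in the early days of the Covid-19 pandemic.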


The impact extended beyond financial markets, drawing attention from global AI leaders. At the World Economic Forum in Davos, Switzerland, Microsoft CEO Satya Nadella urged policymakers to take China’s AI advancements seriously. AI pioneer Andrej Karpathy praised DeepSeek’s ability to match OpenAI and Meta’s models with a fraction of the compute power, calling it an impressive display of engineering efficiency.

In Washington, DeepSeek’s rise sparked immediate concern. Bipartisan lawmakers called for stricter AI export controls, warning that China’s rapid progress could erode U.S. technological dominance.

Accusations soon emerged in the American tech community, with some alleging that DeepSeek had used OpenAI’s models for training. Industry experts dismissed these claims, pointing out that AI models are typically trained on vast pools of publicly available data. Others noted that restricting API access to advanced models would do little to prevent their outputs from being used in further training.

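Accusations such as "DeepSeek improperly used OpenAI data to train its models" and "DeepSeek's models were 'distilled' from OpenAI's" quickly spread in U.S. public debate. "OpenAI went closed-source largely to avoid being learned from, but it turns out that from compute and data to algorithms, large models hold no secrets. America's advanced large models can further lock down their APIs (application programming interfaces), but it is impossible to stop model-generated data from being used by others for training," said the cloud vendor source cited earlier.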


High-Flyer’s Quiet Empire

High-Flyer, the parent company of DeepSeek, has long operated in the shadows. Founded in 2015 by Liang Wenfeng, a Zhejiang University graduate with expertise in machine vision and financial modeling, the firm pioneered AI-driven quantitative trading in China. By 2021, it managed nearly $14 billion in assets, far ahead of its competitors in AI adoption.

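According to Zhejiang University's official website, Liang Wenfeng was born in 1985 in Zhanjiang, Guangdong province. He was admitted to Zhejiang University at 17 to study electronic information engineering, and at 22 began graduate studies in information and communication engineering there, researching machine vision under Professor Xiang Zhiyu (项志宇). In 2010, Liang received a master's degree in information and communication engineering.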

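Liang then turned to financial markets. He founded High-Flyer in 2015 and within just four years grew its funds to more than 10 billion yuan; at its peak in 2021, assets under management approached 100 billion yuan.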


While most Chinese AI firms scrambled for GPUs after ChatGPT’s launch, High-Flyer had been quietly stockpiling thousands of Nvidia chips since 2019. In 2023, it spun off its AI division to form DeepSeek, focusing exclusively on open-source large language models (LLMs). While the AI industry in China was dominated by internet giants and well-funded startups, DeepSeek remained an outlier. It neither sought external investment nor aggressively expanded its computing power, maintaining a quiet presence even as AI became the country’s most competitive sector.

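Unlike other quantitative firms, which bought small numbers of Nvidia GPUs to train models, High-Flyer did not stockpile GPUs solely for quantitative investing. In a 2023 interview, Liang said High-Flyer had already amassed 1,000 Nvidia GPUs by 2019, driven mainly by curiosity about the limits of AI capabilities.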

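Liang wanted to turn High-Flyer into an open-source strategy platform, and by 2021 it had invested 1 billion yuan to deploy more than 10,000 Nvidia A100s. Domestic large-model companies, by contrast, only began hunting for GPUs after ChatGPT's launch at the end of 2022 ignited the market, and U.S. chip export controls have tightened year by year since.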


In January 2024, DeepSeek launched China’s first open-source Mixture-of-Experts (MoE) model, a system that routes tasks to specialized smaller models for greater efficiency. However, DeepSeek’s first model barely made a dent in the market.

That changed in May 2024 with the release of DeepSeek-V2, which refined the MoE framework and drastically improved inference efficiency. The model triggered an AI price war in China, slashing inference costs to 1% of OpenAI’s rates and forcing companies like ByteDance and Alibaba Cloud to cut their prices. The unprecedented transparency of DeepSeek-V2’s research paper also won widespread respect in the AI community.

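In January 2024, DeepSeek released China's first open-source MoE (Mixture-of-Experts) model, an architecture built from a large number of small-parameter expert models, with tasks classified and assigned to different experts when the model is called. Because the industry widely believed that GPT-4, released in March 2023, used this architecture, MoE became an industry trend.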
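The routing idea described above can be illustrated with a toy example. This is a minimal sketch of generic top-k expert routing, not DeepSeek's actual implementation: the `route` function, the four toy experts and the hand-picked gate scores are all hypothetical. In a real MoE model the gate scores come from a learned network, and the experts are neural sub-networks rather than simple functions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(
    gate_scores: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    x: float,
    top_k: int = 2,
) -> Tuple[float, List[int]]:
    """Send input x to the top_k highest-scoring experts and blend their outputs.

    Only the selected experts run; the rest are skipped, which is where the
    efficiency of a Mixture-of-Experts design comes from.
    """
    probs = softmax(gate_scores)
    # Indices of the top_k experts, highest probability first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalise so the weights of the chosen experts sum to 1.
    norm = sum(probs[i] for i in chosen)
    output = sum(probs[i] / norm * experts[i](x) for i in chosen)
    return output, chosen


# Hypothetical toy experts; a real model would use small neural networks.
toy_experts = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
]

# Hand-picked scores (an assumption for illustration): experts 1 and 3 tie for the lead.
result, used = route([0.1, 2.0, 0.3, 2.0], toy_experts, x=3.0)
print(used, result)
```

With these scores, only two of the four experts are evaluated for the input, and their outputs are averaged with equal weight because their gate scores tie.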

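But DeepSeek's first model made no splash in the market. What truly caught the industry's attention was DeepSeek-V2, released in May 2024, which continued down the MoE path and introduced architecture-level innovations that sharply improved inference efficiency and cut costs. DeepSeek-V2 set off a price war in China as soon as it launched: its inference pricing was just one-hundredth that of OpenAI's comparable model, and major vendors including ByteDance, Alibaba Cloud and Zhipu AI all followed with price cuts. The openness of the DeepSeek-V2 paper also drew onlookers from the technical community and began to earn their respect.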


By 2023, High-Flyer had solidified its status as one of China’s top four quantitative firms, managing more than 60 billion yuan ($8.21 billion) in assets, providing ample resources to fund its AI ambitions. Despite mixed fund performances, the firm’s deep investment in AI set it apart from traditional quantitative trading funds.

While many viewed DeepSeek as an extension of High-Flyer’s financial operations, its trajectory suggests something far more transformative—an AI company born from finance but now challenging the industry’s most dominant players.





