We believe the shift in models from Intelligence Per Token to Intelligence Per Task is an important direction.
Training:
during the training stage, we expect large language model (LLM) training to continue following the scaling law. Cloud providers, startups and sovereign AI programs will keep purchasing large amounts of computing power and building data centers in pursuit of higher Intelligence Per Token. We expect training clusters worldwide to keep expanding; once a cluster exceeds 100,000 GPUs in size, new challenges will arise for data center construction, deployment, power supply and interconnection, and density requirements are likely to rise rapidly for compute, interconnect, power and cooling, and storage.
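The scaling law referenced above can be made concrete with a small numeric sketch. This is illustrative only: it uses the Chinchilla-style parametric loss form L(N, D) = E + A/N^α + B/D^β, with the published fit coefficients from Hoffmann et al. (2022); the model sizes and token counts below are our own assumptions, not forecasts for the clusters discussed here.

```python
# Minimal sketch of a Chinchilla-style parametric scaling law:
#   L(N, D) = E + A / N**alpha + B / D**beta
# Coefficients are the published Chinchilla fits (Hoffmann et al., 2022);
# treat them as illustrative assumptions, not capacity-planning inputs.

E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling both parameters and training data lowers the predicted loss,
# which is why training clusters keep growing.
small = loss(7e9, 1.4e12)    # ~7B-parameter model, 1.4T tokens
large = loss(70e9, 1.4e13)   # ~70B-parameter model, 14T tokens
print(f"7B/1.4T tokens: {small:.3f}   70B/14T tokens: {large:.3f}")
```

The point of the sketch is the monotonic trend, not the absolute numbers: under this functional form, diminishing but nonzero loss reductions continue as compute grows.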
Inference:
we believe methods such as tree search and self-play can rapidly improve Intelligence Per Token:
•At the computation stage, we are bullish on high-speed interconnects between GPUs and CPUs.
•At the application stage, we believe the planning capabilities brought by new model architectures will gradually improve, accompanied by gains in math, coding and general capabilities.
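The idea behind spending more inference-time compute on tree search can be shown with a toy sketch. The puzzle, function names and scoring rule below are our own illustration, not any specific model's method: a greedy policy takes one locally best step per move, while a breadth-first tree search spends more compute exploring every branch and can reach answers greedy misses.

```python
# Toy illustration of inference-time tree search vs. greedy decoding.
# Task (our own construction): starting from 1, get as close as possible to a
# target using +3 or *2 steps within max_depth moves.

from collections import deque

def greedy(target: int, max_depth: int) -> int:
    x = 1
    for _ in range(max_depth):
        # Locally best step: whichever child lands closer to the target.
        x = min((x + 3, x * 2), key=lambda c: abs(c - target))
        if x == target:
            break
    return x

def tree_search(target: int, max_depth: int) -> int:
    # Explore the full tree of +3 / *2 continuations (more compute spent),
    # keeping the best value seen anywhere in the tree.
    best, frontier = 1, deque([(1, 0)])
    while frontier:
        x, depth = frontier.popleft()
        if abs(x - target) < abs(best - target):
            best = x
        if depth < max_depth:
            frontier.append((x + 3, depth + 1))
            frontier.append((x * 2, depth + 1))
    return best

# Greedy gets trapped (1 -> 4 -> 8 -> 16); tree search finds 1 -> 4 -> 7 -> 14.
print(greedy(14, 3), tree_search(14, 3))  # → 16 14
```

The same trade-off drives the thesis above: extra computation per task (a larger search tree) buys better answers, which is what pushes demand toward high-bandwidth GPU–CPU interconnects at the computation stage.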