451 Research | AI boom and sustainability goals impact datacenter technology innovation

CDCC · WeChat Official Account · 2025-02-13 12:41


AI boom and sustainability goals impact datacenter technology innovation


January 2025

Translator’s Note

Datacenter infrastructure refers to the non-computing equipment that keeps the core IT hardware for compute, storage and networking running, such as power, cooling and rack systems. This infrastructure may sit in a hyperscale cloud datacenter, a wholesale or retail colocation facility, or an enterprise’s own datacenter. It may occupy a purpose-built or existing building and be operated by the owner or by a contracted facilities management provider. Datacenter infrastructure plays a key role in keeping IT equipment, whether running general-purpose or accelerated computing workloads, continuously and reliably online, and in meeting efficiency and sustainability goals. Looking at current trends, it is clear that the AI boom is rapidly raising rack power density, increasing cooling demand and accelerating the adoption of liquid cooling. Meanwhile, sustainability considerations continue to drive the adoption of lithium-ion batteries as alternatives or supplements to diesel generators.


About this report

Reports such as this showcase insights derived from a variety of market-level research inputs, including financial data, M&A information, and other market data sources both proprietary to S&P Global and publicly available. This input is combined with ongoing observation of markets and regular interaction with vendors and other key market players.

This report specifically includes data from the following sources. In addition to internal Voice of the Enterprise survey data, we partnered with China Data Center Committee (CDCC) to conduct a similar survey in China and incorporated that data. See the Methodology section at the end of the report for more details.

  • 451 Research’s Voice of the Enterprise: Datacenter, Datacenter Infrastructure 2024, a global survey of enterprise IT decision-makers familiar with datacenter infrastructure technology, fielded during May and June 2024.

  • 451 Research’s Voice of the Enterprise: Datacenter, Liquid Cooling Technology 2024, a global survey of enterprise IT decision-makers familiar with datacenter liquid cooling technology, fielded during June and August 2024.

  • CDCC’s Datacenter Industry Survey: Datacenter Infrastructure 2024, a survey of enterprise IT decision-makers in China familiar with datacenter infrastructure technology, fielded during August and September 2024.

  • CDCC’s Datacenter Industry Survey: Datacenter Liquid Cooling 2024, a survey of enterprise IT decision-makers in China familiar with datacenter liquid cooling technology, fielded during August and September 2024.


Key findings

  • Hybrid IT infrastructure will persist for the foreseeable future. Even though the cloud is widely adopted (72%), many organizations that use cloud also own and operate server rooms or server closets (44%) as well as datacenters (38%).

  • Organizations are advancing uninterruptible power supply (UPS) redundancy architecture to balance reliability and total cost of ownership (TCO). Only 29% of respondents report using 2N or 2(N+1) architecture, while 28% say they have a distributed redundancy (DR) system and 19% have a catcher system. Sustainability drives the adoption of lithium-ion batteries as alternatives or supplements to diesel generators.

  • Accelerated computing workloads are gaining popularity and driving up rack density. More than four in five survey respondents (82%) say they run accelerated computing workloads in their datacenters. Rack density above 10 kW is prevalent, and density is set to broadly continue rising in the next five years.


  • Air cooling remains the dominant cooling technology, but liquid cooling is catching up. Just under half (48%) of respondents say they have a system that is solely air cooled, while 38% use a mix of air and liquid cooling and 13% have a purely liquid cooling system. Among respondents who currently use air cooling only, 56% say their organizations plan to introduce liquid cooling in the next five years (13% in the next 12 months, 31% in the next two to four years and 12% in the next five years).

  • Addressing higher rack density remains the top cited benefit of liquid cooling. More than half (54%) of respondents cite supporting higher rack density (associated with increasing GPU power and minimizing GPU distance for clustering) as a benefit of liquid cooling. Other benefits include increased server power without overheating (45%), improved power usage effectiveness (39%), optimized TCO (38%), quieter operation (32%) and increased chip thermal design power (27%).

The Take

Datacenter infrastructure is designed to deliver the highest possible levels of availability and reliability while maintaining operational efficiency and satisfying sustainability targets. To achieve these goals, datacenter operators continually explore strategies to enhance operational efficiency and reliability, minimize costs and carbon emissions, and select the most appropriate technologies and architectural designs for their facilities. In addition to the continuous pursuit of sustainability, changing workloads such as AI applications are impacting the technological innovation of datacenters. Hybrid IT infrastructure shows signs of stability and will likely persist for the foreseeable future. Hyperscalers continue to advance UPS redundancy architecture. The AI boom is making liquid cooling a necessity to address increasing rack density, rather than a “nice to have” to improve efficiency and sustainability.


Hybrid IT infrastructure shows signs of stability

Organizations have widely adopted cloud environments to deploy their IT infrastructure. However, for security, cost and flexibility reasons, many organizations still diversify their options, such as by owning and operating datacenters or leasing space from colocation providers. This mix has shown signs of stabilization over the past two years. According to 451 Research’s Voice of the Enterprise: Datacenter, Datacenter Infrastructure 2024 survey, the most popular workload venue by a wide margin is cloud, broadly defined (e.g., infrastructure as a service, software as a service, platform as a service, hosted private cloud), cited by 72% of participant organizations. Many of those organizations also own and operate server rooms and server closets (44%) or datacenters (38%), while smaller segments own facilities but contract the operation to a facilities management provider (25%) or use colocation (13%).

Figure 1: Enterprises maintain a variety of IT environments, with cloud leading workload venues


Q: Which of the following IT environments does your organization currently use?
Base: All respondents (n=735).

Source: 451 Research’s Voice of the Enterprise: Datacenter, Datacenter Infrastructure 2024.

Datacenter architecture and technology are evolving

Organizations are advancing UPS redundancy architecture to balance reliability and TCO

Datacenter UPS redundancy architecture is essential for ensuring continuous power supply and minimizing downtime for mission-critical systems, thus enhancing availability and reliability. Options for redundancy include non-redundant systems (N), module redundancy (N+1), system redundancy (2N), and more advanced configurations such as 2(N+1), which provide multiple layers of backup. With redundancy maturing at the software level, thanks to virtual machines and cloud availability zones, operators have started seeking options that optimize capacity utilization and reduce TCO while maintaining system-level redundancy. Driven particularly by hyperscalers, distributed redundancy (DR) and isolated redundancy (catcher) systems are becoming more popular. In a recent 451 Research survey, only 29% of respondents reported using 2N or 2(N+1) architecture, while 28% say they have a DR system and 19% have a catcher system.

In comparison, China’s datacenter industry is more conservative, with 75% of respondents to the CDCC survey stating they have 2N redundancy, an architecture that is more expensive than a DR or catcher system.
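The capacity-utilization trade-off behind these architecture choices can be sketched with simple arithmetic. The module counts below are illustrative assumptions, not figures from the survey:

```python
# Rough, illustrative comparison of UPS redundancy architectures.
# All numbers are hypothetical assumptions for this sketch, not report data.

def utilization(modules_installed: int, modules_needed: int) -> float:
    """Fraction of installed UPS capacity that serves the critical load."""
    return modules_needed / modules_installed

# Assume N = 4 UPS modules are needed to carry the full IT load.
N = 4

architectures = {
    "N (no redundancy)":       N,            # 4 modules installed
    "N+1 (module redundancy)": N + 1,        # one spare module
    "2N (system redundancy)":  2 * N,        # two full systems
    "2(N+1)":                  2 * (N + 1),
    # Distributed redundancy: e.g., three systems each sized at half the
    # load, so any one system can fail -> 1.5x N modules installed.
    "DR (3-system example)":   3 * N // 2,
}

for name, installed in architectures.items():
    print(f"{name:26s} {installed:2d} modules, "
          f"utilization {utilization(installed, N):.0%}")
```

The sketch shows why hyperscalers favor DR: it still tolerates the loss of a full system while stranding far less installed capacity (67% utilization in this example) than 2N (50%) or 2(N+1) (40%).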

Figure 2: Organizations use a diverse selection of UPS redundancy architectures


Q: Which UPS architecture is primarily used in your organization’s datacenters?

Base: Respondents involved in or knowledgeable about datacenter infrastructure decisions.

Source: 451 Research’s Voice of the Enterprise: Datacenter, Datacenter Infrastructure 2024 and CDCC’s Datacenter Industry Survey: Datacenter Infrastructure 2024.

Lithium-ion battery adoption continues its momentum as an alternative or supplement to diesel generators

Traditionally, diesel generators have been used to maintain datacenter uptime when utility power is down. Reducing the usage of fossil fuels has become a widely discussed topic as regulations are tightened and more communities and shareholders hold businesses accountable for carbon emissions. As a result, datacenter operators are under pressure to replace diesel generators with cleaner options. Only 13% of survey respondents (down from 21% in 2023) say they will continue using diesel generators with no immediate plans to switch to or deploy lithium-ion battery banks, while 29% (down from 33% in 2023) say they will keep using diesel generators but ensure that newer backup system deployments use lithium-ion battery banks. More than one-third of respondents (36%, up from 33% in 2023) say they plan to switch from diesel generators to lithium-ion battery banks within the next five years, while 20% (up from 11% in 2023) say they already implement only lithium-ion battery banks in all of their datacenter energy backup systems.

However, adoption of lithium-ion batteries in the Chinese market is relatively slow, with three-quarters of respondents planning to continue using diesel generators with no immediate plan to switch to lithium-ion batteries.

Figure 3: Diesel generators remain in use while organizations explore batteries as a sustainable alternative


Q: Which of the following best describes your organization’s approach to energy backup systems for its primary datacenter?

Base: Respondents involved in or knowledgeable about datacenter infrastructure decisions.

Source: 451 Research’s Voice of the Enterprise: Datacenter, Datacenter Infrastructure 2024 and CDCC’s Datacenter Industry Survey: Datacenter Infrastructure 2024.

Lithium-ion batteries are not just for backup

Datacenters have traditionally depended on utility power, using battery systems only during outages, with diesel generators as backup for prolonged outages. However, technology advancements and lower costs have made lithium-ion batteries a viable alternative to valve-regulated lead-acid (VRLA) batteries. While VRLA batteries offer 300 to 400 cycles, lithium-ion batteries provide 3,000 to 5,000 cycles and a longer lifespan. This allows lithium-ion batteries to be used not just for backup but also for peak shaving and frequency regulation, enhancing their utility without sacrificing longevity. As utility and grid constraints become more prevalent and sustainability pressures mount, datacenter operators are seeking improved control over their power sources. Diesel generators face scrutiny due to their environmental impact, making lithium-ion batteries a compelling alternative due to their extensive discharge capabilities. While most outages are resolved within hours, extreme weather events can lead to outages lasting days or even weeks. The backup duration of lithium-ion batteries typically ranges from two to four hours, which is sufficient for most outages and minimizes diesel generator operation; however, diesel generators may still be necessary for extended outages. Furthermore, lithium-ion batteries can facilitate the development of hybrid power systems, enabling the integration of local renewable energy sources such as wind, solar and fuel cells, enhancing grid stability, increasing operational efficiency and further reducing carbon emissions.
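As a rough illustration of that two-to-four-hour ride-through window, the required battery energy can be estimated from load and duration. All inputs below (IT load, depth of discharge, inverter efficiency) are hypothetical assumptions, not report figures:

```python
# Back-of-the-envelope sizing of a lithium-ion backup bank.
# All inputs are hypothetical assumptions for illustration only.

def battery_energy_kwh(it_load_kw: float, backup_hours: float,
                       depth_of_discharge: float = 0.9,
                       inverter_efficiency: float = 0.95) -> float:
    """Installed battery energy needed to ride through an outage,
    accounting for usable depth of discharge and conversion losses."""
    return it_load_kw * backup_hours / (depth_of_discharge * inverter_efficiency)

# A 1 MW IT load bridged for the two-to-four-hour window discussed above:
for hours in (2, 4):
    kwh = battery_energy_kwh(it_load_kw=1000, backup_hours=hours)
    print(f"{hours} h ride-through for a 1 MW IT load: ~{kwh:,.0f} kWh of battery")
```

Scaling is linear in both load and duration, which is why battery banks cover the common short outages economically while diesel remains the fallback for multi-day events.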

The AI boom is accelerating adoption of liquid cooling

Accelerated computing workloads are becoming more popular and driving up rack density

The rapid growth of AI deployments in the last 18 to 24 months has increased the demand for higher IT rack density in datacenters. As organizations adopt AI applications, IT facilities are pressed to accommodate more powerful hardware in limited spaces. This trend is driven by the demand for high-performance computing (HPC) to train complex machine learning (ML) models and handle large data volumes. Datacenters are adapting by using graphics processing units (GPUs) and specialized AI accelerators for greater processing power in compact designs. As AI continues to evolve, datacenter operators must innovate their infrastructure to meet the rising demands of advanced workloads, further increasing IT rack density.

More than four in five survey respondents (82%) say they run accelerated computing workloads in their datacenters. Rack density above 10 kW is prevalent and will likely continue to rise in the next five years. Responses regarding rack densities now versus projected densities over the next five years generally demonstrate an upward trend.

Figure 4: Current and projected rack power density supported


Q: Considering all of your organization’s datacenters, which of the following power capacities (i.e., IT load and equipment) do its server racks currently support? Which do you intend to support in the next five years?
Base: Respondents involved in or knowledgeable about datacenter infrastructure decisions.

Source: 451 Research’s Voice of the Enterprise: Datacenter, Datacenter Infrastructure 2024.


Air cooling remains the dominant technology, but liquid cooling is catching up

While the power distribution system delivers electricity to IT infrastructure within a datacenter, the cooling system plays a pivotal role in dissipating the heat generated during operation to ensure optimal performance. For decades, datacenters have primarily relied on air cooling, evolving from room-level to row-level and ultimately to rack-level configurations, enhancing precision and efficiency in response to the escalating rack power densities driven by advancements in chip design. Although liquid cooling technologies are not new to datacenters, their application has traditionally been confined to high-density environments such as HPC and cryptocurrency mining. However, rapid advancements in AI/ML, data analytics and HPC, propelled by swift innovation in chip technology, are resulting in significantly increased GPU chip densities. Consequently, certain applications are approaching the thermal limits of air cooling, necessitating the exploration of alternative liquid cooling solutions that offer superior efficiency.

In their primary datacenters, 48% of respondents say they have an entirely air-cooled system, while 38% have a mix of air cooling and liquid cooling, and 13% have a purely liquid cooling system. Among respondents whose organizations use only air cooling, 56% say they have plans to introduce liquid cooling in the next five years (see Figure 5).

Figure 5: Air cooling is still dominant, while liquid cooling adoption rises

Q: Which of the following best describes the type of cooling system currently deployed in your organization’s primary datacenter?

Base: Respondents involved in or knowledgeable about datacenter infrastructure decisions (n=219).

Q: Which of the following best describes your organization’s plans, if any, to introduce liquid cooling into its primary datacenter?

Base: Respondents whose primary datacenters use air cooling only (n=100).

Source: 451 Research’s Voice of the Enterprise: Datacenter, Liquid Cooling Technology 2024.
Transitioning to liquid cooling necessitates either retrofitting existing architectures or implementing complete overhauls, both of which present technical challenges and economic implications. It is sensible to continue using existing systems until they can no longer meet operational demands. A key metric for assessing the thermal performance of processors is thermal design power (TDP), which indicates the maximum power consumption in watts under theoretical load conditions. Current CPUs have a TDP of approximately 350 W, with next-generation models reaching 500 W-600 W. While air cooling can still be effective for CPU-based general computing, increased cooling demands may require enhanced airflow management, resulting in larger space requirements and increased fan noise. In contrast, GPUs exhibit much higher TDPs, with the current generation between 900 W-1,000 W and the next generation reaching 1,200 W, making air cooling inadequate due to the dense heat generated. Thus, liquid cooling becomes essential, especially for GPU-based AI applications.

Liquid cooling in datacenters is primarily categorized into two types: direct-to-chip (DtC) and immersion cooling, differentiated by the mode of interaction between liquid coolant and heat-generating electronic components. In a DtC setup, the coolant does not directly contact the electronic components; instead, it is circulated to cold plates that replace the traditional heatsinks found in air-cooled configurations. Although the cold plates effectively absorb most of the heat, fans are still necessary to assist with heat removal at the board level, albeit at significantly lower volumes and velocities. Some DtC designs facilitate heat exchange through the server chassis via air, while others incorporate heat exchangers at the rack or row level to transfer heat to a primary cooling loop. In contrast, immersion cooling involves submerging IT equipment directly in a dielectric fluid, allowing for direct heat dissipation into the liquid. This method completely eliminates the need for fans, as the fluid efficiently absorbs and removes heat. However, immersion cooling typically necessitates a different architectural design compared to conventional rack setups, as it requires specialized containment and support structures to accommodate the submerged equipment.
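The physics behind liquid's advantage in both approaches is the sensible-heat relation Q = m&#775; × c_p × ΔT: per kilogram of coolant, water carries roughly four times more heat than air, and it is also far denser, so moving the same mass takes far less volume. A minimal sketch, using approximate textbook property values:

```python
# Heat carried by a coolant stream: Q = m_dot * c_p * delta_T.
# Property values are approximate textbook figures; flow rates are illustrative.

def heat_removed_kw(flow_kg_s: float, cp_j_per_kg_k: float,
                    delta_t_k: float) -> float:
    """Heat carried away (kW) by a coolant mass flow with a given temperature rise."""
    return flow_kg_s * cp_j_per_kg_k * delta_t_k / 1000

AIR_CP = 1005     # J/(kg*K), approximate
WATER_CP = 4186   # J/(kg*K), approximate

# Same mass flow and the same 10 K temperature rise for both coolants:
air = heat_removed_kw(flow_kg_s=0.5, cp_j_per_kg_k=AIR_CP, delta_t_k=10)
water = heat_removed_kw(flow_kg_s=0.5, cp_j_per_kg_k=WATER_CP, delta_t_k=10)

print(f"0.5 kg/s of air,   10 K rise: {air:.2f} kW")
print(f"0.5 kg/s of water, 10 K rise: {water:.2f} kW")
```

And because air is roughly 800 times less dense than water, delivering even that 0.5 kg/s of air means moving a very large volume, which is why fans, ducting and noise grow quickly as rack density rises.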

Rack density is expected to be above 50 kW in liquid cooling deployments

Even as increasing TDP drives up server density, operators could deploy fewer servers per rack to maintain lower rack density. It is not uncommon to install one or two 4U or 8U servers in a rack and seal the other space with blanking panels, maintaining rack density at 10 kW-20 kW and allowing for continued use of air cooling. However, AI applications require that many GPUs are placed close together to achieve a cluster effect; therefore, all those blanking panels must give way to GPU servers, driving rack density higher and requiring liquid cooling.
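That shift can be sketched with back-of-the-envelope arithmetic. The server configurations and the per-server overhead below are assumptions for illustration, not survey data:

```python
# Illustrative rack power calculation; server specs are assumed, not from the report.

def rack_power_kw(servers: int, gpus_per_server: int, gpu_tdp_w: float,
                  overhead_w_per_server: float = 2000) -> float:
    """Total rack power in kW: GPU TDP plus assumed CPU/memory/fan/NIC
    overhead per server."""
    return servers * (gpus_per_server * gpu_tdp_w + overhead_w_per_server) / 1000

# Air-cooled posture: one 8-GPU server per rack, the rest blanking panels.
sparse = rack_power_kw(servers=1, gpus_per_server=8, gpu_tdp_w=1000)

# AI-cluster posture: five 8-GPU servers packed into the same rack.
dense = rack_power_kw(servers=5, gpus_per_server=8, gpu_tdp_w=1000)

print(f"Sparse rack: {sparse:.0f} kW")  # within air-cooling range
print(f"Dense rack:  {dense:.0f} kW")   # liquid-cooling territory
```

Under these assumptions the same rack jumps from 10 kW to 50 kW simply by filling the blanking-panel slots with GPU servers, which matches the densities respondents report for liquid-cooled deployments.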

When asked about current and expected power densities of liquid-cooled racks, most respondents cite densities of 50 kW-200 kW (see Figure 6).

Figure 6: Typical power density of liquid-cooled racks


Q: What is the typical power density of your organization’s liquid-cooled racks?
Base: Respondents who use or plan to introduce liquid cooling in their primary datacenters (n=126).

Source: 451 Research’s Voice of the Enterprise: Datacenter, Liquid Cooling Technology 2024.

Addressing higher rack density remains the top cited benefit of liquid cooling

Compared to air cooling, liquid cooling brings more benefits, such as quiet operation and improved PUE. However, the most fundamental driving factor for liquid cooling is its ability to support increased power density, not only at the chip and server level, but at the rack level. More than half (54%) of survey respondents cite accommodating higher rack density — associated with increasing GPU power and clustering — as a key benefit of liquid cooling. Other benefits include increased server power without overheating (45%), improved PUE (39%), optimized TCO (38%), quieter operation (33%) and supporting increased chip TDP (27%).

In comparison, datacenter operators in China cite improved PUE as the top reason for adopting liquid cooling. This is driven by regulations in the China market. Many tier 1 datacenter markets, including Beijing, Shanghai, Guangzhou, Shenzhen and Zhangjiakou, have set PUE requirements for datacenters. For example, Shanghai requires all large and hyperscale datacenters in the designated national datacenter clusters to have a PUE of no more than 1.25. Those that do not meet the requirement either will not be approved (in the case of new projects) or will be subject to punitive electricity charges (in the case of existing datacenters). Liquid cooling can be a solution to achieve lower PUE than air cooling can deliver.
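Because PUE is simply total facility power divided by IT equipment power, the leverage that cooling overhead has on a regulatory cap is easy to show. All figures below are hypothetical, for illustration only:

```python
# PUE = total facility power / IT equipment power.
# All numbers are hypothetical, illustrating a 1.25 regulatory cap.

def pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    """Power usage effectiveness for a facility with the given loads."""
    return (it_kw + cooling_kw + other_kw) / it_kw

# Assumed 1 MW IT load; power distribution/lighting overhead fixed at 50 kW.
air = pue(it_kw=1000, cooling_kw=350, other_kw=50)      # assumed air-cooling overhead
liquid = pue(it_kw=1000, cooling_kw=150, other_kw=50)   # assumed liquid-cooling overhead

print(f"Air-cooled PUE:    {air:.2f}")     # 1.40, above a 1.25 cap
print(f"Liquid-cooled PUE: {liquid:.2f}")  # 1.20, within the cap
```

With the IT load and other overhead held fixed, only the cooling term moves the ratio, which is why a PUE cap like Shanghai's 1.25 pushes operators directly toward liquid cooling.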

Figure 7: Benefits of adopting liquid cooling


Q: If your organization were to deploy liquid cooling in its primary datacenter, which, if any, of the following benefits would you expect it to achieve?
Base: Respondents whose organizations do not use liquid cooling in their primary datacenters and have no plans to introduce it.

Note: For the CDCC data, base sizes below 50 should be interpreted directionally.

Source: 451 Research’s Voice of the Enterprise: Datacenter, Liquid Cooling Technology 2024 and CDCC’s Datacenter Industry Survey: Datacenter Liquid Cooling 2024.

Implications

Hybrid IT infrastructure, led by cloud dominance, is set to continue. While a proportion of respondents continue to own and operate datacenters or rent space at colocation sites, close to three-quarters of companies use cloud services, thanks to their wide availability and flexible nature. Many companies use a mix of IT infrastructure types to support different applications.

Hyperscalers will continue to pursue simplified UPS redundancy architecture while maintaining reliability. Datacenter UPS redundancy architecture is crucial for maintaining continuous power and minimizing downtime for critical systems, thereby enhancing reliability. As software-level redundancy matures with virtual machines and cloud availability zones, operators optimize capacity utilization while ensuring system-level redundancy to lower total ownership costs. Hyperscalers are particularly driving the popularity of DR and catcher systems.

Accelerated computing workloads are gaining popularity and driving up rack density; in liquid cooling deployments, respondents expect densities of over 50 kW. AI applications necessitate the close placement of multiple GPUs to create clusters. As a result, the common use of blanking panels must be replaced with additional GPU servers, leading to increased rack density that necessitates liquid cooling. Survey respondents expect that most racks handling AI workloads with liquid cooling will exceed a rack density of 50 kW, and in many cases even 100 kW.

Liquid cooling is expected to take off when it becomes a necessity to address increasing rack density. Liquid cooling has been promoted for its sustainability benefits, such as improved efficiency and quiet operation. However, these are often viewed as “nice to have” benefits. Rack density will likely need to push the limits of air cooling before the industry broadly recognizes liquid cooling as a necessity.
