Will Liquid Cooling Dominate Thermal Management for Data Centers?
19 May 2023, Yulin Wang
Over the past 16 years, the thermal design power (TDP) of GPUs has quadrupled. With demand for AI, cloud computing, and crypto mining increasing, IDTechEx expects the power consumption of server boards and data centers to continue rising. As life returns to normal after the Covid pandemic, IDTechEx has observed significant expansion in the data center industry. For example, AMD's Q4 2022 financial statement reports a 42% year-over-year increase in revenue from its data center segment, a sign of rapid market growth. As the data center industry prospers, the data center thermal management field is also expected to grow significantly. IDTechEx forecasts that by 2033, the global annual revenue for data center liquid cooling hardware will exceed US$900 million, presenting a substantial opportunity for businesses.
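As a rough illustration of what a fourfold increase over 16 years implies, the sketch below converts the figure into a constant yearly growth rate. It assumes smooth compound growth, which is a simplification: real TDP increases arrive in steps with each product generation. The function name is illustrative only.

```python
import math


def implied_annual_growth(total_multiple: float, years: float) -> float:
    """Constant yearly growth rate that compounds to total_multiple over years."""
    if total_multiple <= 0 or years <= 0:
        raise ValueError("total_multiple and years must be positive")
    return total_multiple ** (1.0 / years) - 1.0


# Figures from the article: TDP quadrupled over 16 years.
rate = implied_annual_growth(4.0, 16)

# Years needed to double at that constant rate.
doubling_years = math.log(2.0) / math.log(1.0 + rate)

print(f"Implied TDP growth: {rate:.1%} per year")
print(f"Doubling time: {doubling_years:.1f} years")
```

Under this simplifying assumption, a quadrupling in 16 years is the same as a doubling every 8 years, or roughly 9% growth per year.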
IDTechEx has recently published a new report titled "Thermal Management for Data Centers 2023-2033", which covers the adoption of liquid cooling technologies, including direct-to-chip cooling, immersion cooling, single-phase and two-phase cooling, coolants, regulations, coolant distribution units (CDUs), and many other key technologies.
Data center thermal management can be broadly categorized into two types based on the cooling medium: liquid cooling and air cooling. While liquid cooling has gained popularity in recent years, air cooling remains the traditional and most widely used approach, offering several advantages:
- Ease of use: Air cooling solutions are relatively simple to install and operate. They typically involve the use of fans or heatsinks to dissipate heat from components, making them easy to access and user-friendly. The familiarity and simplicity of air-cooling systems make them convenient for data center operators.
- Established success: Air cooling has a long-established track record of successful thermal management in data centers. Many data center end-users have invested significant resources in building and optimizing air-cooled infrastructures.
- Liquid-free operation: Unlike solutions that rely on coolants and liquid circulation, air cooling eliminates the need for liquid-related components and infrastructure. This removes risks such as leaks, pump failures, or coolant evaporation. The absence of liquid in air cooling systems simplifies maintenance and reduces the chances of malfunction or operational disruptions.
However, despite the benefits of air cooling, the low specific heat capacity of air limits its effectiveness in meeting the growing cooling demands of modern data centers. To address this challenge, liquid cooling has emerged as a viable solution. Liquid cooling harnesses the higher specific heat capacity of liquids, which makes them more efficient at carrying heat away. There are two common types of liquid cooling methods: direct-to-chip (cold plate) cooling and immersion cooling.
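To illustrate why heat capacity matters, the sketch below estimates the volumetric flow of air and of water needed to carry away the same heat load, using the sensible-heat relation Q = density x volumetric flow x specific heat x temperature rise. The 1 kW load and the 10 K coolant temperature rise are hypothetical values chosen for illustration, and the fluid properties are approximate room-temperature textbook figures rather than numbers from the IDTechEx report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coolant:
    name: str
    density_kg_m3: float        # approximate value near room temperature
    specific_heat_j_kgk: float  # approximate value near room temperature


AIR = Coolant("air", 1.2, 1005.0)
WATER = Coolant("water", 997.0, 4180.0)


def required_flow_m3_s(heat_load_w: float, delta_t_k: float, coolant: Coolant) -> float:
    """Volumetric flow needed to absorb heat_load_w with a coolant temperature rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_w / (
        coolant.density_kg_m3 * coolant.specific_heat_j_kgk * delta_t_k
    )


heat_load_w = 1000.0  # hypothetical heat load
delta_t_k = 10.0      # hypothetical coolant temperature rise

air_flow = required_flow_m3_s(heat_load_w, delta_t_k, AIR)
water_flow = required_flow_m3_s(heat_load_w, delta_t_k, WATER)

print(f"air:   {air_flow * 1000:.1f} L/s")
print(f"water: {water_flow * 1000:.3f} L/s")
print(f"air needs about {air_flow / water_flow:.0f}x the volume flow of water")
```

Per unit mass, water stores roughly four times as much heat as air. Per unit volume the gap runs into the thousands because water is far denser, which is why a thin liquid loop can do the work of a large volume of moving air.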
Cold plate cooling involves mounting a cold plate directly on top of heat sources such as CPUs and GPUs, with a layer of thermal interface material (TIM) in between. The coolant inside the cold plate chamber absorbs heat and carries it away from the components. Immersion cooling, by contrast, submerges the heat sources in a coolant, allowing for direct contact and efficient heat dissipation.
While direct-to-chip cooling has demonstrated great performance, the limited expertise of end-users in integrating cold plates onto off-the-shelf servers has held back adoption. Collaboration between server suppliers and cold plate manufacturers is now accelerating the adoption of cold plate cooling, with integrated solutions (servers with cold plates installed) being offered directly to end-users. One example is CoolIT Systems partnering with Intel to develop direct-to-chip cooling solutions specifically tailored for Intel Xeon Scalable CPUs. Through such collaborations and integrated solutions, end-users gain the benefits of cold plate cooling, including improved efficiency and lower partial power use effectiveness (pPUE), without the integration complexity.
Another emerging liquid cooling technology is immersion cooling, which offers excellent heat dissipation performance, with demonstrated pPUEs as low as 1.01. However, several concerns have limited its widespread adoption:
- Complexity: Immersion cooling requires significant modifications to existing server boards. As servers are directly immersed in the liquid coolant, factors such as material compatibility between the servers and coolant fluids need to be considered, adding complexity and additional costs to the implementation process.
- Lack of expertise: Immersion cooling is still in its early stages, and the market lacks sufficient expertise and experience in implementing and managing this technology.
- High upfront costs and maintenance: Retrofitting existing air-cooled data centers to accommodate immersion cooling can be expensive. Immersion cooling also has the highest initial capital expenditure (CAPEX) in terms of cost per watt. Additionally, ongoing maintenance and operational costs may be higher than for other cooling methods. However, because of its efficient heat dissipation, the long-term energy savings make immersion cooling cost-effective for data center users.
- Limited demand: While the power requirements of data centers have been increasing, IDTechEx believes that a combination of air cooling and direct-to-chip cooling can adequately meet cooling demands in the short to mid-term for the major applications. The market does not currently have an urgent need for immersion cooling.
Partial Power Use Effectiveness (pPUE) for Data Center Cooling Approaches. Source: IDTechEx
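To make the pPUE figures concrete, the sketch below converts a pPUE into the cooling power it implies. It assumes the commonly used definition pPUE = (IT power + cooling power) / IT power. The article does not spell out the formula, so treat this as an assumption. The 1,000 kW IT load and the 1.40 comparison value are hypothetical and are not IDTechEx data. Only the 1.01 figure comes from the article.

```python
HOURS_PER_YEAR = 8760


def cooling_power_kw(it_power_kw: float, ppue: float) -> float:
    """Cooling power implied by a pPUE, assuming pPUE = (IT + cooling) / IT."""
    if ppue < 1.0:
        raise ValueError("pPUE cannot be below 1.0 under this definition")
    return it_power_kw * (ppue - 1.0)


it_load_kw = 1000.0    # hypothetical IT load
immersion_ppue = 1.01  # lowest demonstrated value quoted in the article
comparison_ppue = 1.40 # hypothetical comparison value, not from the article

for label, ppue in [("immersion", immersion_ppue), ("comparison", comparison_ppue)]:
    kw = cooling_power_kw(it_load_kw, ppue)
    mwh_per_year = kw * HOURS_PER_YEAR / 1000
    print(f"{label}: pPUE {ppue:.2f} -> {kw:.0f} kW cooling, {mwh_per_year:.0f} MWh/year")
```

Under this definition, a pPUE of 1.01 means the cooling system draws about 1% of the IT load, or roughly 10 kW for every 1,000 kW of IT equipment.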
In conclusion, the demand for higher cooling capacity is driving the fast growth of liquid cooling, particularly in the form of direct-to-chip/cold plate cooling. This growth presents numerous opportunities for server manufacturers, data center operators, coolant fluid suppliers, and coolant distribution unit (CDU)/pump suppliers. On the other hand, immersion cooling is expected to initially be adopted by major players like Microsoft and Meta. However, widespread adoption may take time due to factors such as high costs, limited expertise, and maintenance requirements. Collaboration among companies in the data center immersion cooling supply chain will be crucial for its broader implementation. Nonetheless, immersion cooling offers significant opportunities for various companies, such as coolant suppliers. For more detailed information, please refer to IDTechEx's latest report on "Thermal Management for Data Centers 2023-2033".
Upcoming Free-to-Attend Webinar
Navigating the Liquid Cooling Dominance in Data Centers - An IDTechEx Roadmap
Yulin Wang, author of this article and Technology Analyst at IDTechEx, will present a webinar on this topic on Thursday 20 July 2023: Navigating the Liquid Cooling Dominance in Data Centers - An IDTechEx Roadmap.
In this webinar, several topics related to data center cooling will be discussed:
- Air cooling: An overview of the traditional air-cooling approach and its benefits and limitations
- Direct-to-chip/Cold Plate Cooling: An exploration of cooling methods that involve direct contact between the cooling medium and the chips or the use of cold plates
- Immersion cooling: An examination of cooling techniques that utilize immersion of servers or components in dielectric fluids
- Coolant comparison and regulations: A comparison of different coolants used in liquid cooling systems, along with relevant regulations
- Single-phase and two-phase cooling: An analysis of single-phase and two-phase cooling methods and their applications in data centers
- Thermal interface materials (TIMs): An overview of the importance of thermal interface materials in efficient heat transfer within cooling systems