The Rise of Liquid Cooling in Data Centres: A Comprehensive Guide

Introduction

As the demand for faster, more efficient, and sustainable data centres grows, traditional air-cooling methods struggle to keep up with the increasing heat densities of modern servers. Enter liquid cooling—a technology that is rapidly transforming the landscape of data centre cooling. In this blog post, we will delve into what liquid cooling is, its benefits, challenges, and why it is becoming a game-changer in the industry.

What is Liquid Cooling?

Liquid cooling relies on the same heat-exchange principles as traditional air cooling, but uses a liquid medium to remove heat from components such as CPUs, GPUs, and other high-performance computing hardware. The idea is not new: liquid cooling has been a popular option among high-performance PC enthusiasts, overclockers, and gamers since the early 2000s. Unlike air cooling, which relies on fans and heat sinks, liquid cooling transfers heat away from critical components more efficiently, which helps sustain performance and hardware longevity. It matters now because of the rise in high-power-density hardware that has accompanied the AI boom.

Benefits of Liquid Cooling

  1. Enhanced Efficiency

Liquid cooling systems can remove heat more efficiently than air cooling, allowing data centres to manage higher heat densities and support more powerful computing hardware. Liquids have a significantly higher thermal conductivity and heat capacity than air, enabling faster and more effective heat dissipation. This is crucial for maintaining the performance and longevity of AI hardware under heavy workloads, because GPUs and specialized AI processors generate a substantial amount of heat due to their high power consumption.
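
To make the comparison concrete, the steady-state heat carried by a coolant stream can be estimated from Q = m_dot * c_p * delta_T. The sketch below is a rough illustration rather than a design calculation: the 30 kW rack load and the 10 K coolant temperature rise are assumed values chosen for the example, and the fluid properties are approximate room-temperature figures for air and plain water (real systems often use water-glycol mixes or dielectric fluids with different properties).

```python
# Approximate fluid properties near room temperature (textbook values).
FLUIDS = {
    "air":   {"density": 1.2,   "cp": 1005.0},  # kg/m^3, J/(kg*K)
    "water": {"density": 997.0, "cp": 4180.0},  # kg/m^3, J/(kg*K)
}


def volumetric_flow_m3_per_s(heat_load_w, delta_t_k, fluid):
    """Volume flow needed to carry heat_load_w with a coolant temperature rise of delta_t_k.

    Rearranged from Q = m_dot * cp * delta_T, with m_dot = density * volume_flow.
    """
    props = FLUIDS[fluid]
    return heat_load_w / (props["density"] * props["cp"] * delta_t_k)


RACK_LOAD_W = 30_000.0  # assumed rack heat load, for illustration only
DELTA_T_K = 10.0        # assumed coolant temperature rise across the rack

air_flow = volumetric_flow_m3_per_s(RACK_LOAD_W, DELTA_T_K, "air")
water_flow = volumetric_flow_m3_per_s(RACK_LOAD_W, DELTA_T_K, "water")

print(f"Air:   {air_flow:.2f} m^3/s")
print(f"Water: {water_flow * 1000:.2f} L/s")
print(f"Air needs roughly {air_flow / water_flow:.0f}x the volume flow of water")
```

Under these assumptions, air needs a volume flow a few thousand times larger than water to carry the same heat, which is why fans and ducting become the bottleneck at high rack densities.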

2. Energy Savings 

By reducing the need for high-speed fans and extensive air conditioning systems, liquid cooling can significantly lower energy consumption. This leads to reduced operational costs and a smaller carbon footprint, especially for AI hardware, whose high power draw demands intensive cooling. Some estimates suggest that liquid cooling can reduce energy usage by up to 50% compared to traditional air-cooling methods (Asetek Technology, 2024).
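
One common way to express this kind of saving is Power Usage Effectiveness (PUE), the ratio of total facility power to IT power. The sketch below shows how a cut in cooling power would move PUE. Every figure in it is a hypothetical placeholder, and applying the 50% estimate to cooling power alone is an assumption made for the example, not something the cited estimate specifies.

```python
def pue(it_kw, cooling_kw, other_kw):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    return (it_kw + cooling_kw + other_kw) / it_kw


# Hypothetical facility figures, for illustration only.
IT_KW = 1000.0           # power drawn by servers, storage and network gear
AIR_COOLING_KW = 400.0   # cooling power with a traditional air-cooled design
OTHER_KW = 100.0         # lighting, power distribution losses, etc.
COOLING_REDUCTION = 0.5  # assumed: the "up to 50%" estimate applied to cooling power only
HOURS_PER_YEAR = 8760

liquid_cooling_kw = AIR_COOLING_KW * (1 - COOLING_REDUCTION)

pue_air = pue(IT_KW, AIR_COOLING_KW, OTHER_KW)
pue_liquid = pue(IT_KW, liquid_cooling_kw, OTHER_KW)
saved_mwh = (AIR_COOLING_KW - liquid_cooling_kw) * HOURS_PER_YEAR / 1000

print(f"PUE with air cooling:    {pue_air:.2f}")
print(f"PUE with liquid cooling: {pue_liquid:.2f}")
print(f"Energy saved per year:   {saved_mwh:.0f} MWh")
```

Real savings depend heavily on climate, facility design, and how much of the heat load is actually moved to liquid.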

3. Space Optimization 

Liquid cooling systems often require less physical space compared to traditional air-cooling setups, allowing for more compact data centre designs. This can be particularly beneficial in urban environments where space is at a premium. The removal of bulky air handling units can free up valuable floor space for additional server racks. Additionally, liquid cooling allows data centres to pack more computing power into a smaller physical footprint by efficiently managing the heat produced by densely packed AI hardware. This leads to higher computational density, which is essential for AI workloads that require substantial parallel processing power. 

4. Improved Performance 

With better thermal management, servers can operate at higher performance levels without the risk of overheating. This is particularly important for high-performance computing (HPC) applications and artificial intelligence workloads, which generate substantial amounts of heat. AI workloads are also highly sensitive to temperature variations. Liquid cooling keeps operating temperatures stable and prevents thermal throttling, which can degrade performance. This stability is crucial for AI applications that require consistently high computational throughput.

5. Environmental Benefits 

The improved energy efficiency of liquid cooling systems can contribute to lower carbon footprints for data centres, aligning with sustainability goals. Reduced reliance on traditional HVAC systems also reduces the greenhouse gas emissions associated with energy production.

Types of Liquid Cooling Systems 

  1. Direct-to-Chip (D2C) Cooling 

Direct-to-Chip (D2C) cooling is a method where a liquid coolant is circulated through cold plates mounted directly on heat-generating components, such as CPUs and GPUs. The coolant absorbs the heat through the cold plate and is then transported to a heat exchanger, where it releases the heat before being recirculated. Because the cold plates sit in direct contact with the components, thermal transfer is efficient.

Advantages: 

  • Precise cooling of individual components 
  • High thermal efficiency 
  • Scalability for high-density server environments 

Disadvantages: 

  • Requires specialized hardware and infrastructure 
  • Potential risk of leaks 

2. Immersion Cooling 

Immersion cooling involves submerging entire servers or server components in a dielectric (non-conductive) liquid. The liquid directly absorbs the heat generated by the components, which is then carried away through a circulation system. Immersion cooling can be categorized into single-phase and two-phase systems. In single-phase systems, the coolant remains in liquid form, while in two-phase systems, the coolant undergoes a phase change (from liquid to gas) to enhance heat absorption. 

Advantages: 

  • Uniform cooling of all components 
  • Eliminates the need for fans and air-cooling infrastructure 
  • Can manage extremely high heat loads 

Disadvantages: 

  • Higher initial setup costs 
  • Complexity in maintenance and operation 

3. Rear-Door Heat Exchangers 

Rear-door heat exchangers (RDHx) are mounted on the back of server racks and use chilled water to cool the air exiting the racks. This method enhances the efficiency of traditional air-cooling systems by removing heat before it enters the data centre’s ambient air. RDHx systems can be used in conjunction with air cooling to provide a hybrid solution. 

Advantages: 

  • Easy integration with existing air-cooled systems 
  • Reduced hot aisle containment requirements 
  • Lower operational costs compared to complete liquid immersion 

Disadvantages: 

  • Less efficient than direct-to-chip and immersion cooling 
  • Limited cooling capacity for extremely high-density racks 

Challenges of Liquid Cooling

  1. Initial Costs 

Implementing liquid cooling systems can involve significant upfront investment in terms of equipment and infrastructure modifications. This includes the cost of specialized cooling hardware, plumbing, and potential retrofitting of existing data centres. 
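
A simple payback estimate is one way to weigh this upfront investment against ongoing savings. The sketch below is illustrative only: all of the cost, energy, and price figures are hypothetical placeholders rather than vendor or market data, and the calculation ignores discounting, financing, and changes in energy prices.

```python
def simple_payback_years(extra_capex, annual_savings):
    """Years needed for annual savings to repay the extra upfront cost (no discounting)."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return extra_capex / annual_savings


# All figures below are hypothetical placeholders, in an arbitrary currency.
EXTRA_CAPEX = 500_000.0                # extra upfront cost of liquid cooling over air cooling
ENERGY_SAVED_MWH_PER_YEAR = 1_000.0    # assumed annual energy saving
PRICE_PER_MWH = 100.0                  # assumed electricity price
EXTRA_MAINTENANCE_PER_YEAR = 20_000.0  # assumed added maintenance cost

annual_savings = ENERGY_SAVED_MWH_PER_YEAR * PRICE_PER_MWH - EXTRA_MAINTENANCE_PER_YEAR
payback = simple_payback_years(EXTRA_CAPEX, annual_savings)

print(f"Net annual savings: {annual_savings:,.0f}")
print(f"Simple payback:     {payback:.2f} years")
```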

2. Maintenance 

Liquid cooling systems require specialized maintenance to prevent leaks and ensure the coolant remains effective over time. This includes regular inspections, coolant replacement, and monitoring for potential contamination. 

3. Compatibility 

Not all data centre equipment is designed to be compatible with liquid cooling, potentially limiting the choice of hardware. Manufacturers are increasingly offering liquid-cooled variants of their products, but widespread adoption may still require custom solutions. 

4. Skill Requirements 

Managing and maintaining liquid cooling systems requires specialized skills, which may necessitate additional training for data centre staff. This includes knowledge of fluid dynamics, thermodynamics, and the specific requirements of the cooling technology in use. 

Case Studies and Industry Adoption 

Several leading tech companies and data centre operators are already reaping the benefits of liquid cooling. For example: 

  • Google has implemented liquid cooling in its data centres to manage the heat generated by its AI and machine learning hardware. The company has reported significant improvements in energy efficiency and cooling performance. 
  • Microsoft has explored underwater data centres through Project Natick, which placed sealed server vessels on the seabed to improve energy efficiency and cooling effectiveness. The project aimed to leverage the natural cooling properties of seawater to reduce the need for traditional cooling infrastructure.
  • Facebook has adopted direct-to-chip cooling in some of its data centres to support high-performance computing needs. The company has highlighted the benefits of improved thermal management and energy savings. 
  • NVIDIA’s DGX AI systems, particularly the DGX A100, are cited as an example of advanced liquid cooling being used to manage the immense heat generated by powerful GPUs (Asetek). Each DGX A100 system integrates eight NVIDIA A100 Tensor Core GPUs, designed for high-performance AI workloads, which collectively produce substantial thermal output. Direct-to-chip liquid cooling removes heat directly from the GPUs, which improves performance, reliability, and energy efficiency and helps the system hold stable operating temperatures even under maximum load. It also reduces the need for energy-intensive air cooling and enables higher computational density, which suits data centres focused on AI and machine learning applications.

Future Trends 

As technology continues to advance, we can expect to see further innovations in liquid cooling. Some emerging trends include: 

  1. Hybrid Cooling Systems 

Combining air and liquid cooling to maximize efficiency and flexibility. Hybrid systems can provide the best of both worlds, offering precise cooling for high-density areas while maintaining traditional air cooling for less demanding applications. 

2. Advanced Coolants 

Development of new, more efficient, and environmentally friendly coolant formulations. This includes biodegradable coolants and those with enhanced thermal properties, reducing the environmental impact of liquid cooling systems. 

3. AI-Driven Cooling Management 

Leveraging artificial intelligence to optimize cooling system performance and predict maintenance needs. AI can analyse real-time data to adjust cooling parameters, ensuring optimal thermal conditions, and preventing potential failures. 
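
As a very small illustration of the monitoring side of this idea, the sketch below flags coolant temperature readings that deviate sharply from their recent history. It is a simple rolling z-score check, a statistical stand-in rather than a machine-learning model, and the function name, window size, threshold, and sample readings are all hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that differ from the trailing-window mean
    by more than `threshold` standard deviations of that window."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        # Only judge a reading once a full window of earlier readings exists.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical coolant outlet temperatures in degrees Celsius.
coolant_temps_c = [30.1, 30.0, 30.2, 29.9, 30.1, 30.0, 30.2, 36.5, 30.1, 30.0]

for index in find_anomalies(coolant_temps_c):
    print(f"Reading {index} looks anomalous: {coolant_temps_c[index]} C")
```

A production system would typically combine many sensors (flow rate, pressure, pump speed) and use trained models, but the principle of comparing live readings against expected behaviour is the same.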

Conclusion 

Liquid cooling is set to play a pivotal role in the future of data centre design and operation. Its ability to manage higher heat densities, improve energy efficiency, and support sustainable practices makes it an attractive option for data centres aiming to stay ahead of the curve. While there are challenges to overcome, the long-term benefits of liquid cooling are compelling, making it a worthwhile investment for forward-thinking data centre operators. As the industry continues to evolve, liquid cooling will undoubtedly be at the forefront of innovation, driving data centres towards a more efficient and sustainable future. 

For more information on our liquid cooling solutions, visit our website: https://techaccess.co.za/thermal-management/#Precision-Cooling

Published by Ameen Tayob

Junior Mechanical Engineer