
Powering and cooling AI and accelerated computing

By Wojtek Piorko, managing director for Africa at Vertiv

Johannesburg, 12 Aug 2024
Wojtek Piorko, managing director for Africa, Vertiv.

There’s no doubt, across any industry, that artificial intelligence (AI) is here, and it is here to stay. The use cases for AI are virtually limitless, from breakthroughs in medicine and enhanced farming techniques to high-accuracy fraud prevention and personalised education.

It is heartening to see that there is opportunity for great development within Africa. In fact, a paper published in late 2023 by Access Partnership stated that AI is already being used to significant effect in Africa to help address challenges such as predicting natural disasters, like floods and earthquakes, as well as to protect endangered species on the continent, improve food security and improve maternal health outcomes.

The paper notes that a preliminary assessment by Access Partnership estimates that AI applications could support up to USD136 billion worth of economic benefits for just four sub-Saharan countries (Ghana, Kenya, Nigeria and South Africa) by 2030, based on current growth rates and scope of analysis. "To put this in perspective, this figure is higher than Kenya’s current GDP and represents 12.7% of the 2022 GDP for these four economies," it says.

Making the move to high-density

AI is already transforming people’s everyday lives, with local use of technology like ChatGPT, virtual assistants, navigation apps and chatbots on the upswing. And, just as it is transforming every single industry, it is also beginning to fundamentally change data centre infrastructure, driving significant changes in how high-performance computing (HPC) is powered and cooled.

To put this into perspective, consider that a typical IT rack used to run workloads of five to 10 kilowatts (kW), and racks running loads higher than 20kW were considered high-density. AI chips, however, can require around five times as much power and five times as much cooling capacity[1] in the same space as a traditional server. As a result, we’re now seeing rack densities of 40kW per rack, and even more than 100kW in some instances.
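The arithmetic behind those figures is straightforward. The short Python sketch below compares a notional traditional rack with an AI rack; the per-server power figures and server counts are illustrative assumptions, not vendor specifications.

```python
# Illustrative rack-density arithmetic (assumed figures, not vendor specs).

TRADITIONAL_SERVER_KW = 0.5   # assumed draw of a typical enterprise server
AI_SERVER_KW = 10.0           # assumed draw of a dense GPU server (~10kW class)

def rack_density_kw(servers: int, kw_per_server: float) -> float:
    """Total electrical load of a rack, in kW."""
    return servers * kw_per_server

traditional = rack_density_kw(servers=16, kw_per_server=TRADITIONAL_SERVER_KW)
ai = rack_density_kw(servers=4, kw_per_server=AI_SERVER_KW)

print(f"Traditional rack: {traditional:.1f} kW")  # ~8 kW, in the 5-10kW range
print(f"AI rack:          {ai:.1f} kW")           # ~40 kW, beyond the old 'high-density' mark
```

With these assumed numbers, four dense GPU servers already push a rack past 40kW, which is why densities that were once exceptional are becoming the planning baseline.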

This will require extensive capacity increases across the entire power train, from the grid to the chips in each rack. It also means that, because traditional air-cooling methods cannot handle the heat generated by GPUs running AI calculations, the introduction of liquid-cooling technologies into the data centre white space, and eventually the enterprise server room, will be a requirement for most deployments.

Investments to upgrade the infrastructure needed to both power and cool AI hardware are substantial, and navigating these new design challenges is critical. The transition will not happen quickly: data centre and server room designers must look for ways to make power and cooling infrastructure future-ready, allowing for the future growth of their workloads.

Getting enough power to each rack requires upgrades from the grid to the rack. In the white space specifically, this likely means high-amperage busway and high-density rack PDUs (a rough current-sizing sketch follows the list below). To reject the massive amount of heat generated by hardware running AI workloads, two liquid-cooling technologies are emerging as primary options:

  1. Direct-to-chip liquid cooling: Cold plates sit atop the heat-generating components (usually chips such as CPUs and GPUs) to draw off heat. Pumped single-phase or two-phase fluid draws heat from the cold plates and carries it out of the data centre, exchanging heat but not fluid with the chip. This can remove 70% to 75% of the heat generated by equipment in the rack, leaving 25% to 30% to be removed by air-cooling systems.
  2. Rear-door heat exchangers: Passive or active heat exchangers replace the rear door of the IT rack with heat-exchanging coils, through which fluid absorbs heat produced in the rack. These systems are often combined with other cooling systems, either as a strategy to maintain room neutrality or as part of a transitional design starting the journey into liquid cooling.
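On the power-train side referenced above, the minimal sketch below estimates the line current a busway or rack PDU would need to carry for a given rack load. The 400V three-phase supply and 0.95 power factor are illustrative assumptions, not figures from the article.

```python
import math

def three_phase_current_amps(load_kw: float, line_voltage_v: float = 400.0,
                             power_factor: float = 0.95) -> float:
    """Line current for a balanced three-phase load: I = P / (sqrt(3) * V_LL * PF)."""
    return (load_kw * 1000.0) / (math.sqrt(3) * line_voltage_v * power_factor)

for rack_kw in (10, 40, 100):
    amps = three_phase_current_amps(rack_kw)
    print(f"{rack_kw:>3} kW rack -> ~{amps:.0f} A per phase")
# ~15 A at 10 kW, ~61 A at 40 kW and ~152 A at 100 kW, before any derating headroom,
# which is why high-amperage busway and high-density PDUs enter the picture.
```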

While direct-to-chip liquid cooling offers significantly higher density cooling capacity than air, it is important to note that there is still excess heat that the cold plates cannot capture. This heat will be rejected into the data room unless it is contained and removed through other means such as rear-door heat exchangers or room air cooling.
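To put that residual heat in numbers, here is a minimal sketch that splits an assumed 100kW rack load between the cold-plate loop and air cooling, using the 70% to 75% capture range quoted above; the rack size is an assumption for illustration.

```python
def heat_split_kw(rack_kw: float, liquid_capture_fraction: float):
    """Split rack heat between the direct-to-chip liquid loop and residual air cooling."""
    to_liquid = rack_kw * liquid_capture_fraction
    to_air = rack_kw - to_liquid
    return to_liquid, to_air

RACK_KW = 100.0  # assumed AI rack load for illustration
for capture in (0.70, 0.75):
    liquid, air = heat_split_kw(RACK_KW, capture)
    print(f"{capture:.0%} capture: {liquid:.0f} kW to the liquid loop, "
          f"{air:.0f} kW still rejected to room air")
# Even at 75% capture, roughly 25 kW per rack must still be handled by
# rear-door heat exchangers or room air cooling.
```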

Supporting the higher power and cooling requirements of AI

Because power and cooling are becoming such integral parts of IT solution design in the data room, we’re seeing a blurring of the borders between IT and facilities teams, something that can add complexity when it comes to design, deployment and operation. Thus, partnerships and full-solution expertise rank as top requirements for smooth transitions to higher densities.

To simplify this shift, Vertiv recently introduced the Vertiv 360AI portfolio to EMEA to support customers’ AI plans.

These solutions provide a streamlined approach for scalable AI infrastructure, addressing the evolving challenges posed by high-performance computing. Vertiv 360AI is designed to help accelerate retrofits of air-cooled edge and enterprise data centres, as well as the development of hyperscale greenfield projects.

Vertiv 360AI also features prefabricated modular solutions to enable customers to deploy AI without disturbing existing workloads and without consuming floorspace. Initial Vertiv 360AI solutions can power and cool over 130kW per rack, and include designs optimised for retrofits.

More information on the Vertiv 360AI offering is available here. Alternatively, please visit Vertiv’s AI Hub for access to expert information, reference designs and resources to successfully plan your AI-ready infrastructure.

[1] Management estimates: comparison of power consumption and heat output at rack level for five Nvidia DGX H100 servers versus 21 Dell PowerStore 500T and 9200T servers in a standard 42U rack, based on manufacturer spec sheets.
