Munters’ Craig MacFadyen on the hybrid cooling math that’s becoming standard—and why designing for flexibility matters more than designing for day one.
Design a hybrid cooling system for a modern data center and you’ll hit a number that looks like a mistake: 135% total cooling capacity. That’s 95% from liquid cooling plus 40% from air cooling. It adds up to more than 100%, which tends to make engineers twitch.
But in a recent interview (embedded below) with The Data Center Engineer, Craig MacFadyen, who works in Strategy and Portfolio Management at Munters, told us that the math isn’t broken. It’s the point.
“What most people are doing is designing for around 95% liquid cooling, and then about up to 40% on the air cooling,” MacFadyen told The Data Center Engineer. “That’s not saying that they’re designing 135% for the heat rejection. That will still be 110% aligned for redundancy. But it means that the size of the CRAHs and the CDUs are there to capture that, and to be flexible and adaptable.”
The overlap isn’t waste. It’s insurance against an industry that can’t predict what servers will show up three years from now.
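To make the arithmetic concrete, here is a minimal sketch of the split MacFadyen describes. The 10 MW IT load is an illustrative assumption; the point is that the CRAHs and CDUs together add up to 135% of the IT load while the heat rejection plant is still sized at 110%.

```python
# Minimal sketch of the hybrid sizing arithmetic. The 10 MW IT load is an
# illustrative assumption; fractions are expressed relative to that load.
it_load_mw = 10.0

liquid_fraction = 0.95   # CDUs sized to capture ~95% of the IT heat
air_fraction = 0.40      # CRAHs sized for up to ~40% of the IT heat

cdu_capacity_mw = liquid_fraction * it_load_mw     # 9.5 MW of liquid cooling
crah_capacity_mw = air_fraction * it_load_mw       # 4.0 MW of air cooling

# The terminal units add up to "135%" of the IT load...
terminal_mw = cdu_capacity_mw + crah_capacity_mw   # 13.5 MW

# ...but heat rejection is still sized against the real load plus redundancy.
heat_rejection_mw = 1.10 * it_load_mw              # 11.0 MW, the "110%"

print(terminal_mw, heat_rejection_mw)              # 13.5 11.0
```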
The flexibility tax
Data center cooling design has always been governed by resilience, reliability, and efficiency. What’s changed is the uncertainty underneath them. Rack densities are climbing, AI workloads are shifting thermal profiles, and the specific hardware going into a facility might not be decided until years after the cooling infrastructure is designed.
Watch the full interview
That gap between design and deployment is why hybrid systems are becoming the default. Different liquid cooling approaches capture different amounts of heat: some grab 60%, others hit 95%, and full immersion can go higher still. Air cooling handles whatever’s left, plus a safety margin. Sizing both systems with overlap means operators can swap cooling strategies as IT loads change without ripping out infrastructure.
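A rough sketch of why that overlap buys flexibility: the same installed CDU and CRAH capacity from the example above can serve hardware with very different liquid capture fractions. The helper and numbers below are illustrative assumptions, not a sizing tool.

```python
# Check whether a given liquid/air split fits the installed terminal capacity.
# Capacities carried over from the illustrative 10 MW example above.
def fits(installed_liquid_mw, installed_air_mw, it_load_mw, liquid_capture):
    """Does this cooling split fit within the installed CDU and CRAH capacity?"""
    liquid_demand_mw = liquid_capture * it_load_mw
    air_demand_mw = (1.0 - liquid_capture) * it_load_mw
    return liquid_demand_mw <= installed_liquid_mw and air_demand_mw <= installed_air_mw

for capture in (0.60, 0.75, 0.95):
    ok = fits(installed_liquid_mw=9.5, installed_air_mw=4.0,
              it_load_mw=10.0, liquid_capture=capture)
    print(f"{capture:.0%} liquid capture: {'fits' if ok else 'does not fit'}")
```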
Munters, traditionally known for indirect evaporative cooling, has expanded its portfolio to match. The company now offers CRAHs (computer room air handlers), CDUs (coolant distribution units), and chillers, including its CircMiser line, which uses cylindrical condenser coils 45% larger than a standard V-bank design.
“The compressors don’t need to work so hard, the fans don’t need to work so hard,” MacFadyen said. “So you’ve got a unit that’s far more efficient, a unit that is quieter, and a unit that’s generally a lot smaller as well.”
When 5% becomes the hard part
There’s an irony in high-efficiency liquid cooling. The better it gets, the harder it is to manage what’s left over.
When a liquid cooling system captures 95% or more of the heat, the remaining 5% still needs air movement across the racks. But designing airflow for such a small fraction of the total load creates its own engineering problem.
“If you’ve got something that’s capturing a very high percentage on the liquid cooling, so 95% or a hundred percent, then you might only have 5% for airflow,” MacFadyen explained. “Getting that for air movement across the racks might be more of a challenge when it’s such a low number.”
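A back-of-envelope number shows how small that airflow gets. Using the standard sensible-heat relation for air, and assuming a hypothetical 100 kW rack with a 10 K air-side temperature rise (illustrative figures, not from the interview), the residual 5% needs less than half a cubic meter of air per second:

```python
# Airflow needed to carry the residual heat: Q = rho * cp * V_dot * dT.
# The 100 kW rack and 10 K delta-T are illustrative assumptions.
RHO_AIR = 1.2     # kg/m^3, air density at typical data hall conditions
CP_AIR = 1005.0   # J/(kg*K), specific heat of air

def airflow_m3_per_s(residual_heat_w, delta_t_k):
    """Volumetric airflow needed to remove residual_heat_w at a given delta-T."""
    return residual_heat_w / (RHO_AIR * CP_AIR * delta_t_k)

residual_w = 0.05 * 100_000                           # 5% of a 100 kW rack
flow = airflow_m3_per_s(residual_w, delta_t_k=10.0)
print(f"{flow:.2f} m^3/s (~{flow * 2119:.0f} CFM)")   # ~0.41 m^3/s, roughly 880 CFM
```

Distributing that small a flow evenly across a full rack face, without starving any one server of air, is the challenge MacFadyen is pointing at.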
DCIM (data center infrastructure management) becomes more than a nice-to-have at these densities. As rack power climbs, real-time thermal monitoring isn’t optional. MacFadyen sees this becoming a bigger priority over the next few years, particularly as operators start collecting performance data from the first wave of large-scale liquid cooling deployments.
The failure mode nobody designs around
Ask a cooling engineer about failure modes and you’ll hear about pump redundancy, valve failures, and leak detection. The most common issue Munters actually sees on site is more mundane: Building Management System (BMS) communications going down.
“The most common failure we’ve seen is probably BMS comms tends to go down,” MacFadyen said. “What generally will happen then is we’ve got our own controls in every product, and the controls will keep controlling to the last set point it’s been given by the BMS until the comms are resumed.”
Controls backed by UPS, running on the last known parameters. It’s a simple failsafe for a failure that happens more often than mechanical breakdowns.
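A minimal sketch of that fallback behavior, assuming a simple unit-level controller; the class and method names are hypothetical, not Munters’ actual control code.

```python
# Unit-level controls keep regulating to the last set point received from the
# BMS whenever comms drop, per the failsafe described above. Hypothetical names.
class UnitController:
    def __init__(self, default_setpoint_c: float):
        self.last_setpoint_c = default_setpoint_c   # last known-good set point

    def control_step(self, bms_online: bool, bms_setpoint_c: float | None) -> float:
        """One control loop iteration: take fresh BMS data if present, else hold."""
        if bms_online and bms_setpoint_c is not None:
            self.last_setpoint_c = bms_setpoint_c
        # With comms down, keep controlling to the stored value until they resume.
        return self.last_setpoint_c
```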
Keep it simple, keep one pressurization point
MacFadyen’s design advice comes down to a single principle: reduce complexity wherever you can.
“Always keep things as simple as possible,” he said. “Complexity has risks, so don’t have too many points of failure.”
He pointed to CDU architecture as a case study. When CDUs were deployed one per rack, each unit had its own pressurization, pump redundancy, and filtration. That made sense in isolation. But when CDUs become part of a larger centralized system, multiple pressurization points create conflicts. The fix is to consolidate to a single pressurization point per chilled water circuit, which is what good practice already calls for.
Liquid cooling at scale is still a baby
For all the industry excitement around liquid cooling, MacFadyen offered a blunt assessment of where things actually stand.
“Liquid cooling at scale is in its infancy. It’s been going on for 60-odd years, but at scale it’s really still a baby,” he said. “We’re only deploying the first liquid cooling deployments now. We haven’t had time for feedback. We haven’t had time to learn anything about the systems that are employed.”
The industry doesn’t fully understand how load profiles will behave inside these servers yet. MacFadyen estimates two to five years of real-world operation before the data exists to guide meaningful design changes. Two to five years of learning what works, what breaks, and what nobody anticipated.
Warmer chips, fewer compressors
One development that could reshape cooling economics: chips are getting more heat-tolerant.
MacFadyen said Munters has heard from Nvidia that future chip architectures will accept higher liquid temperatures, referencing the Rubin architecture specifically. Higher allowable coolant temperatures mean more hours of free cooling per year, which means less compressor runtime and lower energy bills.
“If it can handle warmer temperatures, we can do a lot more free cooling, which means you’re using less energy, less compressors,” MacFadyen said. “The balance between chillers and dry coolers will change. We’ll do predominant cooling with the dry coolers, but the chillers will still need to be there.”
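One way to see the effect is to count the hours in a year when dry coolers alone can reach the required coolant supply temperature. The 5 K approach temperature and the set points in this sketch are illustrative assumptions, not Munters or Nvidia figures.

```python
# Rough free-cooling estimate: hours where ambient plus the dry cooler approach
# temperature stays at or below the allowable coolant supply temperature.
def free_cooling_hours(hourly_ambient_c, supply_setpoint_c, approach_k=5.0):
    """Hours where dry coolers meet the supply temperature without compressors."""
    return sum(1 for t in hourly_ambient_c if t + approach_k <= supply_setpoint_c)

# With an 8,760-hour ambient temperature series for a site:
#   free_cooling_hours(ambient_series, supply_setpoint_c=32.0)
#   free_cooling_hours(ambient_series, supply_setpoint_c=40.0)
# A warmer allowable supply temperature turns many more hours into
# compressor-free operation; the chillers then cover only the hottest hours.
```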
Chillers won’t disappear. Global warming guarantees they’ll be needed on the hottest days. And there’s a constraint that has nothing to do with silicon: people still need to work inside these facilities.
“We can’t have the data centers in themselves being up at 45, 50 degrees,” MacFadyen said. “‘Cause people can’t go in there and work on the servers and work on the networking.”
The cooling stack is getting more efficient, and the balance is shifting toward dry coolers and free cooling. But the human body still has a thermal envelope, and no chip architecture is going to change that.







