Rack Scale AI Computing

Overview

Rack-Scale AI Computing represents the paradigm shift from server-level (4-8 GPUs per server) to rack-level (32-128+ GPUs per rack) computing. As AI models outgrow single-server capacity, the rack replaces the server as the fundamental unit of compute, with GPU platforms, power, cooling, storage, and networking designed as an integrated system. This drives changes across the stack: bus bar power distribution, liquid cooling, fabric switches within the rack, and rack-level management.

Key architectural patterns include NVIDIA NVL72 (72 GPUs/rack at 132 kW), Meta's Clemente/Catalina platforms, and reference designs from ODMs integrating GPU trays, scale-up switches, power shelves, and cooling into unified rack assemblies. The scale of investment is striking: Meta evolved from 6K GPU clusters (2022) to 100K+ GPU clusters (2024-2025), and HVDC power architecture has become a prerequisite as rack densities exceed 100 kW.