AI Networking Bottleneck

Overview

The AI networking bottleneck describes the structural mismatch between GPU compute scaling and switching infrastructure scaling in large AI training clusters. As GPU counts scale into the tens and hundreds of thousands, the number of switches required to maintain full bisection bandwidth in a conventional fat-tree topology grows faster than the GPU count, because switch radix (port count) has not kept pace with GPU scaling: once a cluster outgrows what a given number of switch tiers can connect, another tier must be added, which raises the number of switches needed per GPU. The result is that GPUs spend approximately 60% of their time idle, waiting for collective communication operations (AllReduce, AllToAll, and AllGather) to complete across an oversubscribed network. The bottleneck is most acute for Mixture-of-Experts (MoE) models, which generate extreme AllToAll traffic patterns across all GPUs in a cluster. Networking also consumes more than 25% of cluster power, making this both a performance problem and an efficiency problem.
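The relationship between radix, tier count, and switch count can be sketched with the standard sizing formulas for a non-blocking fat-tree (folded Clos). This is a minimal illustration and not a model of any particular deployment. It assumes one network port per GPU, identical switches with uniform port speed, and fully built fabrics. The function names are hypothetical and chosen for this example only.

```python
# Sizing sketch for a full-bisection fat-tree built from identical switches.
# Assumptions (not from the source): one NIC port per GPU, uniform port speed,
# and a fully populated fabric at each tier count.

def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints supported by a full-bisection fat-tree: 2 * (radix/2)^tiers."""
    return 2 * (radix // 2) ** tiers


def switches_when_full(radix: int, tiers: int) -> int:
    """Switches in a fully built fat-tree: (2*tiers - 1) * (radix/2)^(tiers-1)."""
    return (2 * tiers - 1) * (radix // 2) ** (tiers - 1)


def min_tiers(num_gpus: int, radix: int) -> int:
    """Smallest tier count whose full-bisection capacity covers num_gpus."""
    tiers = 1
    while max_endpoints(radix, tiers) < num_gpus:
        tiers += 1
    return tiers


if __name__ == "__main__":
    for radix in (64, 128):
        for gpus in (2_000, 50_000, 100_000):
            tiers = min_tiers(gpus, radix)
            # Switches per endpoint in a full fabric is (2*tiers - 1) / radix,
            # so every extra tier raises the per-GPU switch cost.
            per_gpu = switches_when_full(radix, tiers) / max_endpoints(radix, tiers)
            print(f"radix={radix} gpus={gpus} tiers={tiers} "
                  f"switches_per_gpu_at_full_build={per_gpu:.3f}")
```

Under these assumptions, a 64-port switch tops out at 65,536 endpoints in three tiers, so a 100,000-GPU cluster needs a fourth tier, while a 128-port switch covers the same cluster in three. Because the switch count per endpoint in a full fabric is (2 × tiers − 1) / radix, each added tier increases the switching cost per GPU, which is the motivation for higher-radix ASICs.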

The bottleneck is driving investment in purpose-built high-radix switch ASICs (see Eridu), Co-Packaged Optics for higher port bandwidth density, Multi-Core Fiber for cabling density, and Adaptive Routing For AI to maximize utilization of existing fabric capacity. Dell'Oro projects that scale-up switching (intra-cluster) will represent 60% of AI backend switching infrastructure by 2030.