Dual ToR Redundancy

Overview

Dual ToR Redundancy is Microsoft's server-level network redundancy approach for Azure data centers. It keeps a server connected when one of its Top-of-Rack (ToR) switches fails. Each server is connected to two ToR switches at the same time, yet applications still see a single IP address, a single MAC address, and a single uplink, so failover is transparent to them. Microsoft chose this design after rejecting alternatives such as MC-LAG and EVPN multi-homing, because of split-brain concerns around synchronizing state between peer switches and the overlay complexity these options would add at a scale of 10,000+ switches.
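
As a minimal illustration of the "single IP, single MAC, single uplink" property, the Python sketch below models a server's logical uplink backed by two ToR paths. When the ToR carrying traffic fails, the model moves to the other path while the IP and MAC that applications use stay the same. It keeps one active path at a time, as in the active-standby generation. All names (`TorPath`, `LogicalUplink`, `report_failure`) are hypothetical. The sketch also ignores how a failure is detected and how the path is physically switched (in active-standby, by a MUX in the smart Y-cable), and it is not SONiC code.

```python
from dataclasses import dataclass, field


@dataclass
class TorPath:
    """One physical path from the server to a ToR switch (hypothetical model)."""
    name: str
    healthy: bool = True


@dataclass
class LogicalUplink:
    """The single uplink the server's applications see (hypothetical model).

    The IP and MAC never change; only the ToR path underneath does.
    Failure detection and the switching mechanism are left out on purpose.
    """
    ip: str
    mac: str
    paths: list[TorPath] = field(default_factory=list)
    active: int = 0  # index of the path currently carrying traffic

    def active_tor(self) -> str:
        return self.paths[self.active].name

    def report_failure(self, tor_name: str) -> None:
        """Mark a ToR as failed and move traffic off it if it was active."""
        for path in self.paths:
            if path.name == tor_name:
                path.healthy = False
        if self.paths[self.active].healthy:
            return  # the active path is still healthy; nothing to do
        for i, path in enumerate(self.paths):
            if path.healthy:
                self.active = i
                return
        raise RuntimeError("both ToRs are down; the server is unreachable")


if __name__ == "__main__":
    uplink = LogicalUplink("10.0.0.5", "02:00:00:00:00:05",
                           [TorPath("tor-a"), TorPath("tor-b")])
    uplink.report_failure("tor-a")
    # The server-facing identity is unchanged after failover.
    print(uplink.active_tor(), uplink.ip, uplink.mac)
    # prints: tor-b 10.0.0.5 02:00:00:00:00:05
```

The point of the model is the separation between identity and path: applications bind to `ip` and `mac`, which live on the logical uplink, so a change of `active` underneath never forces them to reconnect to a new address.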

The technology has evolved through two generations. The first, active-standby, uses smart Y-cables containing MUX chips controlled over I2C; one ToR carries the server's traffic while the other stands by. The second, active-active, uses an FPGA paired with an SoC, gRPC for control, and stream-level ECMP, so both ToRs carry traffic. By dispatching traffic per 5-tuple stream across both uplinks, the active-active design doubles the effective server-to-ToR bandwidth. Both generations are open-sourced in SONiC, with four high-level design (HLD) requirement specifications and complete test suites published to the community. The technology is complementary to DPU/SmartNIC Integration, which extends SONiC switch capabilities with pooled DPU compute for offloading SDN policy.
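
The per-stream dispatch that lets the active-active design use both uplinks can be illustrated with a short Python sketch. It hashes each stream's 5-tuple and maps the result to one of the healthy uplinks, so every packet of a stream takes the same path while different streams spread across both ToRs. The names (`FiveTuple`, `flow_hash`, `pick_uplink`), the choice of SHA-256 as the hash, and the rule of moving all streams to the surviving uplink on failure are assumptions for illustration; this is not the FPGA data-path logic or SONiC code.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


def flow_hash(flow: FiveTuple) -> int:
    """Deterministic hash of a 5-tuple.

    SHA-256 is an illustrative choice (assumption); the hash used in the
    real data path is not specified here.
    """
    key = "|".join(str(part) for part in flow).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def pick_uplink(flow: FiveTuple, uplink_up: list[bool]) -> int:
    """Return the index of the uplink (ToR) that carries this stream.

    All packets of one stream hash to the same uplink, which keeps them in
    order; different streams spread across the healthy uplinks. If one
    uplink is down, every stream maps to the survivor (assumed rule).
    """
    healthy = [i for i, up in enumerate(uplink_up) if up]
    if not healthy:
        raise RuntimeError("no healthy uplink")
    return healthy[flow_hash(flow) % len(healthy)]


if __name__ == "__main__":
    counts = [0, 0]
    for port in range(40000, 41000):  # 1,000 distinct TCP streams
        flow = FiveTuple("10.0.0.5", "10.1.0.9", 6, port, 443)
        counts[pick_uplink(flow, [True, True])] += 1
    print("streams per uplink:", counts)  # expect a roughly even split
```

Hashing per stream rather than per packet is what makes the bandwidth gain safe for TCP: packets of one connection never race each other over two paths, so they are not reordered. The trade-off is that balance is statistical, so a few very large streams can still load one uplink more heavily than the other.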
