In modern high-performance computing (HPC) and AI data centers, the networking architecture is bifurcated into two specialized fabrics to handle distinct traffic profiles.
The Front-End Network (or Management Fabric) typically utilizes standard Ethernet to connect the cluster to external storage, user interfaces, and the internet; it manages tasks like job scheduling, data ingestion, and general system orchestration.
Conversely, the Back-End Network (or Compute Fabric) is a dedicated, ultra-low-latency, high-bandwidth interconnect designed specifically for inter-node communication. In AI workloads, this back-end fabric is critical for "scale-out" operations, allowing XPUs to exchange massive parameter gradients during training (and activations during distributed inference) via collective communications like All-Reduce.
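To make the All-Reduce step concrete, here is a minimal sketch of a data-parallel gradient exchange using PyTorch's torch.distributed API. The choice of framework, the gloo backend, and the tensor and script names are illustrative assumptions; the text does not prescribe a specific library.

```python
# Minimal All-Reduce sketch (hypothetical demo, not from the article).
# Launch across processes with, e.g.:
#   torchrun --nproc_per_node=4 allreduce_demo.py
import torch
import torch.distributed as dist


def main():
    # torchrun provides RANK, WORLD_SIZE, MASTER_ADDR/PORT via the environment.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Stand-in for a local gradient shard: each rank holds different values.
    grad = torch.full((4,), float(rank))

    # All-Reduce: every rank contributes its tensor and receives the
    # element-wise sum across all ranks, carried over the back-end fabric.
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    # Dividing by world size yields the averaged gradient, the usual
    # synchronization step in data-parallel training.
    grad /= world_size
    print(f"rank {rank}/{world_size}: averaged gradient = {grad.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

After the collective completes, every rank holds an identical averaged gradient, which is why this traffic pattern is so bandwidth-intensive: the full gradient volume must traverse the compute fabric on every training step.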
By separating these planes, architects ensure that heavy compute-to-compute synchronization doesn't compete for bandwidth with routine data management, minimizing jitter and preventing communication bottlenecks that would otherwise leave expensive processing cores idle.