NVIDIA's GB200 NVL72 delivers exascale performance within a single rack, and topology-aware job scheduling techniques tailored to its architecture help shared clusters make full use of that performance. Together, they make real-time inference on trillion-parameter models practical and improve the efficiency and speed of AI workloads.
The Power of GB200 NVL72
At the center of this work is the GB200 NVL72, which connects 72 NVIDIA Blackwell GPUs through a high-speed NVLink fabric. This configuration provides 130 terabytes per second (TB/s) of low-latency GPU-to-GPU bandwidth for artificial intelligence and high-performance computing workloads. Recent benchmarks show the GB200 NVL72 delivering more than 2.6 times the training performance of previous-generation systems, and the platform also targets a wider range of AI workloads, including real-time inference and reasoning applications.
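As a rough sanity check on the aggregate figure, assuming the 130 TB/s is divided evenly across the 72 GPUs in the NVLink domain:

$$
\frac{130\ \text{TB/s}}{72\ \text{GPUs}} \approx 1.8\ \text{TB/s per GPU}
$$

This is consistent with the 1.8 TB/s of fifth-generation NVLink bandwidth available to each Blackwell GPU.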
Importance of Topology-Aware Scheduling
To make full use of the GB200 NVL72, effective job scheduling is key. When multiple jobs share a cluster, traditional schedulers often fragment resources, scattering a single job's nodes across several network domains. Topology-aware scheduling addresses this by allocating resources according to the cluster's physical network layout. It keeps each workload within a single NVLink domain whenever possible, so the job's GPUs communicate over the full NVLink bandwidth instead of the slower links between racks.
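The placement idea can be sketched in a few lines of Python. This is a minimal illustration, not Slurm's actual algorithm: the domain names, node names, and the `place_job` helper are hypothetical, and the "best fit inside one domain, otherwise span the fewest domains" policy is one reasonable heuristic chosen for the example.

```python
from typing import Dict, List, Optional


def place_job(free_by_domain: Dict[str, List[str]], nodes_needed: int) -> Optional[List[str]]:
    """Choose nodes for a job, preferring a single NVLink domain.

    free_by_domain maps a (hypothetical) domain name to its free node names.
    Returns None when the cluster lacks enough free nodes.
    """
    if nodes_needed > sum(len(nodes) for nodes in free_by_domain.values()):
        return None

    # Best fit: among domains that can hold the whole job, pick the one with
    # the fewest free nodes, keeping larger free domains intact for later jobs.
    fitting = [d for d, nodes in free_by_domain.items() if len(nodes) >= nodes_needed]
    if fitting:
        best = min(fitting, key=lambda d: len(free_by_domain[d]))
        return free_by_domain[best][:nodes_needed]

    # No single domain fits: fill the largest free domains first so the job
    # spans as few domains as possible.
    chosen: List[str] = []
    for domain in sorted(free_by_domain, key=lambda d: len(free_by_domain[d]), reverse=True):
        take = min(nodes_needed - len(chosen), len(free_by_domain[domain]))
        chosen.extend(free_by_domain[domain][:take])
        if len(chosen) == nodes_needed:
            break
    return chosen


if __name__ == "__main__":
    free = {
        "rack1": [f"r1n{i:02d}" for i in range(1, 5)],   # 4 free nodes
        "rack2": [f"r2n{i:02d}" for i in range(1, 19)],  # 18 free nodes
    }
    print(place_job(free, 8))  # all eight nodes come from rack2
```

A topology-oblivious first fit in hostname order would instead take r1n01 through r1n04 plus four nodes from rack2, splitting the same 8-node job across two NVLink domains.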
Slurm's long-standing topology/tree plugin provided basic topology awareness, but it frequently fragmented jobs across network switches. This was tolerable on legacy InfiniBand systems but falls short on rack-scale architectures like the GB200 NVL72. To address this, NVIDIA and SchedMD introduced a new topology/block plugin in Slurm 23.11, designed for systems in which groups of nodes share a high-bandwidth interconnect domain.
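The plugin is driven by a topology.conf file in which each block lists its member nodes, with `TopologyPlugin=topology/block` set in slurm.conf. The Python sketch below renders one `BlockName=... Nodes=...` entry per NVL72 rack. The node naming scheme, the `nvl72-N` block names, and the 18-nodes-per-rack default are illustrative assumptions; check the generated lines against the topology.conf documentation for your Slurm version before using them.

```python
def render_block_topology(num_racks: int, nodes_per_rack: int = 18, prefix: str = "gb200-r") -> str:
    """Render topology.conf entries for Slurm's topology/block plugin.

    Assumptions (hypothetical naming): rack N's nodes are named
    <prefix>N-01 .. <prefix>N-<nodes_per_rack>, and each rack is one NVL72
    domain. slurm.conf must also set TopologyPlugin=topology/block.
    """
    lines = ["# topology.conf: one block per NVL72 domain"]
    for rack in range(1, num_racks + 1):
        # Slurm hostlist range expression, e.g. gb200-r1-[01-18]
        nodes = f"{prefix}{rack}-[01-{nodes_per_rack:02d}]"
        lines.append(f"BlockName=nvl72-{rack} Nodes={nodes}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_block_topology(num_racks=2))
```

Generating the file from an inventory, rather than editing it by hand, keeps block membership in sync with the physical racks as the cluster grows.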
New Scheduling Strategies
The topology/block plugin lets administrators describe which nodes belong to the same NVL72 domain, or "block". Slurm uses this information to align jobs with domain boundaries, which reduces resource fragmentation and raises overall system efficiency. As a result, Slurm can serve multiple concurrent training jobs with differing bandwidth demands while keeping GPU occupancy high in shared environments.
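Why alignment matters can be shown with a small fragmentation metric. The sketch below is an illustration, not Slurm output: it assumes 18 nodes of 4 GPUs per NVL72 domain, and the `fragmentation_report` helper and its metrics are hypothetical choices for this example. It compares two cluster states with identical occupancy.

```python
from typing import Dict, List

NODES_PER_DOMAIN = 18  # assumption: 18 compute trays per NVL72 rack
GPUS_PER_NODE = 4      # assumption: 4 Blackwell GPUs per compute tray (18 x 4 = 72)


def fragmentation_report(busy_per_domain: List[int]) -> Dict[str, float]:
    """Summarize occupancy and fragmentation from busy-node counts per domain."""
    if not busy_per_domain:
        raise ValueError("need at least one domain")
    if any(b < 0 or b > NODES_PER_DOMAIN for b in busy_per_domain):
        raise ValueError("busy count out of range for a domain")
    free = [NODES_PER_DOMAIN - b for b in busy_per_domain]
    total_nodes = NODES_PER_DOMAIN * len(busy_per_domain)
    return {
        "gpu_occupancy": sum(busy_per_domain) / total_nodes,
        "free_gpus": sum(free) * GPUS_PER_NODE,
        # Largest job that still fits entirely inside one NVLink domain.
        "largest_single_domain_job_gpus": max(free) * GPUS_PER_NODE,
    }


if __name__ == "__main__":
    print(fragmentation_report([9, 9, 9, 9]))    # jobs scattered across domains
    print(fragmentation_report([18, 18, 0, 0]))  # jobs aligned to domain boundaries
```

Both states are 50% occupied with 144 free GPUs, but the scattered layout can fit at most a 36-GPU job inside a single NVLink domain, while the aligned layout can still fit a full 72-GPU job.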
As AI models grow larger and more complex, advanced scheduling techniques like those developed for the GB200 NVL72 will be critical for efficient resource utilization. The collaboration between NVIDIA, as the hardware vendor, and SchedMD, as the developer of Slurm, shows how hardware and scheduling software can be co-designed so that future computing resources keep pace with the demands of AI workloads.
The shift toward topology-aware scheduling improves performance for current workloads and lays the groundwork for more ambitious AI projects. With infrastructure like the GB200 NVL72 and its supporting scheduling tools, the AI community can expect significant advances in both research and applications.