Doctoral Thesis: Optimizing Networked Systems for Artificial Intelligence Training Workloads

Tuesday, December 3
3:00 pm - 4:00 pm

G882, Stata Center

By: Sudarsanan Rajasekaran

Thesis Supervisor: Manya Ghobadi

ABSTRACT
The ever-growing dataset and model sizes in deep learning have created a massive demand for efficient GPU clusters. Several studies have demonstrated that as the number of GPUs increases, the communication overhead of distributed Machine Learning (ML) training workloads quickly takes up a significant portion of training iteration time. Yet state-of-the-art ML schedulers tend to ignore the communication patterns of ML training jobs when placing workers on GPUs.
This thesis advocates for communication-aware resource scheduling as a critical approach to optimizing network utilization in ML clusters. The key idea for accelerating DNN jobs is to interleave the communication demands of different jobs sharing a network link. To illustrate this concept of interleaving, we first demonstrate how intentionally creating unfairness in the bandwidth share between different DNN jobs improves their iteration times. Building on this insight, we present two novel systems designed to minimize network congestion and accelerate DNN training and fine-tuning.
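As a rough illustration of why unfair bandwidth sharing can help, consider two identical jobs on one link, each with a 50 ms compute phase followed by a communication phase that needs 50 ms at full link bandwidth. The numbers and the arithmetic below are illustrative assumptions, not measurements from the thesis.

```python
# Toy back-of-the-envelope comparison (illustrative numbers, not from the thesis):
# two identical jobs share one link; each iteration is a 50 ms compute phase
# followed by a communication phase that needs 50 ms at full link bandwidth.

COMPUTE_MS = 50.0
COMM_AT_FULL_BW_MS = 50.0

# Fair sharing with aligned phases: both jobs communicate at the same time,
# each gets half the bandwidth, and the communication phase stretches to 100 ms.
fair_iteration_ms = COMPUTE_MS + 2 * COMM_AT_FULL_BW_MS        # 150 ms

# Unfair / interleaved sharing: each job's communication overlaps the other
# job's compute, so in steady state each job sees the full link when it sends.
interleaved_iteration_ms = COMPUTE_MS + COMM_AT_FULL_BW_MS     # 100 ms

print(f"fair share: {fair_iteration_ms} ms per iteration")
print(f"interleaved: {interleaved_iteration_ms} ms per iteration")
```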
The first system is Cassini, a network-aware job scheduler for ML clusters. Cassini introduces a novel geometric abstraction to consider the communication patterns of different jobs while placing them on network links. To do so, Cassini uses an affinity graph to find a series of time-shift values that adjust the communication phases of a subset of jobs such that the communication patterns of jobs sharing the same network link are interleaved with each other.
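The sketch below conveys the time-shifting intuition only; it is not Cassini's affinity-graph algorithm, and the contention metric, job parameters, and shift values are illustrative assumptions.

```python
# Simplified sketch of interleaving periodic communication phases via time shifts.

def contention(jobs, shifts, period=1.0, resolution=1000):
    """Fraction of the period during which more than one job is communicating."""
    busy = [0] * resolution
    for (comm_start, comm_len), shift in zip(jobs, shifts):
        for t in range(resolution):
            # Position within this job's iteration after applying its time shift.
            pos = ((t / resolution) * period - shift) % period
            if comm_start <= pos < comm_start + comm_len:
                busy[t] += 1
    return sum(1 for b in busy if b > 1) / resolution

# Two jobs on the same link, each communicating for 40% of every iteration.
jobs = [(0.0, 0.4), (0.0, 0.4)]

print(contention(jobs, shifts=[0.0, 0.0]))   # phases fully collide -> 0.4
print(contention(jobs, shifts=[0.0, 0.5]))   # shifted by half a period -> 0.0
```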
The second system is MLTCP, a distributed technique that approximates an interleaved centralized flow schedule. At the heart of MLTCP lies a straightforward principle based on a key conceptual insight: by scaling the congestion window size (or sending rate) of each flow based on the number of bytes sent in each iteration, MLTCP flows eventually converge to a schedule that reduces network contention.
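A minimal sketch of this high-level idea follows; the exact update rule, constants, and function names are assumptions rather than the thesis's implementation. Biasing each flow's congestion window by its progress through the current iteration lets flows that are ahead finish sooner while flows that are behind yield, nudging them toward an interleaved schedule without a central coordinator.

```python
# Hypothetical progress-biased congestion window (illustrative, not MLTCP's rule).

def scaled_cwnd(base_cwnd, bytes_sent_this_iter, total_iter_bytes, gamma=0.5):
    """Return a congestion window biased by training-iteration progress."""
    progress = min(bytes_sent_this_iter / total_iter_bytes, 1.0)  # in [0, 1]
    return base_cwnd * (1.0 + gamma * progress)

# Two flows with the same base window but different progress in this iteration:
print(scaled_cwnd(base_cwnd=100, bytes_sent_this_iter=80e6, total_iter_bytes=100e6))  # 140.0
print(scaled_cwnd(base_cwnd=100, bytes_sent_this_iter=10e6, total_iter_bytes=100e6))  # 105.0
```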
To evaluate these systems, we conducted experiments using real-world DNN models on a testbed of NVIDIA A100 GPUs. Cassini and MLTCP improve average iteration times by up to 1.6× and 1.9×, respectively, demonstrating their effectiveness in reducing network congestion and accelerating ML workloads.
Thesis supervisor: Manya Ghobadi
Title: Associate Professor of Electrical Engineering and Computer Science

Host

  • Sudarsanan Rajasekaran