Doctoral Thesis: Full-Stack Algorithm-System Co-Design for Efficient Visual Generation
By: Muyang Li
Details
- Date: Thursday, November 6
- Time: 2:00 pm - 4:00 pm
- Category: Thesis Defense
- Location: MIT 56-114, Zoom: https://mit.zoom.us/j/92268400684
Abstract:
The rapid progress of AI-generated content (AIGC) has placed generative models at the core of modern image and video synthesis. Although these models enable powerful applications in creation, editing, and design, their heavy computational cost remains a barrier to real-time and interactive deployment.
This thesis addresses this challenge through a full-stack co-design of algorithms and systems for efficient and scalable visual generation. We begin by developing a general-purpose compression framework, GAN Compression, for generative adversarial networks (GANs), demonstrating that pruning and distillation can significantly reduce computational cost while preserving output fidelity. For diffusion models, we propose SVDQuant, an accurate 4-bit quantization paradigm, together with Nunchaku, a specialized inference engine optimized for low-bit computation. For interactive editing applications, we propose SIGE, a spatially sparse inference technique that reuses cached activations from unedited regions to avoid redundant computation. Extending to video generation, we develop Radial Attention, an O(n log n) sparse attention mechanism that alleviates the latency bottleneck of spatiotemporal attention. Finally, we present DistriFusion, a distributed inference framework that parallelizes diffusion computations across multiple GPUs to further reduce latency.
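To give a flavor of what low-bit quantization involves (SVDQuant itself is considerably more sophisticated, pairing a low-rank branch with the quantized weights), here is a minimal, hypothetical sketch of plain symmetric per-tensor 4-bit quantization; all names and parameters below are illustrative, not from the thesis:

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-7, 7] with one shared scale."""
    scale = max(abs(v) for v in weights) / 7 or 1.0  # avoid divide-by-zero
    q = [max(-7, min(7, round(v / scale))) for v in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

w = [0.1, -0.52, 0.73, 0.0]
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
# each reconstructed weight is within half a quantization step (s / 2) of the original
```

The sketch stores each weight in 4 bits instead of 16 or 32, which is where the memory and bandwidth savings come from; the accuracy challenge that motivates SVDQuant is that real weight and activation tensors contain outliers that blow up the shared scale.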
By systematically integrating quantization, sparsity, and distributed inference, this thesis delivers a comprehensive solution to the efficiency bottlenecks of modern generative models, enabling practical, real-time, and large-scale deployment of visual generative models.
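The spatially sparse inference idea can be illustrated with a toy example. The sketch below is a simplification and not the SIGE implementation: it uses a 1-D signal and an elementwise layer `f` (so tiles are independent), recomputing only tiles whose input changed since the cached pass; all names are hypothetical:

```python
def sparse_apply(f, new_x, old_x, cached_out, tile=4):
    """Re-run f only on tiles where the input was edited; reuse cached
    activations everywhere else (a toy sketch of spatially sparse inference)."""
    out = list(cached_out)
    for start in range(0, len(new_x), tile):
        end = start + tile
        if new_x[start:end] != old_x[start:end]:  # tile was edited
            out[start:end] = [f(v) for v in new_x[start:end]]
    return out

# Full pass once, then an incremental edit touching a single tile:
f = lambda v: 2 * v
old = [0.0] * 16
cache = [f(v) for v in old]
new = list(old)
new[3] = 1.0                      # edit falls inside the first tile only
out = sparse_apply(f, new, old, cache)
# out matches a full recomputation, but only 1 of 4 tiles was re-run
```

In a real convolutional network, edited regions must be dilated by the receptive field before reuse is safe, which is part of what a system like SIGE has to handle.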
Host
- Muyang Li
- Email: Muyangli@mit.edu