Florentin Guth, “Towards a science of deep learning: the structure of data and weights”
34-401A Grier A
Abstract:
With deep learning, we now have access to strikingly powerful generative models of images and text, in apparent defiance of the curse of dimensionality. This success rests on two facts: natural data is highly structured, and our choices of network architectures and training algorithms implicitly encode strong assumptions about this structure, so that only a small number of “effective” parameters ultimately need to be learned. However, we lack a principled understanding of the mechanisms behind this process, leaving us with limited guidance for improving data efficiency or mitigating failures such as memorization and hallucination. In this talk, I will present experiments on deep networks that uncover the structure of their training data and its encoding in the learned weights. I will show that, given sufficient data, diffusion models learn a genuine probability distribution over natural images rather than a collection of memorized samples. I will introduce a novel energy-based approach that makes this learned distribution explicit, revealing that the usual picture of images concentrating near a low-dimensional “typical manifold” needs to be revised: local low-dimensionality alone is insufficient to lift the curse of dimensionality. To capture the more global structure that deep networks exploit, I will present a framework for extracting “effective parameters” from trained weights, quantifying how many such parameters are learned, and comparing them across networks. These results provide insight into what and how neural networks learn, paving the way towards the principled design of architectures and algorithms whose inductive biases explicitly match available prior information about the data.
Bio:
Florentin Guth is a Faculty Fellow in the Center for Data Science at NYU and a Research Fellow in the Center for Computational Neuroscience at the Flatiron Institute. He earned his PhD at École Normale Supérieure (Paris). His work received an Outstanding Paper Award at ICLR 2024, and he is a co-lead organizer of the Sci4DL workshop series on “Scientific Methods for Understanding Deep Learning”. His research interests include explaining why neural networks generalize, what their inductive biases are, and which properties of natural data underlie their success.
Details
- Date: Thursday, March 12
- Time: 11:00 am - 12:00 pm
- Category: Special Seminar
- Location: 34-401A Grier A
Host
- Greg Wornell
- Email: chadcoll@mit.edu