LIDS & Stats Tea Talk || Sung Min (Sam) Park (CSAIL)

Wednesday, February 2
4:00 pm - 5:00 pm

This is a friendly reminder that the first LIDS & Stats Tea Talk of the Spring 2022 semester is happening on February 2nd, at 4:00 pm in the LIDS Lounge. Please find more information below.

Note: In accordance with MIT’s COVID policies, we are required to collect contact information from those who attend this event. If you are an MIT COVID Pass holder, please bring your smartphone and be prepared to present your MIT ID barcode, which can be found in the MIT Atlas app.


Speaker: Sung Min (Sam) Park

MIT Affiliation: CSAIL

Talk Title: Datamodels: Understanding Model Predictions as Functions of Data

Date: Wednesday, February 2, 2022

Time: 4:00 PM

Location: LIDS Lounge

Host: Jerrod Wigmore

Abstract: Current supervised machine learning models rely on an abundance of training data. Yet, understanding the underlying structure and biases of this data—and how they impact models—remains challenging. We present a new conceptual framework, datamodels, for directly modeling predictions as functions of training data. We instantiate our framework with simple parametric models (e.g. linear) and apply it to deep neural networks trained on standard vision datasets. Despite the complexity of the underlying process (e.g. SGD on overparameterized neural networks), the resulting datamodels can accurately predict model outputs as linear functions of the presence of different training examples. These datamodels, in turn, give rise to powerful tools for analyzing the ML pipeline: a predictive model for counterfactual impact of removing different training data; a data similarity metric that is faithful to the model class under study; and a rich embedding and graph that gives a principled way to study latent structure in the data.
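As a rough illustration of the idea in the abstract, the sketch below fits a linear function from "which training examples are present" to a single model output, then uses it to estimate the counterfactual effect of removing one example. It is a toy under stated assumptions, not the speaker's procedure: `toy_model_output` is a hypothetical stand-in for training a network on a subset and evaluating it on one test input, the data are synthetic, and ordinary least squares is used purely for simplicity.

```python
import numpy as np


def sample_masks(num_subsets, num_train, keep_frac, rng):
    """Each row is a 0/1 vector marking which training examples are present."""
    return (rng.random((num_subsets, num_train)) < keep_frac).astype(float)


def fit_linear_datamodel(masks, outputs):
    """Least-squares fit of outputs ~ masks @ weights + bias."""
    design = np.hstack([masks, np.ones((masks.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, outputs, rcond=None)
    return coef[:-1], coef[-1]


def predict(weights, bias, masks):
    """Predicted model output for each subset (row) in masks."""
    return masks @ weights + bias


rng = np.random.default_rng(0)
num_train = 20

# Hypothetical ground truth: in practice each output would come from
# training a model on the subset and evaluating it on a fixed test input.
true_weights = rng.normal(size=num_train)


def toy_model_output(masks):
    noise = 0.01 * rng.normal(size=masks.shape[0])
    return masks @ true_weights + 0.5 + noise


# Sample random training subsets and record the (simulated) outputs.
train_masks = sample_masks(500, num_train, 0.5, rng)
weights, bias = fit_linear_datamodel(train_masks, toy_model_output(train_masks))

# Counterfactual: predicted change in output when training example 0 is removed.
full = np.ones((1, num_train))
without_0 = full.copy()
without_0[0, 0] = 0.0
effect = predict(weights, bias, full)[0] - predict(weights, bias, without_0)[0]
```

In this linear setting the estimated effect of removing an example is simply its fitted weight, which is what makes the counterfactual reading direct.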

Bio: Sung Min (Sam) Park is a PhD student in CSAIL, advised by Prof. Aleksander Madry. His research interests include machine learning foundations, with a focus on making deployment of models more robust and reliable.


ABOUT: Tea talks are 20-minute informal chalk talks for sharing ideas and making others aware of topics that may be of interest to the LIDS and Stats audience. If you are interested in presenting in the upcoming tea talks, please email

Information on future talks:

