Doctoral Thesis: Advancing Equity & Reliability in Machine Learning
Hewlett Room, G-882
By: Divya Shanmugam
Supervisor: John Guttag
Details
- Date: Thursday, April 18
- Time: 2:00 pm - 3:30 pm
- Category: Thesis Defense
- Location: Hewlett Room, G-882
In this talk, we aim to characterize and mitigate the impact of imperfect data on machine learning models. We address three ways in which data can be flawed: imperfect labels, coarse demographics, and limited evaluation datasets. First, we develop a method to correct for imperfect labels arising from differential underdiagnosis across demographic cohorts. We then show how coarse race data obscures disparities across more granular race groups, suggesting that existing algorithmic audits may substantially underestimate racial disparities in performance. Finally, we present a method to select among multiple machine learning models when labeled data are scarce.
Taken together, this work represents a step toward a machine learning methodology that is robust to systematic errors in data collection across domains.