Doctoral Thesis: Towards Interpretable and Operationalized Fairness in Machine Learning
By: Schrasing Tong
Thesis Supervisor(s): Lalana Kagal
Details
- Date: Thursday, December 11
- Time: 3:30 pm - 5:00 pm
- Location: 32-D463, Star conference room
Abstract:
Machine learning systems are increasingly deployed in sensitive, real-world
settings, yet persistent biases in model predictions continue to disadvantage
marginalized groups. This thesis develops practical and interpretable
methods for understanding and mitigating such biases in natural language
generation and computer vision.
For large language models, we introduce a decoding-time approach that
leverages small biased and anti-biased expert models to obtain a debiasing
signal that is added to the LLM output. This approach combines computational
efficiency (fine-tuning small expert models rather than retraining the large
model) with interpretability, since the probability shift introduced by
debiasing can be examined directly.
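To picture the decoding-time operation, here is a minimal sketch assuming the debiasing signal is the difference between the anti-biased and biased experts' next-token logits, scaled by a coefficient alpha. The vocabulary, scores, and function names are illustrative only, not the thesis implementation.

```python
import numpy as np

def debiased_next_token_probs(llm_logits, biased_logits, antibiased_logits, alpha=1.0):
    """Combine base LLM logits with a decoding-time debiasing signal.

    The signal is the difference between the anti-biased and biased expert
    logits; alpha scales its strength. (Illustrative formulation only.)
    """
    combined = llm_logits + alpha * (antibiased_logits - biased_logits)
    # Softmax over the vocabulary to obtain next-token probabilities.
    shifted = combined - combined.max()
    return np.exp(shifted) / np.exp(shifted).sum()

# Toy 5-token vocabulary: inspecting how each token's probability shifts
# after debiasing is what makes the intervention auditable.
vocab = ["nurse", "doctor", "engineer", "teacher", "artist"]
llm = np.array([2.0, 1.0, 0.5, 0.3, 0.1])
biased = np.array([3.0, 0.5, 0.2, 0.4, 0.1])
anti = np.array([1.0, 1.5, 1.2, 0.6, 0.2])

base = np.exp(llm - llm.max()); base /= base.sum()
debiased = debiased_next_token_probs(llm, biased, anti, alpha=0.8)
for tok, p0, p1 in zip(vocab, base, debiased):
    print(f"{tok:10s} base={p0:.3f} debiased={p1:.3f} shift={p1 - p0:+.3f}")
```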
In computer vision, we leverage concept bottleneck models (CBMs), which map
images to human-understandable concepts, to improve transparency and help
mask proxy features that correlate with sensitive attributes. To counter CBM
information leakage and improve fairness-performance tradeoffs, we introduce
three mitigation strategies: (1) reducing leakage with a top-k concept
filter, (2) removing concepts that correlate strongly with gender, and (3)
applying adversarial debiasing to further suppress sensitive information.
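As a rough illustration of the first two mitigation strategies, the sketch below keeps only the k strongest concept activations and zeroes out concepts flagged as gender-correlated before they reach the downstream label predictor. The concept names, scores, and helper functions are hypothetical and not the CBM pipeline used in the thesis.

```python
import numpy as np

def top_k_concept_filter(concept_scores, k):
    """Keep only the k strongest concept activations; zero out the rest.

    Truncating low-activation concepts limits the side channel through which
    soft concept scores can leak information unrelated to the concepts
    themselves. (Sketch only; the thesis may define the filter differently.)
    """
    filtered = np.zeros_like(concept_scores)
    top_idx = np.argsort(concept_scores)[-k:]
    filtered[top_idx] = concept_scores[top_idx]
    return filtered

def drop_gender_correlated(concept_scores, concept_names, blocked):
    """Zero out concepts flagged as strongly correlated with gender."""
    filtered = concept_scores.copy()
    for i, name in enumerate(concept_names):
        if name in blocked:
            filtered[i] = 0.0
    return filtered

# Toy concept vector for one image (names are illustrative).
names = ["wearing_tie", "long_hair", "smiling", "wearing_hat", "eyeglasses"]
scores = np.array([0.05, 0.92, 0.71, 0.10, 0.33])
scores = top_k_concept_filter(scores, k=3)
scores = drop_gender_correlated(scores, names, blocked={"long_hair"})
print(scores)  # only the surviving concepts feed the label predictor
```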
Together, these contributions illustrate how interpretability and
operationalization can make fairness interventions more trustworthy,
scalable, and aligned with real deployment needs.