Doctoral Thesis: Probabilistic Machine Learning Methods for Spatiotemporal Data with Applications to Environmental Health

Wednesday, April 29
12:00 pm - 1:00 pm

E14-633 (MIT Media Lab Dreyfoos Lecture Hall)

Title: Probabilistic Machine Learning Methods for Spatiotemporal Data with Applications to Environmental Health

Speaker: Renato Berlinghieri

Date: Wednesday, April 29, 2026

Time: 12:00 pm Boston time

Location: E14-633 (MIT Media Lab Dreyfoos Lecture Hall)

RSVP: If you plan to attend in person, please let me know via this calendar invite so that I can share an attendee list with the Media Lab ahead of time and make building access smoother.

Zoom: Email me at renb@mit.edu if you’d like the Zoom link; I'm very happy to provide it!

Abstract: Environmental hazards like wildfires and oil spills are intensifying, and the health consequences are increasingly severe. Wildfire smoke alone now kills more people in the United States each year than car crashes. Modeling such hazards requires working with spatiotemporal data: measurements that vary across space and evolve over time. Yet standard machine learning methods for spatiotemporal data often fall short: they can produce forecasts that violate known physics, misestimate the timing or severity of extreme pollution events, and yield uncertainty estimates that are systematically overconfident.

In my PhD thesis, I develop probabilistic machine learning methods for spatiotemporal data that incorporate domain knowledge to improve predictions and provide reliable uncertainty quantification, with applications to environmental health. This program spans three directions: domain-informed prediction methods that embed physical constraints into probabilistic models, decision-centric evaluation metrics for air quality forecasts, and uncertainty quantification methods that remain valid under model misspecification and spatial dependence.

In this talk, I will focus on one thread of this program: trajectory inference from snapshot data. In many environmental and biological settings — such as tracking wildfire smoke, oil spill drift, or cell migration — we observe particles at discrete time points but never their continuous trajectories. Existing optimal transport and Schrödinger bridge methods can reconstruct trajectories from such snapshots, but they typically interpolate only between consecutive time points and default to generic diffusion dynamics. As a result, they can miss long-range temporal patterns and produce trajectories that drift randomly rather than following known physical forces like atmospheric winds or ocean currents. I will present a multi-marginal Schrödinger bridge framework that addresses both limitations: it leverages information across all available time points and incorporates domain-informed reference dynamics, so that inferred trajectories are both data-consistent and physically plausible. I will also describe an extension to forecasting beyond the last observed time point, using a maximum mean discrepancy objective that avoids requiring known noise levels. This work is co-first authored with Yunyi Shen. I will conclude by discussing how these methods connect to my broader research agenda on probabilistic machine learning for environmental health and biology.
