Doctoral Thesis: Machine Learning for Sepsis Prognosis: Prediction Models and Dissecting Electronic Health Records

Monday, April 29
3:00 pm - 4:30 pm

Haus room (36-428)

By: Wei Lao

Thesis Supervisor: Joel Voldman


Sepsis is the body’s extreme response to an infection and is a life-threatening medical emergency. Given the heavy burden sepsis places on the health care system, extensive research has been performed to facilitate sepsis diagnosis. Sepsis prognosis, which assesses the likely progression of the disease and can thus inform treatment decisions, is much less explored. Here I present two approaches to building sepsis prognosis models. First, I introduce the idea of assessing neutrophil function from simple-to-obtain phase microscopy images. I developed an experimental pipeline that uses measurements of reactive oxygen species generation as the label of neutrophil function. I generated a large neutrophil imaging dataset and explored different deep learning approaches to predict neutrophil activation state. Second, I developed machine learning models to predict the future clinical scores of sepsis patients from electronic health records. As part of this effort, I developed a multi-database extraction pipeline to facilitate the extraction of electronic health records. My work demonstrates the potential of using deep learning models to evaluate functional aspects of the immune system and to predict the future state of sepsis patients, which could provide significant insight for sepsis prognostic monitoring and is easy to adapt to clinical settings.

Understanding the input data is essential to developing reliable and generalizable machine learning models for healthcare. It is also increasingly apparent that such models can predict sensitive patient information from data that does not explicitly encode it. However, we lack a clear understanding of the extent of the problem: what types of sensitive information can be predicted, and how this generalizes across models and datasets. We also lack approaches to develop models that can make clinical inferences without inferring sensitive information. Critically, we lack approaches to explain such data encoding. Using electronic health records, I thoroughly investigated the ability of machine learning models to encode a wide range of sensitive patient information. I developed a strategy to ensure that clinical prediction is minimally based on sensitive patient information. I present an approach that can explain feature importance in the encoding of sensitive patient information. This set of studies not only provides a deep understanding of the sepsis patient clinical score prediction model but is also applicable to a variety of machine learning models that use time-series data.