Leveraging Data Across Time and Space to Build Predictive Models for Healthcare-Associated Infections
The proliferation of electronic medical records holds out the promise of using machine learning and data mining to build models that help healthcare providers improve patient outcomes. However, building useful models from these datasets poses several technical challenges: the large number of factors, both intrinsic and extrinsic, that influence a patient's risk of an adverse outcome, the inherent evolution of that risk over time, and the relative rarity of adverse outcomes.
In this talk, I will describe the development and validation of hospital-specific models for predicting healthcare-associated infections (HAIs), which are among the top ten causes of death in the US. I will show how, by adapting techniques from time-series classification, transfer learning, and multi-task learning, one can learn a more accurate model for stratifying patients by their risk of acquiring the HAI Clostridium difficile (C. diff).
Applied to a held-out validation set of 25,000 patient admissions, our model achieved an area under the receiver operating characteristic curve (AUROC) of 0.81 (95% CI 0.78–0.84). On average, we can identify high-risk patients five days before a positive test result. The model has been successfully integrated into the health record system at a large hospital in the US and is being used to produce daily risk estimates for each inpatient. Clinicians at the hospital are now considering how this information can be used to reduce the incidence of HAIs.
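The abstract does not say how the 95% confidence interval on the AUROC was obtained. As an illustration only, and not necessarily the procedure used in this work, one common approach computes the AUROC from the Mann-Whitney rank statistic (the probability that a randomly chosen positive admission is scored above a randomly chosen negative one) and derives an interval with a percentile bootstrap over admissions. The function names `auroc` and `bootstrap_ci` and all data in the usage example are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic; labels are assumed to be 0/1.

    Tied scores receive their average rank, so ties count as half a win.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap CI for the AUROC, resampling admissions with replacement."""
    auroc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking a positive or a negative
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, illustrative data only: about 2% positives (adverse outcomes
    # are rare), and positives score higher on average.
    labels = [1 if rng.random() < 0.02 else 0 for _ in range(2000)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    point = auroc(labels, scores)
    lo, hi = bootstrap_ci(labels, scores, n_boot=200)
    print(f"AUROC {point:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Resampling whole admissions keeps each patient's label and score together; with rare outcomes, some resamples may contain no positives, which is why the sketch skips them rather than failing.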
Thesis Supervisor: Prof. John Guttag