Recent commitments to enhance the use of data for learning in medicine provide the opportunity to apply instruments and abstractions from computational learning theory to accelerate learning about diseases, thereby improving healthcare quality faster. We treat the creation and updating of a taxonomy of diseases as an unsupervised learning problem. To evaluate and improve the disease taxonomy, we develop a method for estimating uncertainty in the taxonomy using a variety of established data sets relating diseases to other variables such as genes, symptoms, or therapeutics. We then demonstrate how to update disease relationships using these data sets. Finally, we show that latent information in publicly available data can be accessed to further compare diseases, enabling biological comparisons with seemingly superficial data sources. Our results demonstrate the possibility of continuously updating the disease taxonomy in a data-driven manner, thus accelerating the pace at which it is revised.
Thesis Supervisor: Luis Perez-Breva
Thesis Committee: Peter Szolovits, Thomas Heldt