One of the fundamental aims of biology is to determine what lies at the root of differences across individuals, species, diseases, and cell types. The sequencing of genomes has revolutionized the ways in which scientists can investigate biological processes and disease pathways: new genome-wide, high-throughput experiments require computer scientists with a biological understanding to analyze and interpret the data and thereby improve our understanding of the life sciences. This provides a key opportunity to use computational techniques for new biological discoveries.
While genetic variation plays an important role in influencing phenotype, sequence alone cannot account for all differences: for example, different types of cells in an individual have varying functions and attributes but identical genetic makeup. This highlights the importance of studying epigenetic changes, which are dynamic chemical changes to and around the DNA. While the DNA of every cell in an individual is the same, the epigenetic context for that DNA varies from cell to cell. These epigenetic differences therefore play a crucial role in gene regulation, with epigenetic changes both causing and recording regulatory mechanisms.
In this talk, I will combine computational, statistical, and data science approaches with the new wave of genome-wide epigenetic data in three ways. First, we demonstrate the importance of computational analysis at an epigenomic level by identifying an epigenomic signature of the olfactory receptor gene family that gives insight into the mechanism behind monogenic gene regulation. Second, we show the power of integrative analysis by combining DNA methylation data with chromatin state profiles and cell types to reveal epigenetic patterns and relationships. Finally, we explain our development of ChromDiff, a novel statistical and information-theoretic computational methodology to identify chromatin state differences between groups of samples based on both gene bodies and linked regulatory regions. In our methodology, we correct for external covariates to isolate the relevant signal, and as a result, we find that our method outperforms existing computational methods, with further validation through randomized simulations. By applying our methodology to characteristics including sex, developmental age, and tissue type, we unveil relevant chromatin states and genes that distinguish the groups of epigenomes, with further validation of our results through gene expression and gene set enrichment.
Thesis Supervisor: Prof. Manolis Kellis
Thesis Committee: Profs. Bonnie Berger and Pardis Sabeti