The question driving my work is: How should one deploy statistical data-analysis tools to enhance data-driven systems? Even partial answers to this question may have a large impact on science, government, and industry, all of which are increasingly turning to statistical techniques to get value from their data.
To understand this question, my group has built or contributed to a diverse set of data-processing systems: a system called GeoDeepDive that reads and answers questions about the geology literature and is used by geologists to gain insights into the Earth's carbon cycle; a muon filter that is used in the IceCube neutrino telescope to process over 250 million events each day in the hunt for the origins of the universe; and a host of enterprise analytics applications with Oracle and EMC/Greenplum. Even within this diverse set, we have found common abstractions, which can be used to build and maintain such systems in a more cost-effective way. In this talk, I will describe some of these abstractions along with the theoretical and algorithmic questions that they raise. Finally, I will describe my vision of how and why classical data management will continue to play an important role in the age of statistical data analysis.
Papers, software, virtual machines that contain installations of our software, links to applications that are discussed in this talk, and our list of collaborators are available from http://www.cs.wisc.edu/hazy. We also have a YouTube channel (http://www.youtube.com/HazyResearch) with videos about our projects.
Christopher (Chris) Re is an assistant professor in the Department of Computer Sciences at the University of Wisconsin-Madison. The goal of his work is to enable users and developers to build applications that more deeply understand and exploit data. Chris received his PhD from the University of Washington in Seattle under the supervision of Dan Suciu. For his PhD work in probabilistic data management, Chris received the SIGMOD 2010 Jim Gray Dissertation Award. Chris's papers have received four best-paper or best-of-conference citations, including best paper in PODS 2012, best-of-conference in PODS 2010 twice, and best-of-conference in ICDE 2009. Chris received an NSF CAREER Award in 2011.