Eunice Jun: “Data analysis tools for statistical non-experts”
Data analysis is critical to science, public policy, and business. Despite their importance, statistical analyses are difficult to author, especially for researchers with expertise outside of statistics. Existing statistical tools, prioritizing mathematical expressivity and computational control, are low-level while researchers’ motivating questions and hypotheses are high-level. Researchers need to translate their questions and hypotheses into low-level statistical code in an error-prone process that involves grappling with their domain knowledge, statistics, and programming.
In this talk, I will introduce two tools that embody a new way of authoring analyses: Tea and Tisane. Researchers directly express their domain knowledge through higher-level abstractions, and the tools validate the data, select a statistical analysis, and implement it, all while educating analysts about why a statistical approach is valid. Tea helps analysts author statistical tests. Tea’s key insight is that statistical test selection can be cast as a constraint satisfaction problem. Tisane enables analysts to author generalized linear models with or without mixed effects, which are difficult for even statistical experts to author. Using Tisane, analysts can express their conceptual models using a high-level domain-specific language. Tisane translates these conceptual models into causal DAGs and engages analysts in a disambiguation process to arrive at an output statistical model. Real-world researchers have already used these tools to conduct analyses in published research that pushes their own disciplines forward. I will also introduce “hypothesis formalization,” a series of cognitive and operational steps analysts take to translate their research questions into statistical implementations. Hypothesis formalization retrospectively explains why Tea improves statistical testing and directly inspired the design of Tisane.
This talk exemplifies how combining human-computer interaction with other areas in and outside of computer science leads to software tools that impact real-world users. Tea and Tisane serve as platforms for further research into computational support for statistical analysis. In the future, an ecosystem of tools and computational representations to further incorporate domain knowledge throughout the data lifecycle will be critical to explain analyses and improve scientific reproducibility.
Eunice Jun is a PhD candidate at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, advised by Jeffrey Heer and René Just. Her research mission is to empower more people to understand and act on data. She conducts research in human-computer interaction. Close collaborations with data scientists and other researchers outside of computer science inspire her work. Her thesis combines HCI with techniques from programming languages/software engineering and statistics to develop new interactive data analysis tools. She has received an NSF Graduate Research Fellowship, paper awards at ACM CHI and ACM CSCW, and an honorable mention from the Barry Goldwater Foundation.
- Date: Tuesday, March 21
- Time: 11:00 am - 12:00 pm
- Category: Special Seminar
- Location: 34-401A
- Wojciech Matusik