EECS PhD student Tsui-Wei "Lily" Weng (center) networks with her peers at the 2018 Women in Data Science conference. Photo: Dana J. Quigley
Scott Murray | Institute for Data, Systems, and Society
Two hundred students, industry professionals, and academic leaders convened in Cambridge recently for the second annual Women in Data Science (WiDS) conference.
The MIT Institute for Data, Systems, and Society (IDSS) co-hosted the event with Harvard University’s Institute for Applied Computational Science (IACS) and Microsoft Research New England, in partnership with Stanford University. Attendance at the conference, held at Microsoft’s New England Research and Development (NERD) Center, grew from 150 participants last year.
“The WiDS conference highlighted female leadership in data science in the Boston area,” said WiDS steering committee member Caroline Uhler, Henry L. and Grace Doherty assistant professor in EECS and IDSS. “This event is particularly important to encourage more female scientists in related areas to join this emerging area that has such broad societal impact.”
Regina Barzilay, Delta Electronics Professor of EECS, gave the first presentation on how data science and machine learning approaches are improving cancer research. Barzilay explained how her experience as a breast cancer survivor motivates her work: “Going through this treatment really opened my eyes to how much computer science and data science is not a part of that equation,” she said.
Barzilay discovered that most clinical decisions in breast cancer treatment were based on 3 percent of patients (those participating in clinical trials), leaving a huge amount of patient information unused. Her research uses natural language processing (NLP) and deep neural networks to organize and analyze some of that previously unused patient data in the hope of better understanding the connection between features of the disease and treatment outcomes. She is also developing techniques that use pixel-by-pixel analysis of mammogram images to assess cancer risks earlier than humans can.
Other speakers showed the broad applicability of data science approaches to many fields in addition to health care. Tamara Broderick, ITT Career Development Assistant Professor of EECS and a member of IDSS, spoke about developing a modern toolbox for data analysis. Ideally, that toolbox would be reliable and accurate, with theoretical guarantees on quality, Broderick said. The tools should allow practitioners to assess how certain they are about the reported outcomes. The output of the tools should be interpretable, and running the tools should be fast and easy for non-expert practitioners, she added. Moreover, the tools should be able to run on streaming data, where new data are constantly being acquired, and should be able to take advantage of modern, distributed computing. Broderick also discussed how to achieve these qualities in practical applications, citing examples in evaluating the efficacy of microloans and online advertising.
Francesca Dominici, professor of biostatistics and co-director of the Data Science Initiative at Harvard, presented on the power of data science to affect environmental policy. Dominici combines meteorological, traffic, land-use, and demographic data with Medicare claims to assess health risks of pollutants. Her research was referenced by U.S. Senator Cory Booker (D-N.J.) during a Senate hearing for Kathleen Hartnett White, President Trump’s former nominee to head the Council on Environmental Quality.
From the industry side of health care, Sanofi's Heather Bell spoke about the potential for data science to improve the clinical trial process. Bell is senior vice president and global head of digital and analytics at Sanofi, a biopharmaceutical company. She noted that 35 percent of clinical trial time was spent in recruiting patients, a process greatly improved by using patient data to identify candidates. This can shorten trials from years to months, save millions of dollars, open studies to thousands more participants, and ultimately lead to improved outcomes.
While data science has great potential to improve research in numerous fields, it presents new challenges. One critical challenge was highlighted by Cynthia Dwork, the Gordon McKay Professor of Computer Science at Harvard and Radcliffe Alumnae Professor at the Radcliffe Institute for Advanced Study. Dwork spoke of fairness and a “catalog of evils” identified by researchers in algorithm use, such as excluding zip codes for targeted advertising based on the racial makeup of neighborhoods. Algorithms aren’t neutral, Dwork argued, and can present problems if they “lack cultural awareness.”
Challenges were also a theme of the concluding panel discussion, which brought together speakers and other women working in data science. In addition to bias, they discussed transparency, security, and privacy. “I think people have no idea, no comprehension how much data is out there on them,” Bell mused. “I think there will need to be a reckoning on some of this.”
Despite these challenges, all agreed that data science can have an extraordinary impact in addressing social challenges. Data scientists can apply their skills in almost limitless ways by collaborating with researchers wherever data sets are or can be made available. As Broderick put it: “I had all these interests and thought: 'If only there was one magical field that combined them.'”
New to the conference this year was a hall of student posters showcasing data science work in a broad range of fields, along with recruitment opportunities for industry partners. The 2018 conference also included the announcement of the winners of the first WiDS Datathon competition, which ran throughout February. The Datathon challenged teams to come up with data science solutions to alleviate global poverty. Worcester Polytechnic Institute’s “Team Minions,” composed of Xi Liu and Ye Wang, was both the local Massachusetts first-place winner and a global winner. The pair emphasized the importance of open source models and tools to their project.
IDSS Executive Director Elizabeth Sikorovsky joined Harvard IACS Executive Director Catherine Chute in making opening and closing remarks. They noted that the event sold out on the first day and joined more than 150 other conferences and events worldwide, all led by women. “We are part of a global movement,” Chute said.