Putting the pieces together


June 6, 2016 | Audrey Resutek, MIT EECS

Alumna Katherine Yelick uses supercomputing to solve big problems.

Katherine Yelick

Photo courtesy of Katherine Yelick

Every year, Katherine Yelick ’82, SM ’85, PhD ’91 gives a guest lecture titled “How to Save the World with Computers” in an introductory computer science class at the University of California at Berkeley.

In the talk, Yelick, a professor of electrical engineering and computer science at the university, explains how supercomputers are making it possible to answer questions that once seemed impossibly complex. These systems are used in an array of fields — from particle physics, to climate science, to biological engineering.

Researchers use the supercomputers in facilities like Lawrence Berkeley National Laboratory, where Yelick is associate laboratory director for computing sciences, to simulate complex systems or tackle massive data analysis problems. The machines can be used to simulate the Earth’s climate, analyze the genomes of thousands of microbes living in a soil sample, or track stars in a simulation of the expansion of the universe.

An expert in the field of parallel processing, Yelick focuses on improving computing productivity and performance. She is the co-inventor of two parallel computing languages, UPC and Titanium, which are designed to let researchers take advantage of high performance computing techniques. One way they achieve this is by giving users a glimpse at features of the machine that are normally hidden.

“Programming languages are trying to give an abstraction for a machine in a way that is natural for people to think about when they’re designing algorithms and writing software,” Yelick says. “It’s about trying to understand what details of the machine you can hide from the programmer and what characteristics you want to expose so they can get good performance out of their code by understanding that it has certain features, such as parallel processors or a hierarchy of memories.”

Where a single processor performs computational tasks in order, completing the first task in a list before moving on to the next, parallel computing spreads tasks across many processors, where they can be performed simultaneously. This division of labor is one of the reasons, along with chip-level improvements, that performance has improved dramatically over the last several decades. On average, supercomputers have gotten about 1000 times faster every decade.
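The contrast between working through a list in order and spreading it across processors can be sketched in a few lines of Python. This is only an illustration, not code from Yelick's research: the function names (`count_primes`, `run_sequential`, `run_parallel`), the `executor_cls` parameter, and the prime-counting workload are all assumptions chosen to keep the example small.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately slow work)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_sequential(limits):
    """One worker: finish each task before starting the next."""
    return [count_primes(limit) for limit in limits]

def run_parallel(limits, executor_cls=ProcessPoolExecutor, workers=4):
    """Hand the same tasks to a pool of workers that can run at the same time.

    `executor_cls` is a hypothetical knob for this sketch: pass
    ThreadPoolExecutor to use threads instead of separate processes.
    """
    with executor_cls(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(count_primes, limits))

if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    # Both strategies compute the same answers; only the scheduling differs.
    assert run_sequential(limits) == run_parallel(limits)
```

Both functions return identical results. Any speedup from the parallel version depends on how many processor cores are available and on how evenly the work divides, which is the kind of machine detail the article describes languages like UPC exposing to programmers.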

For example, faster computers could allow climate scientists to build more detailed climate models. Currently, climate models can only approximate what is going on inside clouds; they do not resolve the processes inside them at fine scale. Having this information could allow scientists to make more accurate predictions about changes in precipitation, giving us insight into future droughts or flooding.

“The biggest question facing supercomputing today is, what are the problems we can’t even anticipate that we’d be able to solve with a computer that’s 1000 times faster,” she says. “Over the history of computing, if you look all around us with web search and internet connectivity and applications on smartphones, it’s hard to imagine what breakthroughs will come in the future if we can make computers faster and keep the cost and energy requirements in control.”

Learning to love programming

As a freshman entering MIT, Yelick had been determined not to study computer science after an experience in a high school science project writing programs on paper tape soured her on the field. She grudgingly decided to take one computer science class, and to her surprise enjoyed it. She soon took another, and another.

“By the end of my freshman year I realized I really loved the process of programming,” she says. “I loved thinking about algorithms and thinking about how you map problems onto computers and the abstractions of computation. Those first couple of classes really drew me into it.”

Athletics also played a large role during Yelick’s time at MIT. As an undergrad she was a member of the women’s varsity rowing team, and spent a summer rowing on the US lightweight development team. Yelick continued to row as a graduate student, when she was coached by MIT Dean of Admissions Stu Schmill.

“I was not an athlete when I came to MIT. I don’t think there are many schools where you can come in not an athlete and leave as one on a varsity team,” she says.

Yelick went on to complete her graduate studies at MIT working with John Guttag, the Dugald C. Jackson Professor of Computer Science and Engineering. She first became interested in parallel computing as a way to make the automatic theorem proving tool that was the subject of her master’s thesis run more quickly.

Parallel processing eventually became her primary research interest, and EECS awarded her the George M. Sprowls Award for best PhD Dissertation in 1991.

Finding connections

Yelick has also developed a number of applications that make use of recent computing advances.

At the University of California at Berkeley, she and her graduate student, Evangelos Georganas, are working with computational biologists on an application aimed at identifying the genomes of microbes in a soil sample. Identifying a single microbe is no easy task, since genome sequencers fragment genes and inject errors as they “read” them.

Yelick compares the problem to trying to assemble a jigsaw puzzle without any idea what the picture is supposed to look like, and with extra pieces thrown in and other pieces missing.
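To make the jigsaw analogy concrete, here is a toy sketch in Python that stitches overlapping fragments of a single sequence back together by repeatedly merging the pair with the longest overlap. It is a simplified illustration and not the method used in Yelick's application: the function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the example fragments are all hypothetical, and the sketch assumes error-free fragments from one genome, so it ignores exactly the extra and missing pieces that make the real problem hard.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are treated as coincidence and ignored.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments that share the largest overlap."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join; remaining pieces stay separate
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces

if __name__ == "__main__":
    # Three made-up, error-free fragments of one short sequence, out of order.
    reads = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
    print(greedy_assemble(reads))
```

With these three fragments the sketch reconstructs one sequence, `ATGGCGTACGTTAGC`. Fragments that share no overlap are simply returned as separate pieces, and a single misread letter is enough to break a join.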

Instead of identifying a single organism, Yelick and her student are trying to identify all of the genomes in a soil sample, which can easily be home to thousands of microbial entities.

“Now, it’s like you don’t have just one puzzle, you have several puzzles mixed together, and you might have a thousand copies of one puzzle and only one copy of another puzzle, and still have the errors as well,” Yelick says. “As you can imagine, the problem becomes very challenging.”

The new application uses UPC, a dialect of the programming language C developed by Yelick and colleagues at multiple institutions to support high-performance computing on large-scale multiprocessors.

Yelick has been developing UPC for the last two decades. Now, she and her students are using it and related languages for data analysis and simulation problems at the front line of what computers can do.

“What excites me most has changed over the years. It used to be about the technical questions, or algorithmic questions in computing. And I still like that a lot,” she says.

“Now, it’s really about finding connections where you can use computing — algorithms, mathematics, computation — to solve important problems. The exciting thing is putting all of the pieces together.”