Deep-learning technique predicts clinical treatment outcomes

When it comes to treatment strategies for critically ill patients, clinicians want to be able to consider all their options and timing of administration, and make the optimal decision for their patients. While clinician experience and study has helped them to be successful in this effort, not all patients are the same, and treatment decisions at this crucial time could mean the difference between patient improvement and quick deterioration. Therefore, it would be helpful for doctors to be able to take a patient’s previous known health status and received treatments and use that to predict that patient’s health outcome under different treatment scenarios, in order to pick the best path.

Now, a deep-learning technique, called G-Net, from researchers at MIT and IBM provides a window into causal counterfactual prediction, affording physicians the opportunity to explore how a patient might fare under different treatment plans. The foundation of G-Net is the g-computation algorithm, a causal inference method that estimates the effect of dynamic exposures in the presence of measured confounding variables — ones that may influence both treatments and outcomes. Unlike previous implementations of the g-computation framework, which have used linear modeling approaches, G-Net uses recurrent neural networks (RNN), which have node connections that allow them to better model temporal sequences with complex and nonlinear dynamics, like those found in the physiological and clinical time series data. In this way, physicians can develop alternative plans based on patient history and test them before making a decision.

“Our ultimate goal is to develop a machine learning technique that would allow doctors to explore various ‘What if’ scenarios and treatment options,” says Li-wei Lehman, MIT research scientist in the MIT Institute for Medical Engineering and Science and an MIT-IBM Watson AI Lab project lead. “A lot of work has been done in terms of deep learning for counterfactual prediction but [it’s] been focusing on a point exposure setting,” or a static, time-varying treatment strategy, which doesn’t allow for adjustment of treatments as patient history changes. However, her team’s new prediction approach provides for treatment plan flexibility and chances for treatment alteration over time as patient covariate history and past treatments change. “G-Net is the first deep-learning approach based on g-computation that can predict both the population-level and individual-level treatment effects under dynamic and time varying treatment strategies.”

The research, which was recently published in the Proceedings of Machine Learning Research, was co-authored by Rui Li MEng ’20, Stephanie Hu MEng ’21, former MIT postdoc Mingyu Lu MD, graduate student Yuria Utsumi, IBM research staff member Prithwish Chakraborty, IBM Research director of Hybrid Cloud Services Daby Sow, IBM data scientist Piyush Madan, IBM research scientist Mohamed Ghalwash, and IBM research scientist Zach Shahn.

Tracking disease progression

To build, validate, and test G-Net’s predictive abilities, the researchers considered the circulatory system in septic patients in the ICU. During critical care, doctors need to make trade-offs and judgement calls, such as ensuring the organs are receiving adequate blood supply without overworking the heart. For this, they could give intravenous fluids to patients to increase blood pressure; however, too much can cause edema. Alternatively, physicians can administer vasopressors, which act to contract blood vessels and raise blood pressure.

In order to mimic this and demonstrate G-Net’s proof-of-concept, the team used CVSim, a mechanistic model of a human cardiovascular system that’s governed by 28 input variables characterizing the system’s current state, such as arterial pressure, central venous pressure, total blood volume, and total peripheral resistance, and modified it to simulate various disease processes (e.g., sepsis or blood loss) and effects of interventions (e.g., fluids and vasopressors). The researchers used CVSim to generate observational patient data for training and for “ground truth” comparison against counterfactual prediction. In their G-Net architecture, the researchers ran two RNNs to handle and predict variables that are continuous, meaning they can take on a range of values, like blood pressure, and categorical variables, which have discrete values, like the presence or absence of pulmonary edema. The researchers simulated the health trajectories of thousands of “patients” exhibiting symptoms under one treatment regime, let’s say A, for 66 timesteps, and used them to train and validate their model.

Testing G-Net’s prediction capability, the team generated two counterfactual datasets. Each contained roughly 1,000 known patient health trajectories, which were created from CVSim using the same “patient” condition as the starting point under treatment A. Then at timestep 33, treatment changed to plan B or C, depending on the dataset. The team then performed 100 prediction trajectories for each of these 1,000 patients, whose treatment and medical history was known up until timestep 33 when a new treatment was administered. In these cases, the prediction agreed well with the “ground-truth” observations for individual patients and averaged population-level trajectories.

A cut above the rest

Since the g-computation framework is flexible, the researchers wanted to examine G-Net’s prediction using different nonlinear models — in this case, long short-term memory (LSTM) models, which are a type of RNN that can learn from previous data patterns or sequences — against the more classical linear models and a multilayer perception model (MLP), a type of neural network that can make predictions using a nonlinear approach. Following a similar setup as before, the team found that the error between the known and predicted cases was smallest in the LSTM models compared to the others. Since G-Net is able to model the temporal patterns of the patient’s ICU history and past treatment, whereas a linear model and MLP cannot, it was better able to predict the patient’s outcome.

The team also compared G-Net’s prediction in a static, time-varying treatment setting against two state-of-the-art deep-learning based counterfactual prediction approaches, a recurrent marginal structural network (rMSN) and a counterfactual recurrent neural network (CRN), as well as a linear model and an MLP. For this, they investigated a model for tumor growth under no treatment, radiation, chemotherapy, and both radiation and chemotherapy scenarios. “Imagine a scenario where there’s a patient with cancer, and an example of a static regime would be if you only give a fixed dosage of chemotherapy, radiation, or any kind of drug, and wait until the end of your trajectory,” comments Lu. For these investigations, the researchers generated simulated observational data using tumor volume as the primary influence dictating treatment plans and demonstrated that G-Net outperformed the other models. One potential reason could be because g-computation is known to be more statistically efficient than rMSN and CRN, when models are correctly specified.

While G-Net has done well with simulated data, more needs to be done before it can be applied to real patients. Since neural networks can be thought of as “black boxes” for prediction results, the researchers are beginning to investigate the uncertainty in the model to help ensure safety. In contrast to these approaches that recommend an “optimal” treatment plan without any clinician involvement, “as a decision support tool, I believe that G-Net would be more interpretable, since the clinicians would input treatment strategies themselves,” says Lehman, and “G-Net will allow them to be able to explore different hypotheses.” Further, the team has moved on to using real data from ICU patients with sepsis, bringing it one step closer to implementation in hospitals.

“I think it is pretty important and exciting for real-world applications,” says Hu. “It’d be helpful to have some way to predict whether or not a treatment might work or what the effects might be — a quicker iteration process for developing these hypotheses for what to try, before actually trying to implement them in in a years-long, potentially very involved and very invasive type of clinical trial.”

This research was funded by the MIT-IBM Watson AI Lab.

A security technique to fool would-be cyber attackers

Multiple programs running on the same computer may not be able to directly access each other’s hidden information, but because they share the same memory hardware, their secrets could be stolen by a malicious program through a “memory timing side-channel attack.”

This malicious program notices delays when it tries to access a computer’s memory, because the hardware is shared among all programs using the machine. It can then interpret those delays to obtain another program’s secrets, like a password or cryptographic key.

One way to prevent these types of attacks is to allow only one program to use the memory controller at a time, but this dramatically slows down computation. Instead, a team of MIT researchers has devised a new approach that allows memory sharing to continue while providing strong security against this type of side-channel attack. Their method is able to speed up programs by 12 percent when compared to state-of-the-art security schemes.

In addition to providing better security while enabling faster computation, the technique could be applied to a range of different side-channel attacks that target shared computing resources, the researchers say.

“Nowadays, it is very common to share a computer with others, especially if you are do computation in the cloud or even on your own mobile device. A lot of this resource sharing is happening. Through these shared resources, an attacker can seek out even very fine-grained information,” says senior author Mengjia Yan, the Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

The co-lead authors are CSAIL graduate students Peter Deutsch and Yuheng Yang. Additional co-authors include Joel Emer, a professor of the practice in EECS, and CSAIL graduate students Thomas Bourgeat and Jules Drean. The research will be presented at the International Conference on Architectural Support for Programming Languages and Operating Systems.

Committed to memory

One can think about a computer’s memory as a library, and the memory controller as the library door. A program needs to go to the library to retrieve some stored information, so that program opens the library door very briefly to go inside.

There are several ways a malicious program can exploit shared memory to access secret information. This work focuses on a contention attack, in which an attacker needs to determine the exact instant when the victim program is going through the library door. The attacker does that by trying to use the door at the same time.

“The attacker is poking at the memory controller, the library door, to say, ‘is it busy now?’ If they get blocked because the library door is opening already — because the victim program is already using the memory controller — they are going to get delayed. Noticing that delay is the information that is being leaked,” says Emer.

To prevent contention attacks, the researchers developed a scheme that “shapes” a program’s memory requests into a predefined pattern that is independent of when the program actually needs to use the memory controller. Before a program can access the memory controller, and before it could interfere with another program’s memory request, it must go through a “request shaper” that uses a graph structure to process requests and send them to the memory controller on a fixed schedule. This type of graph is known as a directed acyclic graph (DAG), and the team’s security scheme is called DAGguise.

Fooling an attacker

Using that rigid schedule, sometimes DAGguise will delay a program’s request until the next time it is permitted to access memory (according to the fixed schedule), or sometimes it will submit a fake request if the program does not need to access memory at the next schedule interval.

“Sometimes the program will have to wait an extra day to go to the library and sometimes it will go when it didn’t really need to. But by doing this very structured pattern, you are able to hide from the attacker what you are actually doing. These delays and these fake requests are what ensures security,” Deutsch says.

DAGguise represents a program’s memory access requests as a graph, where each request is stored in a “node,” and the “edges” that connect the nodes are time dependencies between requests. (Request A must be completed before request B.) The edges between the nodes — the time between each request — are fixed.

A program can submit a memory request to DAGguise whenever it needs to, and DAGguise will adjust the timing of that request to always ensure security. No matter how long it takes to process a memory request, the attacker can only see when the request is actually sent to the controller, which happens on a fixed schedule.

This graph structure enables the memory controller to be dynamically shared. DAGguise can adapt if there are many programs trying to use memory at once and adjust the fixed schedule accordingly, which enables a more efficient use of the shared memory hardware while still maintaining security.

A performance boost

The researchers tested DAGguise by simulating how it would perform in an actual implementation. They constantly sent signals to the memory controller, which is how an attacker would try to determine another program’s memory access patterns. They formally verified that, with any possible attempt, no private data were leaked.

Then they used a simulated computer to see how their system could improve performance, compared to other security approaches.

“When you add these security features, you are going to slow down compared to a normal execution. You are going to pay for this in performance,” Deutsch explains.

While their method was slower than a baseline insecure implementation, when compared to other security schemes, DAGguise led to a 12 percent increase in performance.

With these encouraging results in hand, the researchers want to apply their approach to other computational structures that are shared between programs, such as on-chip networksThey are also interested in using DAGguise to quantify how threatening certain types of side-channel attacks might be, in an effort to better understand performance and security tradeoffs, Deutsch says.

This work was funded, in part, by the National Science Foundation and the Air Force Office of Scientific Research.

Doctoral Thesis: NMR studies of quantum thermalization

Pai Peng

Abstract:

How a many-body quantum system thermalizes under its internal interactions is an important question with fundamental and practical impacts on quantum information science, condensed matter physics, and quantum engineering. Nuclear spins in solids, featuring large system size, long coherence time and excellent controllability, provide a great playground to study quantum thermalization. I will present three research projects in my PhD thesis — (1) the emergence of classical hydrodynamical phenomena from thermalizing quantum systems; (2) prethermalization in periodically driven systems [1-3]; (3) robust dynamical decoupling of interactions using reinforcement learning techniques [4]. These results not only shine a light on various important questions in quantum thermalization, but also provide new quantum control techniques that can be generalized to other platforms.

Thesis Supervisor: Prof. Paola Cappellaro

[1] PP, C. Yin, X. Huang, C. Ramanathan, P. Cappellaro. Nat. Phys. 17, 444 (2021).
[2] C. Yin, PP, X. Huang, C. Ramanathan, P. Cappellaro. Phys. Rev. B 103, 054305 (2021).
[3] K.X. Wei, PP, O. Shtanko, I. Marvian, S. Lloyd, C. Ramanathan, P. Cappellaro. Phys. Rev. Lett. 123, 090605 (2019).
[4] PP, X. Huang, C. Yin, L. Joseph, C. Ramanathan, P. Cappellaro. arXiv:2102.13161.

New recipients of Meta (Facebook) Fellowship for 2022

Meta (Facebook) recently announced the winners of its highly competitive 2022 fellowships. The incoming group of Fellowship recipients includes four MIT graduate students, two of whom study within the Department of EECS.

Jaume Vives i Bastida is a PhD candidate in the Economics Department, advised by Alberto Abadie, Anna Mikusheva and Tobias Salz. His goal as a researcher is to design econometric and machine learning tools that improve policy making, while being transparent and robust to different modeling assumptions. In particular, his research has focused on the properties of shrinkage estimators and on extensions to the synthetic control method, a popular tool used by applied researchers and policy makers. On the practical side, Jaume applies these methods to understand complex interactions in the digital economy, such as in online platforms, in which data driven decision making plays a key role. He has had the chance to put these methods to work in real life situations at Google and Quantco.

Prior to joining MIT, Jaume obtained a BSc. in Econometrics and Mathematical Economics from LSE, was an exchange student at UC Berkeley, and worked as a research professional at the University of Chicago under Eric Budish. In his spare time Jaume enjoys playing squash and chess, and going back to his hometown of Barcelona, where the best food in the world can be found.

Lucy Chai is a graduate student in EECS at MIT CSAIL, advised by Phillip Isola; her current research focuses upon computer vision and image synthesis. In particular, she is interested in learning from data collections to generate augmented forms of images. The results of this can enable interactive image editing that combines user input with learned image priors and be applied to investigate downstream visual analysis tasks. She has spent two summers at Adobe Research working with Richard Zhang, Jun-Yan Zhu, Michael Gharbi, and Eli Shechtman, and frequently collaborates with Ser-Nam Lim at Facebook.

Prior to joining MIT, Chai attended Churchill College, University of Cambridge, where she earned her MPhil in Machine Learning studying uncertainty and interpretability in Bayesian neural networks. Chai earned her undergraduate degree at the University of Pennsylvania in Computer Science and Bioengineering, where she worked with Dr. Danielle S. Bassett in computational neuroscience, focusing on modelling neural processes as dynamic networked systems Among other honors, Chai has been awarded a NSF Graduate Research Fellowship and Adobe Research Fellowship.

Dishita Turakhia is a fourth-year PhD candidate in the Human-Computer Interaction Engineering Group at MIT CSAIL, where she is advised by Professor Stefanie Mueller. Her current research lies at the intersection of system design and learning sciences, with a particular focus on AR/VR applications for autodidactic learning of skills. Her projects enable autodidactic skill learning of motor skills, fabrication skills, and maker skills. Turakhia’s project on the adaptive learning of motor skills was covered by MIT News.

Before starting her Ph.D., Turakhia earned a dual masters in EECS (MIT) and computational design (SMArchS, MIT) in addition to her masters in design technology (EmTech, AA).  Prior to returning to academia, she worked as an architect and computational designer for 5+ years in Mumbai, London, Singapore, and Bern. Turakhia earned her bachelor’s in architecture from KRVIA (Mumbai University).

Praneeth Vepakomma is a PhD Student within Media Arts & Sciences, advised by Ramesh Raskar. Vepakomma’s research focuses on distributed computation in statistics and machine learning under constraints of privacy, communication, and efficiency. Foundationally inspired by non-asymptotic statistics, randomized algorithms, combinatorics, and systems design, his research has applications in a) private independence testing and private k-sample testing in statistics, b) bridging privacy with social choice theory, c) private mechanisms for training and inference in ML, d) privately estimating non-linear measures of statistical dependence between multiple parties, and e) split learning.

Among other honors, Vepakomma has been selected as a SERC Scholar (Social and Ethical Responsibilities of Computing Scholar) by MIT’s Schwarzman College of Computing, while “FedML: A research library and benchmark for federated machine learning” won a Baidu Best Paper Award at NeurIPS 2020-SpicyFL, and “NoPeek-Infer: Preventing face reconstruction attacks in distributed inference after on-premise training” won the Mukh Best Paper Award at IEEE FG-2021. He was interviewed in the book “Data Scientist: The Definitive Guide to Becoming a Data Scientist” and his work on Split Learning has been featured in the MIT Technology Review. Vepakomma was previously a scientist at Amazon, Motorola Solutions, and at various startups; additionally, he has interned at Apple, Corning, and TripleBlind. He holds an MS in Mathematical and Applied Statistics from Rutgers University, New Brunswick.

Akila Saravanan named Brooke Owens Fellow

Akila Saravanan, a junior double majoring in aerospace engineering and electrical engineering and computer science, is a recipient of the Brooke Owens Fellowship. “Brookies” are selected based on their commitment to their communities, stand-out creative abilities, record of leadership, incredible talent, and their desire to pursue a career in aerospace. Saravanan is among 51 women selected from a competitive pool of thousands of applicants this year. As part of her fellowship, Saravanan will be working at Venturi Astrolab in Hawthorne, California this summer.

“I’m honored to receive this fellowship and excited for the opportunities it will open up for me in my career,” says Saravanan. “The internship itself will be a great experience, but I think the biggest benefit is the network it will provide me within the aerospace community. I believe the bonds I’ll make through this experience will last beyond just the summer, and I’m excited to be a part of it.”

The Brooke Owens Fellowship is a nonprofit program that aims to increase access and opportunities for undergraduate women and other gender minorities by connecting them with internships, mentors in the field, and other networking opportunities to help alleviate the gender imbalance that has been prevalent historically in the aerospace industry.

“I’m looking forward to exploring commercial research, which will expose me to a different side of research and allow me to work on new and exciting projects,” says Saravanan. “To me, one of the most interesting aspects of commercial aerospace is how it operates on a much shorter time-scale compared to academic research, due to the speed of putting ideas into production. I think the coolest part of this will be to see my work being used on upcoming missions.”

Currently, Saravanan is working as a UROP in the Dynamics, Infrastructure Networks, and Mobility (DINaMo) Research Group. Her project focuses on developing an autonomous and reactive continuous data collection system deployed on fleets of drones, focused on making decisions about optimal sampling locations in a large grid while accounting for resource constraints to coordinate multi-vehicle flight.

“The Brooke Owens Fellowship is a prestigious honor to receive, and after working with Akila for the past two years, it’s incredibly well-deserved,” says Hamsa Balakrishnan, William E. Leonhard (1940) Professor of Aeronautics and Astronautics and principal investigator of DINaMo. “Akila has already made significant contributions while working in my lab, and I’m excited to see how she grows as a researcher after gaining industry experience in her internship next summer.”

Founded in 2016, the fellowship program is named in honor of D. Brooke Owens, a pilot and aerospace industry veteran whose experience spanned NASA’s Johnson Space Center, the Federal Aviation Administration, and the White House Office of Management and Budget before she died of cancer at age 36 the same year. This year’s selection panel included Emily Calandrelli SM’13, a science communicator and host of Netflix’s Emily’s Wonder Lab as well as Caroline Juang, a PhD Candidate at Columbia University in the Department of Earth and Environmental Sciences; Diana Trujillo, an Aerospace Engineer at NASA’s Jet Propulsion Laboratory; Kayla Watson, a System Reliability Engineer at Amazon Prime Air; and Will Pomerantz, co-founder of the Brooke Owens Fellowship.

Team creates 3D objects that change their appearance from different viewpoints

Picture a birthday card that flickers between images of a birthday cake and flowers as you turn the card and view it from different angles. No doubt you can think of other examples of such morphing images in, say, advertisements. Until now, however, the effect has been limited to flat surfaces.

Enter the work of MIT researchers who, for the first time, have developed a system for creating 3D objects that change their appearance when seen from different viewpoints. The same team also developed an editing tool available free online that can allow anyone to design and build such 3D objects, all through 3D printing.

“Our work opens up the idea of what a physical object can be,” says Yunyi Zhu, an MIT graduate student in the Department of Electrical Engineering and Computer Science (EECS). “It’s part of a larger vision for making dynamic objects that change their appearance, color pattern, and shape. We address appearance, which is one dimension of the concept of reprogrammable objects,” says Zhu, who is a co-first author of a paper on the work presented in October at the 2021 Association for Computing Machinery’s Symposium on User Interface Software and Technology.

Stefanie Mueller is the leader of the work, an assistant professor in EECS, and an author of the paper. Mueller is also affiliated with MIT’s Computer Science and Artificial Intelligence Laboratory.

Other co-first authors of the paper are Jiani Zeng (an MIT graduate student who has since graduated) and Honghao Deng (a Harvard graduate student who has since graduated). In addition to Zhu, Zeng, Deng, and Mueller, other authors are Michael Wessely, a postdoctoral associate in EECS, and Axel Kilian, a visiting assistant professor in the Department of Architecture.

“This exciting research radically shifts what the future of product design [will] look like. As an interaction designer, I was stimulated to imagine how in the future everyday physical objects could have thousands of different appearances (colors and patterns) depending on the viewing angles. I can think of many application, from mesmerizing fashion designs to aesthetic sculptures that convey different poetic narratives depending on the viewpoints,” says Ken Nakagaki, a postdoctoral associate at the MIT Media Lab who is an incoming assistant professor at the University of Chicago. Nakagaki was not involved in the work.

How it Works

The changing images on a flat card—and now 3D objects—are thanks to many tiny lenses printed across the card or object. About six of the lenses created by the MIT team could fit across the surface of a dime. Known as lenticular lenses, each lens covers a pattern of tiny colored image spots.

“Because of the magnifying effect of the lens, it displays the color from only one of the colored image spots, which is only a small portion of the entire area under the lens,” the authors write. Which magnified spot the viewer sees depends on the viewer’s viewpoint and the “different incident angles of the light hitting the lens” from that viewpoint. Multiple lenses together “form a lenticular display and collectively show an image that varies dependent on viewpoint.”

The MIT team’s editing tool takes as input the 3D model, or object, that is desired as well as the images that a designer wants to morph from one into another. It then calculates the lens placement and color pattern across the object’s surface that will lead to the desired effect. The designer can then preview the resulting object before sending the “directions” to a 3D printer, where the object, color patterns, and lenses are all printed in one pass.

Many Applications

A kettlebell guides a user to the correct position for holding the device during exercise. It was created using tools developed at MIT that allow 3D objects to change their appearance from different viewpoints. Image credit Zeng et al, MIT.

The team went on to create four examples of 3D objects with images that change based on the viewer’s perspective. The idea was to demonstrate the ability to create objects with different geometries, different image complexities, and varying numbers of morphing images. The demonstration objects, each of which are printed with thousands of lenses, include:

—a kettlebell for exercise that can indicate to the user whether they are holding the device properly. If the device is too low, an arrow pointing up appears; if too high, an arrow pointing down takes its place. The correct position is indicated with a check mark.

—A bedside lamp shade that displays different greetings depending on whether a person is sitting up in bed (Good Day) or lying down (Good Night).

—A case for earpods that flickers between several colored stripes depending on how the user handles it.

—A shoe printed with a motivational message that only the user can see (the message is invisible to a bypasser or someone standing nearby).

The Sweet Spot

Zhu says that the major challenge involved in the work was determining the “sweet spot” for the size of each lens. “The smaller the lens, the better the resolution,” she says. “But smaller lenses also have more defects, so the fabrication quality will be worse.” Zhu and colleagues ultimately found that a lens that’s three millimeters in diameter “fits most needs.”

The team also tested other parameters, including the orientation of a lens (for example, printed facing upwards or sideways) and various post-processing techniques for the lenses. Lenses facing upwards and treated with painting varnish led to the best results.

It took about two years to move the project from concept to the creation of demonstration objects. The best part of the work for Zhu was “seeing that the process actually works and holding an object in my hand. That was extremely satisfying.”

This work was supported by the National Science Foundation and by the MIT Materials Research Laboratory under the MIT.nano Immersion Lab Gaming Program funded by NCSOFT.

The Millionth Algorithm: the runaway success of a foundational textbook

Charles E. Leiserson, the Edwin Sibley Webster Professor within the Department of EECS, recently received some tremendous news: Introduction to Algorithms, the textbook Leiserson coauthored with Tom Cormen, Ron Rivest, and Cliff Stein, has now officially sold over 1 million copies worldwide.

Lauded for its clarity, the book is premised on a “start from fundamentals” approach that welcomes students of many backgrounds and learning styles, regardless of their familiarity with advanced mathematics. Co-author Cliff Stein offered this explanation for the runaway success of the “big book”: “One reason for the success of Introduction to Algorithms has been the working style of the authors. Although divide and conquer is an important algorithmic paradigm, it is not the way we work–multiple authors collaborate on every page of the book.  This cooperation, and the commitment to clarity and rigor has been one of the main reasons for the success of the book.” Co-author Ron Rivest added, “It was a great personal pleasure working with Tom Cormen, Charles Leiserson, and Cliff Stein on the latest edition of Introduction to Algorithms. We also greatly appreciated the numerous suggestions and comments provided by colleagues and students.  We hope that this text will provide guidance for many years to come, both for students studying algorithms for the first time, and for more advanced students.” And co-author Tom Cormen offered this explainer of the daunting process of assembling a comprehensive textbook and keeping it current over many revisions.

We sat down over Zoom with Charles Leiserson to chat about Introduction to Algorithms: its origins, its runaway success, and its impact over the years.

First off, congratulations on the one millionth sale of Introduction to Algorithms. That’s an absolutely incredible milestone. Tell us about the moment when you found out, your thoughts and emotions.

It’s quite remarkable to me that we’re in this position; for a textbook, this is an extraordinarily long life. Most textbooks don’t publish a second edition, and we’re currently working on our fourth, the edition which we thought might sell the millionth copy. So, it was a surprise when Elizabeth Swayze (Senior Acquisitions Editor at MIT Press), emailed us saying they had just discovered we’d gone over the one million mark.

You coauthored Introduction to Algorithms over 30 years ago, in 1990, when the field obviously looked very different. Tell me a little about the process of revision within a quickly changing and developing field.

In some ways it’s been more complicated than when we wrote the first edition, although that was the most amount of work. When we were writing the first edition, there were three of us and we were all at MIT; Ron Rivest’s office was next to mine, Tom Cormen was my graduate student, and we were all able to work side by side. For the last edition, we did have some meetings, but it was only two or three meetings in person before COVID hit, and after that we worked independently and relied a lot more heavily on Zoom for meetings. I can’t imagine where I got the time to do the first edition, because this last one has been a real struggle, and yet it’s not as much time (by a long shot) as what we put in years ago. I’ve simply got more on my plate!

Tell me about the book’s origins. Whose idea was it first to try to write a definitive introduction to algorithms? How did you determine the gap in the market, pitch the book, etc?

When I got to MIT, I wanted to teach the algorithms class: 6.046, and it wasn’t available for the first couple of years—other faculty with seniority were signed up to teach it. I had to wait a few years and teach many other classes before I got the opportunity to teach 6.046. I inherited great notes from the people who had taught it before (including Ron Rivest, among others). At the time, there were textbooks covering some topics in algorithms—I think the dominant one was by Aho, Hopcroft, and Ullman, Design and Analysis of Computer Algorithms. Donald Knuth, a Turing Award winner, had another one, a three-volume series called The Art of Computer Programming. And there were also books that were dealing with algorithms but didn’t cover all the same topics, such as Eugene Lawler’s Combinatorial Optimization. These books all inspired and influenced me.

But these and other textbooks sometimes invoked higher math, and often I found those explanations baffling. What we found as we were writing our own book was that in fact some of the theorems at the time had proofs that were actually wrong; they were handwaves, they relied on circular or incomplete logic, that kind of thing. These were serious problems! In response to that discovery, I wanted to write a book which reasoned from fundamentals, a book which a student could use to learn about algorithms with little assumed background in higher math—algebra, but no calculus.

We began with a chapter that outlined the math axioms, the basic principles, which you needed to ground your understanding. The nucleus of this work was the set of lecture notes taken down by my amazing TA, Tom Cormen, who not only took notes for all my lectures, but took the time to rewrite them into truly useful documents which we could carry forward into subsequent iterations of every class. While talking with Tom, I said, “You know, I think we could assemble a textbook out of what we’ve got here, if we put the effort in.”

Since Ron had the most involvement in 6.046 before me, we asked Ron if he was interested in collaborating on a textbook, and he was, so we got started. I thought we could do it in two years—it took us three and a half!

Tell me about the co-authorship process; how did it change the way you approached writing and work? Did you have a specialization, or a favorite part of the writing process?

Within theoretical computer science, you list authors alphabetically, so we don’t typically have first authors. One of the nice things that that convention promotes is the notion that everybody owns the work, 100%. Everybody gets full credit for the work, and if there’s a problem anywhere, everyone takes the blame. Everyone is responsible, and everyone gets to benefit. And that’s a form of collaboration that appeals to me. This principle led us to a process where we all wrote all of the chapters in the first edition. One of us would draft a version; the other would edit and reorganize it; then the chapter would go to the third person; then back to the first. Each chapter would bounce around.

It was one of the best collaborations I have ever had with other people, because we complemented each other. Ron is a Turing Award winner and is enormously technically capable. He pulled some massive bugs out of my work—I was able to return the favor only maybe once or twice, but he really pointed out some doozies in my own stuff. And Tom was a fabulous writer. And I was maybe in the middle on those things, in that I was strong technically (though not as strong as Ron) and I was an excellent writer (though not as strong as Tom).

The partnership showed its strength in many ways; for example, at the time we wrote the book, there was really no good indexing software—but Ron automated our two-level index, so we could put it right into our books. Additionally, in those days, technical works were usually typeset by the publisher, and then you would have to go through and correct all the math and technical stuff that they got wrong. But we produced a machine version of our book, so we were one of the first books that MIT Press produced where the authors controlled the typesetting. Our copy editor, Larry Cohen, would mark up our print and then we would go through and implement the changes, and that way we ensured that math errors wouldn’t have a tendency to creep in. 

You’ve obviously had to think a great deal about how students learn and the best ways to teach complex concepts. How has authoring this book affected your teaching?

I spoke about my philosophy on this at some length when I received the IEEE Computer Society Taylor L. Booth Education award. You have to understand that learning is largely emotional; you tend to think of technical learning as “just the facts”, but it’s in fact intensely emotional. If students aren’t motivated, they can’t learn.

You must feel a certain amount of responsibility for the direction of the field of computer science, having authored a seminal textbook; published numerous research contributions in algorithms, parallel computing, design automation, computer architecture, and performance engineering; developed several new undergraduate classes; created many of the original modules for the UPOP program; lead the computer science program for Singapore-MIT Alliance; and created and developed a highly touted workshop on leadership skills for engineering and science faculty.

One of the things that has mattered most to me was the values that MIT, my department, and my laboratory all shared and embodied. My department places an equal emphasis on research and teaching; that notion, that each informs the other, matters to me. I’ve attended research presentations and thought, “Gosh, this is exactly the teaching example I’ve needed!” It goes the other way round, too. For example, my feeling is that the earlier in the curriculum you can teach a research result, the more important it is. So, if I can teach it to freshman, that is a more important result than being able to teach it only to grad students or peers. That was one measure, though it’s not the only measure of significance.

One of the good things about a university is that there are many paths to success, and one of the things you do is figure out which path is yours.  There’s a lot of room for individual people to contribute to academia in different ways. Some people are great cataloguers and scholars, others are incredible problem-solvers. Some people like textbooks. MIT supports that range of quirkiness among its faculty.

It’s like OpenCourseWare—one of the brilliant innovations at MIT with respect to teaching, which I’m proud to have had a stake in. I once sat next to a businessman on a flight between Europe and India who, as we were talking, showed no recognition of MIT, but he’d heard that an American university was putting all of its courses online for free. That was a good example of how our work was contributing to MIT’s mission and reputation. That’s how you attain excellence—by having people who contribute their reputations to the institution, rather than the other way around. MIT is great because of the people who are here. And I’m glad that Introduction to Algorithms contributes to MIT’s reputation.

Anatomy of a bestseller: Tom Cormen on keeping “Introduction to Algorithms” current

Tom Cormen, Emeritus Professor of Computer Science at Dartmouth College, is one of the co-authors of Introduction to Algorithms, the influential textbook now in its fourth edition. A perennial bestseller in the field, Introduction to Algorithms has maintained its standing as an important reference text over three decades—thirty years in which the world of computing has revolutionized nearly every aspect of daily life. We asked Tom about the work involved in keeping a textbook current in a changing field.

You were a graduate student when you first began work on Introduction to Algorithms, and you are now a professor emeritus. Tell me about the team you’ve assembled to help keep it current.

Over the years, several Dartmouth students have helped us prepare the textbook and ancillary materials, assisting in converting illustrations from one format to another or colorizing them, checking the bibliography, and developing lecture notes and solutions for the instructor’s manual. In fact, two undergraduate students, Clara Lee and Erica Lin, did such outstanding work on the manual for the second edition that they are listed as coauthors of the manual. I have managed to stay in touch with almost all of these students, some of who graduated from Dartmouth over 20 years ago.

Tell me about the co-authorship process: did you have a specialization, or a favorite part of the writing process?

Each chapter has a primary author, but we each have a hand in every chapter. In some cases, I developed a new way to approach existing material, such as changing the treatment of binary search trees between the second and third editions, or explaining the intuition behind the potential function for table doubling and halving in the fourth edition. In other cases, I had to learn the material in order to write the chapter from scratch, such as the chapter on bipartite matching and the section on suffix arrays in the fourth edition.

My favorite part of the writing process is the feeling I get when I know I got a sentence, or a paragraph, just right. And then I hope that I don’t get knocked down later when Julie Sussman, the best copyeditor ever, shows me an even better way to say it!

The field was obviously very different when you first wrote Introduction to Algorithms. Tell me a little about the process of revision within a quickly changing and developing field.

One change that is quite apparent to us, the authors, is the technology we use to produce the book. To produce a PDF of the book, we have to run it through LaTeX, BibTeX, and windex (the indexing program that Ron Rivest wrote) three times, dvips once to produce a PostScript file, and then ps2pdf to produce the PDF.

When we started writing the first edition, we were just producing PostScript, not PDF, and even before we had written the entire book, producing the PostScript file was an overnight run—unless we ran it on Ron’s machine, when it took about an hour. On our laptops now, it takes about 20 seconds to produce the PostScript file, and a few more seconds to produce the PDF.

For the first edition, the illustrations were produced at a separate site from the text, and they were pasted in. Starting with the second edition, we can lay in illustrations directly. We have also augmented our set of LaTeX macros over the years, most notably the clrscode, clrscode3e, and clrscode4e packages, which allow everyone to typeset pseudocode the way we do.

One other change related to keeping up with the times is that for the fourth edition, we are making available a complete set of Python implementations of the algorithms in the book. (Because the algorithms in the new chapter on machine learning are so abstract, we omitted them.) The Python code was written by my former student, Linda Xiao, and me.

Regarding how the content of the book has changed over the years, as much as we would have liked to only add material and not remove any, there are physical (and contractual) limitations to how big the book can get. So we’ve had to decide not only what material to add, but what material to cut. Left to our own devices, the book would end up being a cube!

Seeing into the future: Personalized cancer screening with artificial intelligence

While mammograms are currently the gold standard in breast cancer screening, swirls of controversy exist regarding when and how often they should be administered. On the one hand, advocates argue for the ability to save lives: Women aged 60-69 who receive mammograms, for example, have a 33 percent lower risk of dying compared to those who don’t get mammograms. Meanwhile, others argue about costly and potentially traumatic false positives: A meta-analysis of three randomized trials found a 19 percent over-diagnosis rate from mammography.

Even with some saved lives, and some overtreatment and overscreening, current guidelines are still a catch-all: Women aged 45 to 54 should get mammograms every year. While personalized screening has long been thought of as the answer, tools that can leverage the troves of data to do this lag behind. 

This led scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Jameel Clinic for Machine Learning and Health to ask: Can we use machine learning to provide personalized screening? 

Out of this came Tempo, a technology for creating risk-based screening guidelines. Using an AI-based risk model that looks at who was screened and when they got diagnosed, Tempo will recommend a patient return for a mammogram at a specific time point in the future, like six months or three years. The same Tempo policy can be easily adapted to a wide range of possible screening preferences, which would let clinicians pick their desired early-detection-to-screening-cost trade-off, without training new policies. 

The model was trained on a large screening mammography dataset from Massachusetts General Hospital (MGH), and was tested on held-out patients from MGH as well as external datasets from Emory, Karolinska Sweden, and Chang Gung Memorial hospitals. Using the team’s previously developed risk-assessment algorithm Mirai, Tempo obtained better early detection than annual screening while requiring 25 percent fewer mammograms overall at Karolinska. At MGH, it recommended roughly a mammogram a year, and obtained a simulated early detection benefit of roughly four-and-a-half months better. 

“By tailoring the screening to the patient’s individual risk, we can improve patient outcomes, reduce overtreatment, and eliminate health disparities,” says Adam Yala, a PhD student in electrical engineering and computer science, MIT CSAIL affiliate, and lead researcher on a paper describing Tempo published Jan. 13 in Nature Medicine. “Given the massive scale of breast cancer screening, with tens of millions of women getting mammograms every year, improvements to our guidelines are immensely important.”

Early uses of AI in medicine stem back to the 1960s, where many refer to the Dendral experiments as kicking off the field. Researchers created a software system that was considered the first expert kind that automated the decision-making and problem-solving behavior of organic chemists. Sixty years later, deep medicine has greatly evolved drug diagnostics, predictive medicine, and patient care. 

“Current guidelines divide the population into a few large groups, like younger or older than 55, and recommend the same screening frequency to all the members of a cohort. The development of AI-based risk models that operate over raw patient data give us an opportunity to transform screening, giving more frequent screens to those who need it and sparing the rest,” says Yala. “A key aspect of these models is that their predictions can evolve over time as a patient’s raw data changes, suggesting that screening policies need to be attuned to changes in risk and be optimized over long periods of patient data.” 

Tempo uses reinforcement learning, a machine learning method widely known for success in games like Chess and Go, to develop a “policy” that predicts a followup recommendation for each patient. 

The training data here only had information about a patient’s risk at the time points when their mammogram was taken (when they were 50, or 55, for example). The team needed the risk assessment at intermediate points, so they designed their algorithm to learn a patient’s risk at unobserved time points from their observed screenings, which evolved as new mammograms of the patient became available. 

The team first trained a neural network to predict future risk assessments given previous ones. This model then estimates patient risk at unobserved time points, and it enables simulation of the risk-based screening policies. Next, they trained that policy, (also a neural network), to maximize the reward (for example, the combination of early detection and screening cost) to the retrospective training set. Eventually, you’d get a recommendation for when to return for the next screen, ranging from six months to three years in the future, in multiples of six months — the standard is only one or two years. 

Let’s say Patient A comes in for their first mammogram, and eventually gets diagnosed at Year Four. In Year Two, there’s nothing, so they don’t come back for another two years, but then at Year Four they get a diagnosis. Now there’s been two years of gap between the last screen, where a tumor could have grown. 

Using Tempo, at that first mammogram, Year Zero, the recommendation might have been to come back in two years. And then at Year Two, it might have seen that risk is high, and recommended that the patient come back in six months, and in the best case, it would be detectable. The model is dynamically changing the patient’s screening frequency, based on how the risk profile is changing.

Tempo uses a simple metric for early detection, which assumes that cancer can be caught up to 18 months in advance. While Tempo outperformed current guidelines across different settings of this assumption (six months, 12 months), none of these assumptions are perfect, as the early detection potential of a tumor depends on that tumor’s characteristics. The team suggested that follow-up work using tumor growth models could address this issue. 

Also, the screening-cost metric, which counts the total screening volume recommended by Tempo, doesn’t provide a full analysis of the entire future cost because it does not explicitly quantify false positive risks or additional screening harms. 

There are many future directions that can further improve personalized screening algorithms. The team says one avenue would be to build on the metrics used to estimate early detection and screening costs from retrospective data, which would result in more refined guidelines. Tempo could also be adapted to include different types of screening recommendations, such as leveraging MRI or mammograms, and future work could separately model the costs and benefits of each. With better screening policies, recalculating the earliest and latest age that screening is still cost-effective for a patient might be feasible. 

“Our framework is flexible and can be readily utilized for other diseases, other forms of risk models, and other definitions of early detection benefit or screening cost. We expect the utility of Tempo to continue to improve as risk models and outcome metrics are further refined. We’re excited to work with hospital partners to prospectively study this technology and help us further improve personalized cancer screening,” says Yala. 

Yala wrote the paper on Tempo alongside MIT PhD student Peter G. Mikhael, Fredrik Strand of Karolinska University Hospital, Gigin Lin of Chang Gung Memorial Hospital, Yung-Liang Wan of Chang Gung University, Siddharth Satuluru of Emory University, Thomas Kim of Georgia Tech, Hari Trivedi of Emory University, Imon Banerjee of the Mayo Clinic, Judy Gichoya of the Emory University School of Medicine, Kevin Hughes of MGH, Constance Lehman of MGH, and senior author and MIT Professor Regina Barzilay.

The research is supported by grants from Susan G. Komen, Breast Cancer Research Foundation, Quanta Computing, an Anonymous Foundation, the MIT Jameel-Clinic, Chang Gung Medical Foundation Grant, and by Stockholm Läns Landsting HMT Grant.