High-performance computing, with much less code

Many companies invest heavily in hiring talent to create the high-performance library code that underpins modern artificial intelligence systems. NVIDIA, for instance, developed some of the most advanced high-performance computing (HPC) libraries, creating a competitive moat that has proven difficult for others to breach.

But what if a couple of students, within a few months, could compete with state-of-the-art HPC libraries using a few hundred lines of code, instead of tens or hundreds of thousands?

That’s what researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown with a new programming language called Exo 2.

Exo 2 belongs to a new category of programming languages that MIT Professor Jonathan Ragan-Kelley calls “user-schedulable languages” (USLs). Instead of hoping that an opaque compiler will auto-generate the fastest possible code, USLs put programmers in the driver’s seat, allowing them to write “schedules” that explicitly control how the compiler generates code. This enables performance engineers to transform simple programs that specify what they want to compute into complex programs that do the same thing as the original specification, but much, much faster.
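The spec-versus-schedule split can be illustrated with a toy example. This is plain Python, not the real Exo 2 API, and the tiling rewrite below is just one example of the kind of transformation a schedule would apply mechanically:

```python
def matmul_spec(A, B):
    """Straightforward specification: what to compute."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def matmul_tiled(A, B, tile=2):
    """Equivalent program after a loop-tiling rewrite: same result,
    but with better cache locality on larger inputs."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for p in range(k):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, m)):
                        C[i][j] += A[i][p] * B[p][j]
    return C
```

In a USL, the programmer would not hand-write the second version; they would derive it from the first by invoking scheduling operations, with the compiler checking that each rewrite preserves the original semantics.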

One of the limitations of existing USLs (like the original Exo) is their relatively fixed set of scheduling operations, which makes it difficult to reuse scheduling code across different “kernels” (the individual components in a high-performance library).

In contrast, Exo 2 enables users to define new scheduling operations externally to the compiler, facilitating the creation of reusable scheduling libraries. Lead author Yuka Ikarashi, an MIT PhD student in electrical engineering and computer science and CSAIL affiliate, says that Exo 2 can reduce total schedule code by a factor of 100 and deliver performance competitive with state-of-the-art implementations on multiple different platforms, including Basic Linear Algebra Subprograms (BLAS) that power many machine learning applications. This makes it an attractive option for engineers in HPC focused on optimizing kernels across different operations, data types, and target architectures.

“It’s a bottom-up approach to automation, rather than doing an ML/AI search over high-performance code,” says Ikarashi. “What that means is that performance engineers and hardware implementers can write their own scheduling library, which is a set of optimization techniques to apply on their hardware to reach the peak performance.”

One major advantage of Exo 2 is that it reduces the amount of coding effort needed at any one time by reusing the scheduling code across applications and hardware targets. The researchers implemented a scheduling library with roughly 2,000 lines of code in Exo 2, encapsulating reusable optimizations that are linear-algebra specific and target-specific (AVX512, AVX2, Neon, and Gemmini hardware accelerators). This library consolidates scheduling efforts across more than 80 high-performance kernels with up to a dozen lines of code each, delivering performance comparable to, or better than, MKL, OpenBLAS, BLIS, and Halide.

Exo 2 includes a novel mechanism called “Cursors” that provides what the researchers call a “stable reference” for pointing at the object code throughout the scheduling process. Ikarashi says that a stable reference is essential for users to encapsulate schedules within a library function, as it renders the scheduling code independent of object-code transformations.

“We believe that USLs should be designed to be user-extensible, rather than having a fixed set of operations,” says Ikarashi. “In this way, a language can grow to support large projects through the implementation of libraries that accommodate diverse optimization requirements and application domains.”

Exo 2’s design allows performance engineers to focus on high-level optimization strategies while ensuring that the underlying object code remains functionally equivalent through the use of safe primitives. In the future, the team hopes to expand Exo 2’s support for different types of hardware accelerators, like GPUs. Several ongoing projects aim to improve the compiler analysis itself, in terms of correctness, compilation time, and expressivity.

Ikarashi and Ragan-Kelley co-authored the paper with graduate students Kevin Qian and Samir Droubi, Alex Reinking of Adobe, and former CSAIL postdoc Gilbert Bernstein, now a professor at the University of Washington. This research was funded, in part, by the U.S. Defense Advanced Research Projects Agency (DARPA) and the U.S. National Science Foundation, while the first author was also supported by Masason, Funai, and Quad Fellowships.

QS World University Rankings rates MIT No. 1 in 11 subjects for 2025

QS World University Rankings has placed MIT in the No. 1 spot in 11 subject areas for 2025, the organization announced today.

The Institute received a No. 1 ranking in the following QS subject areas: Chemical Engineering; Civil and Structural Engineering; Computer Science and Information Systems; Data Science and Artificial Intelligence; Electrical and Electronic Engineering; Linguistics; Materials Science; Mechanical, Aeronautical, and Manufacturing Engineering; Mathematics; Physics and Astronomy; and Statistics and Operational Research.

MIT also placed second in seven subject areas: Accounting and Finance; Architecture/Built Environment; Biological Sciences; Business and Management Studies; Chemistry; Earth and Marine Sciences; and Economics and Econometrics.

For 2025, universities were evaluated in 55 specific subjects and five broader subject areas. MIT was ranked No. 1 in the broader subject area of Engineering and Technology and No. 2 in Natural Sciences.

Quacquarelli Symonds Limited subject rankings, published annually, are designed to help prospective students find the leading schools in their field of interest. Rankings are based on research quality and accomplishments, academic reputation, and graduate employment.

MIT has been ranked as the No. 1 university in the world by QS World University Rankings for 13 straight years.

Robotic helper making mistakes? Just nudge it in the right direction

Imagine that a robot is helping you clean the dishes. You ask it to grab a soapy bowl out of the sink, but its gripper slightly misses the mark.

Using a new framework developed by MIT and NVIDIA researchers, you could correct that robot’s behavior with simple interactions. The method would allow you to point to the bowl or trace a trajectory to it on a screen, or simply give the robot’s arm a nudge in the right direction.

Unlike other methods for correcting robot behavior, this technique does not require users to collect new data and retrain the machine-learning model that powers the robot’s brain. It enables a robot to use intuitive, real-time human feedback to choose a feasible action sequence that gets as close as possible to satisfying the user’s intent.

When the researchers tested their framework, its success rate was 21 percent higher than an alternative method that did not leverage human interventions.

In the long run, this framework could enable a user to more easily guide a factory-trained robot to perform a wide variety of household tasks even though the robot has never seen their home or the objects in it.

“We can’t expect laypeople to perform data collection and fine-tune a neural network model. The consumer will expect the robot to work right out of the box, and if it doesn’t, they would want an intuitive mechanism to customize it. That is the challenge we tackled in this work,” says Felix Yanwei Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this method.

His co-authors include Lirui Wang PhD ’24 and Yilun Du PhD ’24; senior author Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as Balakumar Sundaralingam, Xuning Yang, Yu-Wei Chao, Claudia Perez-D’Arpino PhD ’19, and Dieter Fox of NVIDIA. The research will be presented at the International Conference on Robotics and Automation.

Mitigating misalignment

Recently, researchers have begun using pre-trained generative AI models to learn a “policy,” or a set of rules, that a robot follows to complete an action. Generative models can solve multiple complex tasks.

During training, the model only sees feasible robot motions, so it learns to generate valid trajectories for the robot to follow.

While these trajectories are valid, that doesn’t mean they always align with a user’s intent in the real world. The robot might have been trained to grab boxes off a shelf without knocking them over, but it could fail to reach the box on top of someone’s bookshelf if the shelf is oriented differently than those it saw in training.

To overcome these failures, engineers typically collect data demonstrating the new task and re-train the generative model, a costly and time-consuming process that requires machine-learning expertise.

Instead, the MIT researchers wanted to allow users to steer the robot’s behavior during deployment when it makes a mistake.

But if a human interacts with the robot to correct its behavior, that could inadvertently cause the generative model to choose an invalid action. It might reach the box the user wants, but knock books off the shelf in the process.

“We want to allow the user to interact with the robot without introducing those kinds of mistakes, so we get a behavior that is much more aligned with user intent during deployment, but that is also valid and feasible,” Wang says.

Their framework accomplishes this by providing the user with three intuitive ways to correct the robot’s behavior, each of which offers certain advantages.

First, the user can point to the object they want the robot to manipulate in an interface that shows its camera view. Second, they can trace a trajectory in that interface, allowing them to specify how they want the robot to reach the object. Third, they can physically move the robot’s arm in the direction they want it to follow.

“When you are mapping a 2D image of the environment to actions in a 3D space, some information is lost. Physically nudging the robot is the most direct way of specifying user intent without losing any of the information,” says Wang.

Sampling for success

To ensure these interactions don’t cause the robot to choose an invalid action, such as colliding with other objects, the researchers use a specific sampling procedure. This technique lets the model choose an action from the set of valid actions that most closely aligns with the user’s goal.

“Rather than just imposing the user’s will, we give the robot an idea of what the user intends but let the sampling procedure oscillate around its own set of learned behaviors,” Wang explains.
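The idea can be sketched as a constrained selection over model samples. This is a minimal illustration in plain Python, not the researchers' actual procedure: the real framework samples full trajectories from a generative policy, and `is_valid` here merely stands in for its feasibility check:

```python
import math

def select_action(candidates, is_valid, user_goal):
    """From model-sampled candidate actions (here, 2D end-points),
    keep only the feasible ones and return the candidate closest to
    the goal indicated by the user's point, trace, or nudge."""
    feasible = [c for c in candidates if is_valid(c)]
    return min(feasible, key=lambda c: math.dist(c, user_goal))

# Hypothetical example: four candidate grasp positions, one of which
# would cause a collision and is therefore excluded.
candidates = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
no_collision = lambda c: c != (2.0, 2.0)
chosen = select_action(candidates, no_collision, user_goal=(2.1, 2.1))
```

The key design point is that the user's input biases the choice but never overrides feasibility: the selected action always comes from the model's own set of valid behaviors.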

This sampling method enabled the researchers’ framework to outperform the other methods they compared it to during simulations and experiments with a real robot arm in a toy kitchen.

While their method might not always complete the task right away, it offers users the advantage of being able to immediately correct the robot if they see it doing something wrong, rather than waiting for it to finish and then giving it new instructions.

Moreover, after a user nudges the robot a few times until it picks up the correct bowl, it could log that corrective action and incorporate it into its behavior through future training. Then, the next day, the robot could pick up the correct bowl without needing a nudge.

“But the key to that continuous improvement is having a way for the user to interact with the robot, which is what we have shown here,” Wang says.

In the future, the researchers want to boost the speed of the sampling procedure while maintaining or improving its performance. They also want to experiment with robot policy generation in novel environments.

Collaborating to advance research and innovation on essential chips for AI

The following is a joint announcement from the MIT Microsystems Technology Laboratories and GlobalFoundries. 

MIT and GlobalFoundries (GF), a leading manufacturer of essential semiconductors, have announced a new research agreement to jointly pursue advancements and innovations for enhancing the performance and efficiency of critical semiconductor technologies. The collaboration will be led by MIT’s Microsystems Technology Laboratories (MTL) and GF’s research and development team, GF Labs.

With an initial research focus on artificial intelligence and other applications, the first projects are expected to leverage GF’s differentiated silicon photonics technology, which monolithically integrates radio frequency silicon-on-insulator (RF SOI), CMOS (complementary metal-oxide semiconductor), and optical features on a single chip to realize power efficiencies for data centers, and GF’s 22FDX platform, which delivers ultra-low power consumption for intelligent devices at the edge.

“The collaboration between MIT MTL and GF exemplifies the power of academia-industry cooperation in tackling the most pressing challenges in semiconductor research,” says Tomás Palacios, MTL director and the Clarence J. LeBel Professor of Electrical Engineering and Computer Science. Palacios will serve as the MIT faculty lead for this research initiative.

“By bringing together MIT’s world-renowned capabilities with GF’s leading semiconductor platforms, we are positioned to drive significant research advancements in GF’s essential chip technologies for AI,” says Gregg Bartlett, chief technology officer at GF. “This collaboration underscores our commitment to innovation and highlights our dedication to developing the next generation of talent in the semiconductor industry. Together, we will research transformative solutions in the industry.”

“Integrated circuit technologies are the core driving a broad spectrum of applications ranging from mobile computing and communication devices to automotive, energy, and cloud computing,” says Anantha P. Chandrakasan, dean of MIT’s School of Engineering, chief innovation and strategy officer, and the Vannevar Bush Professor of Electrical Engineering and Computer Science. “This collaboration allows MIT’s exceptional research community to leverage GlobalFoundries’ wide range of industry domain experts and advanced process technologies to drive exciting innovations in microelectronics across domains — while preparing our students to take on leading roles in the workforce of the future.”

The new research agreement was formalized at a signing ceremony on campus at MIT. It builds upon GF’s successful past and ongoing engagements with the university. GF serves on MTL’s Microsystems Industrial Group, which brings together industry and academia to engage in research. MIT faculty are active participants in GF’s University Partnership Program focused on joint semiconductor research and prototyping. Additionally, GF and MIT collaborate on several workforce development initiatives, including through the Northeast Microelectronics Coalition, a U.S. Department of Defense Microelectronics Commons Hub.

MIT faculty, alumni named 2025 Sloan Research Fellows

Seven MIT faculty and 21 additional MIT alumni are among 126 early-career researchers honored with 2025 Sloan Research Fellowships by the Alfred P. Sloan Foundation.

The recipients represent the MIT departments of Biology; Chemical Engineering; Chemistry; Civil and Environmental Engineering; Earth, Atmospheric and Planetary Sciences; Economics; Electrical Engineering and Computer Science; Mathematics; and Physics as well as the Music and Theater Arts Section and the MIT Sloan School of Management.

The fellowships honor exceptional researchers at U.S. and Canadian educational institutions, whose creativity, innovation, and research accomplishments make them stand out as the next generation of leaders. Winners receive a two-year, $75,000 fellowship that can be used flexibly to advance the fellow’s research.

“The Sloan Research Fellows represent the very best of early-career science, embodying the creativity, ambition, and rigor that drive discovery forward,” says Adam F. Falk, president of the Alfred P. Sloan Foundation. “These extraordinary scholars are already making significant contributions, and we are confident they will shape the future of their fields in remarkable ways.”

Including this year’s recipients, a total of 333 MIT faculty have received Sloan Research Fellowships since the program’s inception in 1955. MIT and Northwestern University are tied for having the most faculty in the 2025 cohort of fellows, each with seven. The MIT recipients are: 

Ariel L. Furst is the Paul M. Cook Career Development Professor of Chemical Engineering at MIT. Her lab combines biological, chemical, and materials engineering to solve challenges in human health and environmental sustainability, with lab members developing technologies for implementation in low-resource settings to ensure equitable access to technology. Furst completed her PhD in the lab of Professor Jacqueline K. Barton at Caltech developing new cancer diagnostic strategies based on DNA charge transport. She was then an A.O. Beckman Postdoctoral Fellow in the lab of Professor Matthew Francis at the University of California at Berkeley, developing sensors to monitor environmental pollutants. She is the recipient of the NIH New Innovator Award, the NSF CAREER Award, and the Dreyfus Teacher-Scholar Award. She is passionate about STEM outreach and increasing participation of underrepresented groups in engineering.

Mohsen Ghaffari SM ’13, PhD ’17 is an associate professor in the Department of Electrical Engineering and Computer Science (EECS) as well as the Computer Science and Artificial Intelligence Laboratory (CSAIL). His research explores the theory of distributed and parallel computation, and he has had influential work on a range of algorithmic problems, including generic derandomization methods for distributed and parallel computing (which resolved several decades-old open problems), improved distributed algorithms for graph problems, sublinear algorithms derived via distributed techniques, and algorithmic and impossibility results for massively parallel computation. His work has been recognized with best paper awards at the IEEE Symposium on Foundations of Computer Science (FOCS), the ACM-SIAM Symposium on Discrete Algorithms (SODA), the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), the ACM Symposium on Principles of Distributed Computing (PODC), and the International Symposium on Distributed Computing (DISC), as well as a European Research Council Starting Grant and a Google Faculty Research Award, among other honors.

Marzyeh Ghassemi PhD ’17 is an associate professor within EECS and the Institute for Medical Engineering and Science (IMES). Ghassemi earned two bachelor’s degrees in computer science and electrical engineering from New Mexico State University as a Goldwater Scholar; her MS in biomedical engineering from Oxford University as a Marshall Scholar; and her PhD in computer science from MIT. Following stints as a visiting researcher with Alphabet’s Verily and an assistant professor at University of Toronto, Ghassemi joined EECS and IMES as an assistant professor in July 2021. (IMES is the home of the Harvard-MIT Program in Health Sciences and Technology.) She is affiliated with the Laboratory for Information and Decision Systems (LIDS), the MIT-IBM Watson AI Lab, the Abdul Latif Jameel Clinic for Machine Learning in Health, the Institute for Data, Systems, and Society (IDSS), and CSAIL. Ghassemi’s research in the Healthy ML Group creates a rigorous quantitative framework in which to design, develop, and place machine learning models in a way that is robust and useful, focusing on health settings. Her contributions range from socially-aware model construction to improving subgroup- and shift-robust learning methods to identifying important insights in model deployment scenarios that have implications in policy, health practice, and equity. Among other awards, Ghassemi has been named one of MIT Technology Review’s 35 Innovators Under 35 and an AI2050 Fellow, as well as receiving the 2018 Seth J. Teller Award, the 2023 MIT Prize for Open Data, a 2024 NSF CAREER Award, and the Google Research Scholar Award. She founded the nonprofit Association for Health, Inference and Learning (AHLI) and her work has been featured in popular press such as Forbes, Fortune, MIT News, and The Huffington Post.

Darcy McRose is the Thomas D. and Virginia W. Cabot Career Development Assistant Professor of Civil and Environmental Engineering. She is an environmental microbiologist who draws on techniques from genetics, chemistry, and geosciences to understand the ways microbes control nutrient cycling and plant health. Her laboratory uses small molecules, or “secondary metabolites,” made by plants and microbes as tractable experimental tools to study microbial activity in complex environments like soils and sediments. In the long term, this work aims to uncover fundamental controls on microbial physiology and community assembly that can be used to promote agricultural sustainability, ecosystem health, and human prosperity.

Sarah Millholland, an assistant professor of physics at MIT and member of the Kavli Institute for Astrophysics and Space Research, is a theoretical astrophysicist who studies extrasolar planets, including their formation and evolution, orbital dynamics, and interiors/atmospheres. She studies patterns in the observed planetary orbital architectures, referring to properties like the spacings, eccentricities, inclinations, axial tilts, and planetary size relationships. She specializes in investigating how gravitational interactions such as tides, resonances, and spin dynamics sculpt observable exoplanet properties. She is the 2024 recipient of the Vera Rubin Early Career Award for her contributions to the formation and dynamics of extrasolar planetary systems. She plans to use her Sloan Fellowship to explore how tidal physics shape the diversity of orbits and interiors of exoplanets orbiting close to their stars.

Emil Verner is the Albert F. (1942) and Jeanne P. Clear Career Development Associate Professor of Global Management and an associate professor of finance at the MIT Sloan School of Management. His research lies at the intersection of finance and macroeconomics, with a particular focus on understanding the causes and consequences of financial crises over the past 150 years. Verner’s recent work examines the drivers of bank runs and insolvency during banking crises, the role of debt booms in amplifying macroeconomic fluctuations, the effectiveness of debt relief policies during crises, and how financial crises impact political polarization and support for populist parties. Before joining MIT, he earned a PhD in economics from Princeton University.

Christian Wolf, the Rudi Dornbusch Career Development Assistant Professor of Economics and a faculty research fellow at the National Bureau of Economic Research, works in macroeconomics, monetary economics, and time series econometrics. His work focuses on the development and application of new empirical methods to address classic macroeconomic questions and to evaluate how robust the answers are to a range of common modeling assumptions. His research has provided path-breaking insights on monetary transmission mechanisms and fiscal policy. In a separate strand of work, Wolf has substantially deepened our understanding of the appropriate methods macroeconomists should use to estimate impulse response functions — how key economic variables respond to policy changes or unexpected shocks.

The following MIT alumni also received fellowships: 

Jason Altschuler SM ’18, PhD ’22
David Bau III PhD ’21 
Rene Boiteau PhD ’16 
Lynne Chantranupong PhD ’17
Lydia B. Chilton ’06, ’07, MNG ’09 
Jordan Cotler ’15 
Alexander Ji PhD ’17 
Sarah B. King ’10
Allison Z. Koenecke ’14 
Eric Larson PhD ’18
Chen Lian ’15, PhD ’20
Huanqian Loh ’06 
Ian J. Moult PhD ’16
Lisa Olshansky PhD ’15
Andrew Owens SM ’13, PhD ’16 
Matthew Rognlie PhD ’16
David Rolnick ’12, PhD ’18 
Shreya Saxena PhD ’17
Mark Sellke ’18
Amy X. Zhang PhD ’19 
Aleksandr V. Zhukhovitskiy PhD ’16

Like human brains, large language models reason about diverse data in a general way

While early language models could only process text, contemporary large language models now perform highly diverse tasks on different types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.   

MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and found evidence that they share some similarities with the human brain.

Neuroscientists believe the human brain has a “semantic hub” in the anterior temporal lobe that integrates semantic information from various modalities, like visual data and tactile inputs. This semantic hub is connected to modality-specific “spokes” that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or reason about arithmetic, computer code, etc. Furthermore, the researchers demonstrate that they can intervene in a model’s semantic hub by using text in the model’s dominant language to change its outputs, even when the model is processing data in other languages.

These findings could help scientists train future LLMs that are better able to handle diverse data.

“LLMs are big black boxes. They have achieved very impressive performance, but we have very little knowledge about their internal working mechanisms. I hope this can be an early step to better understand how they work so we can improve upon them and better control them when needed,” says Zhaofeng Wu, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this research.

His co-authors include Xinyan Velocity Yu, a graduate student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a research scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

Integrating diverse data

The researchers based the new study on prior work which hinted that English-centric LLMs use English to carry out reasoning processes on inputs in various languages.

Wu and his collaborators expanded this idea, launching an in-depth study into the mechanisms LLMs use to process diverse data.

An LLM, which is composed of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships between tokens and generate the next word in a sequence. In the case of images or audio, these tokens correspond to particular regions of an image or sections of an audio clip.
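Tokenization can be illustrated with a toy greedy sub-word splitter. The vocabulary here is made up; real LLMs learn vocabularies of tens of thousands of sub-words and map each token to a high-dimensional vector:

```python
# Made-up sub-word vocabulary for illustration only.
VOCAB = {"un": 0, "believ": 1, "able": 2, "<unk>": 3}

def tokenize(word):
    """Greedily split a word into the longest known sub-word pieces,
    falling back to an unknown marker for unmatched characters."""
    tokens = []
    while word:
        for length in range(len(word), 0, -1):
            piece = word[:length]
            if piece in VOCAB:
                tokens.append(piece)
                word = word[length:]
                break
        else:  # no prefix matched: emit <unk> and skip one character
            tokens.append("<unk>")
            word = word[1:]
    return tokens
```

Each resulting token would then be looked up in an embedding table to obtain the representation the model reasons over.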

The researchers found that the model’s initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. Then, the LLM converts tokens into modality-agnostic representations as it reasons about them throughout its internal layers, akin to how the brain’s semantic hub integrates diverse information.

The model assigns similar representations to inputs with similar meanings, regardless of their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, the LLM would assign them similar representations because they share the same meaning.

For instance, an English-dominant LLM “thinks” about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, or even multimodal data.

To test this hypothesis, the researchers passed a pair of sentences with the same meaning but written in two different languages through the model. They measured how similar the model’s representations were for each sentence.
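A standard way to quantify this kind of agreement between two representation vectors is cosine similarity, shown below as a minimal sketch (the paper's exact metric may differ):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two representation vectors:
    1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm
```

Applied to the hidden states of a sentence pair, a value near 1.0 indicates the model represents the two sentences almost identically despite their surface-level differences.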

Then they conducted a second set of experiments where they fed an English-dominant model text in a different language, like Chinese, and measured how similar its internal representation was to English versus Chinese. The researchers conducted similar experiments for other data types.

They consistently found that the model’s representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers were more like English-centric tokens than the input data type.

“A lot of these input data types seem extremely different from language, so we were very surprised that we can probe out English tokens when the model processes, for example, mathematical or coding expressions,” Wu says.

Leveraging the semantic hub

The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process varied data.

“There are thousands of languages out there, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn’t need to duplicate that knowledge across languages,” Wu says.

The researchers also tried intervening in the model’s internal layers using English text when it was processing other languages. They found that they could predictably change the model outputs, even though those outputs were in other languages.

Scientists could leverage this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially boosting efficiency.

But on the other hand, there could be concepts or knowledge that are not translatable across languages or data types, like culturally specific knowledge. Scientists might want LLMs to have some language-specific processing mechanisms in those cases.

“How do you maximally share whenever possible but also allow languages to have some language-specific processing mechanisms? That could be explored in future work on model architectures,” Wu says.

In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns to speak another language will lose some of its accuracy in English. A better understanding of an LLM’s semantic hub could help researchers prevent this language interference, he says.

“Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience and shows that the proposed ‘semantic hub hypothesis’ holds in modern language models, where semantically similar representations of different data types are created in the model’s intermediate layers,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work. “The hypothesis and experiments nicely tie and extend findings from previous works and could be influential for future research on creating better multimodal models and studying links between them and brain function and cognition in humans.”

This research is funded, in part, by the MIT-IBM Watson AI Lab.

Chip-based system for terahertz waves could enable more efficient, sensitive electronics

The use of terahertz waves, which have shorter wavelengths and higher frequencies than radio waves, could enable faster data transmission, more precise medical imaging, and higher-resolution radar.

But effectively generating terahertz waves using a semiconductor chip, which is essential for incorporation into electronic devices, is notoriously difficult.

Higher radiating power allows terahertz signals to travel farther, but many current techniques can’t generate waves with enough power for useful applications unless they utilize bulky and expensive silicon lenses. Such lenses, which are often larger than the chip itself, make it hard to integrate the terahertz source into an electronic device.

To overcome these limitations, MIT researchers developed a terahertz amplifier-multiplier system that achieves higher radiating power than existing devices without the need for silicon lenses.

By affixing a thin, patterned sheet of material to the back of the chip and utilizing higher-power Intel transistors, the researchers produced a more efficient, yet scalable, chip-based terahertz wave generator.

This compact chip could be used to make terahertz arrays for applications like improved security scanners for detecting hidden objects or environmental monitors for pinpointing airborne pollutants.

“To take full advantage of a terahertz wave source, we need it to be scalable. A terahertz array might have hundreds of chips, and there is no place to put silicon lenses because the chips are combined with such high density. We need a different package, and here we’ve demonstrated a promising approach that can be used for scalable, low-cost terahertz arrays,” says Jinchen Wang, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and lead author of a paper on the terahertz radiator.

He is joined on the paper by EECS graduate students Daniel Sheen and Xibi Chen; Steven F. Nagle, managing director of the T.J. Rodgers RLE Laboratory; and senior author Ruonan Han, an associate professor in EECS, who leads the Terahertz Integrated Electronics Group. The research will be presented at the IEEE International Solid-State Circuits Conference.

Making waves

Terahertz waves sit on the electromagnetic spectrum between radio waves and infrared light. Their higher frequencies enable them to carry more information per second than radio waves, while they can safely penetrate a wider range of materials than infrared light.

One way to generate terahertz waves is with a CMOS chip-based amplifier-multiplier chain that increases the frequency of radio waves until they reach the terahertz range. To achieve the best performance, waves go through the silicon chip and are eventually emitted out the back into the open air.

But a property known as the dielectric constant gets in the way of a smooth transmission.

The dielectric constant influences how electromagnetic waves interact with a material. It affects the amount of radiation that is absorbed, reflected, or transmitted. Because the dielectric constant of silicon is much higher than that of air, most terahertz waves are reflected at the silicon-air boundary rather than being cleanly transmitted out the back.

Since most signal strength is lost at this boundary, current approaches often use silicon lenses to boost the power of the remaining signal. 

The MIT researchers approached this problem differently.

They drew on a concept from electromagnetics known as matching: inserting a material whose dielectric constant sits between those of silicon and air minimizes the amount of signal reflected at the boundary.

They accomplish this by affixing a thin sheet of material, with a dielectric constant between those of silicon and air, to the back of the chip. With this matching sheet in place, most waves are transmitted out the back rather than being reflected.
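As a rough back-of-the-envelope sketch (not from the paper), the benefit of a matching layer can be seen with the textbook Fresnel formula for normal-incidence reflection, using silicon’s typical dielectric constant of about 11.7; the ideal quarter-wave layer value below is the standard geometric-mean result, not necessarily the exact design the team used:

```python
import math

def reflectance(n1, n2):
    """Fresnel power reflectance at normal incidence between two media."""
    return ((n1 - n2) / (n1 + n2)) ** 2

n_air = 1.0
n_si = math.sqrt(11.7)  # refractive index from silicon's dielectric constant

# Bare silicon-air boundary: a large fraction of the power bounces back,
# even before accounting for total internal reflection of oblique rays.
print(f"bare boundary: {reflectance(n_si, n_air):.0%} of power reflected")

# Ideal quarter-wave matching layer: index is the geometric mean of the two.
n_match = math.sqrt(n_si * n_air)
print(f"matching layer: n = {n_match:.2f} (dielectric constant ~ {n_match**2:.2f})")

# The single strong reflection is split into two much weaker ones.
print(f"per-interface reflection: {reflectance(n_si, n_match):.1%} each")
```

With no matching layer, roughly 30 percent of the power is reflected even at normal incidence, and rays hitting the boundary beyond silicon’s critical angle of about 17 degrees are totally internally reflected; an intermediate layer splits the one strong mismatch into two much weaker ones.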

A scalable approach

They chose a low-cost, commercially available substrate material with a dielectric constant very close to what they needed for matching. To improve performance, they used a laser cutter to punch tiny holes into the sheet until its dielectric constant was exactly right.

“Since the dielectric constant of air is 1, if you just cut some subwavelength holes in the sheet, it is equivalent to injecting some air, which lowers the overall dielectric constant of the matching sheet,” Wang explains.
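Wang’s point can be sketched with a simple volume-fraction mixing estimate, an approximation that holds only when the holes are much smaller than the wavelength; the substrate value of 3.8 and target of 3.42 below are hypothetical, not the actual materials used:

```python
def eps_effective(eps_sheet, air_fraction):
    """Volume-fraction mixing estimate for a perforated sheet; valid only
    when the holes are much smaller than the wavelength."""
    return air_fraction * 1.0 + (1.0 - air_fraction) * eps_sheet

def air_fraction_needed(eps_sheet, eps_target):
    """Invert the mixing rule: hole volume fraction needed to reach eps_target."""
    return (eps_sheet - eps_target) / (eps_sheet - 1.0)

# Hypothetical numbers: substrate at eps = 3.8, target eps = 3.42.
f = air_fraction_needed(3.8, 3.42)
print(f"cut away ~{f:.0%} of the sheet's volume")
print(f"check: eps_eff = {eps_effective(3.8, f):.2f}")
```

Punching more holes injects more air (dielectric constant 1), steadily lowering the sheet’s effective dielectric constant toward the target.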

In addition, they designed their chip with special transistors developed by Intel that have a higher maximum frequency and breakdown voltage than traditional CMOS transistors.

“These two things taken together, the more powerful transistors and the dielectric sheet, plus a few other small innovations, enabled us to outperform several other devices,” he says.

Their chip generated terahertz signals with a peak radiation power of 11.1 decibel-milliwatts (about 13 milliwatts), the best among state-of-the-art techniques. Moreover, because the low-cost chip can be fabricated at scale, it could be integrated into real-world electronic devices more readily.

One of the biggest challenges of developing a scalable chip was determining how to manage the power and temperature when generating terahertz waves.

“Because the frequency and the power are so high, many of the standard ways to design a CMOS chip are not applicable here,” Wang says.

The researchers also needed to devise a technique for installing the matching sheet that could be scaled up in a manufacturing facility.

Moving forward, they want to demonstrate this scalability by fabricating a phased array of CMOS terahertz sources, enabling them to steer and focus a powerful terahertz beam with a low-cost, compact device.

This research is supported, in part, by NASA’s Jet Propulsion Laboratory and Strategic University Research Partnerships Program, as well as the MIT Center for Integrated Circuits and Systems. The chip was fabricated through the Intel University Shuttle Program.

Tracking gene expression changes through cell lineage progression with PORCELAN

Recent advances in barcoding technologies have made it possible to reconstruct a lineage tree of cells while simultaneously capturing their transcriptomic profiles. However, new computational approaches are needed to fully leverage the resolution these lineage-resolved single-cell RNA sequencing (scRNA-seq) datasets provide. In particular, gene expression analysis must go beyond pairwise comparisons between developmental stages and instead capture the full hierarchical structure of lineage trees, allowing for the detection of gene expression patterns that follow, or deviate from, lineage relationships.

In a new study published in Nature Communications, researchers at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard introduce PORCELAN, a statistical framework that automatically detects gene expression patterns linked to lineage progression. This method provides a systematic way to study how gene expression and cell state memory evolve through cell divisions, offering new insights into processes such as cancer progression.

Decoding Gene Expression Through Lineage Trees

PORCELAN – short for Permutation, Optimization, and Representation learning-based single Cell gene Expression and Lineage ANalysis – combines representation learning with permutations among leaves in the lineage tree. Using a statistical approach, PORCELAN addresses three questions: How can we jointly capture lineage and gene expression information in cell representations? Which genes best reflect lineage relationships, and in which subtrees is this connection strongest? To what extent does gene expression preserve lineage tree structure across different resolutions?
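As a toy illustration of the permutation idea only (this is not PORCELAN’s actual statistic, which uses representation learning and a local-autocorrelation tree-likeness score), one can correlate pairwise tree distance with pairwise expression distance among leaves, then shuffle expression values among the leaves to test whether the observed association could arise by chance:

```python
import itertools
import random

def tree_likeness(dist, expr):
    """Toy statistic: Pearson correlation between pairwise tree distance
    and pairwise expression distance across all leaf pairs."""
    pairs = list(itertools.combinations(range(len(expr)), 2))
    xs = [dist[i][j] for i, j in pairs]
    ys = [abs(expr[i] - expr[j]) for i, j in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def permutation_pvalue(dist, expr, n_perm=2000, seed=0):
    """Shuffle expression values among leaves to build a null distribution."""
    rng = random.Random(seed)
    observed = tree_likeness(dist, expr)
    shuffled = list(expr)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += tree_likeness(dist, shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)

# Six leaves on a path-like tree; expression closely tracks lineage depth,
# so the statistic is high and the permutation p-value is small.
n = 6
dist = [[abs(i - j) for j in range(n)] for i in range(n)]
expr = [0.0, 1.2, 1.9, 3.1, 4.0, 5.2]
obs, p = permutation_pvalue(dist, expr)
print(f"statistic = {obs:.3f}, permutation p = {p:.3f}")
```

A small p-value indicates that expression of this gene tracks the lineage tree more closely than expected if leaf labels were random, which is the kind of lineage-linked pattern the framework is designed to surface.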

Schmidt Center graduate student and first author Hannah Schlueter aimed to create a rigorous and adaptable tool for studying cellular identity.

The researchers validated PORCELAN using synthetic datasets and applied it to three biological systems with lineage-traced scRNA-seq data: lung cancer progression, mouse embryogenesis, and C. elegans development. In lung cancer, PORCELAN identified tumor cell subpopulations that contributed to metastases and pinpointed key genes associated with these transitions – many of which align with known cancer biomarkers and pathways. In developmental systems, the framework uncovered differences in how gene expression memory is maintained across cell divisions, highlighting contrasts between normal development and cancerous progression. These findings underscore the importance of lineage-resolved approaches in understanding fundamental biological processes.

A Flexible Tool for the Future

The study was led by Hannah Schlueter, a Schmidt Center graduate student and PhD student in EECS at MIT’s Laboratory for Information & Decision Systems (LIDS), in collaboration with corresponding author Caroline Uhler, Director of the Schmidt Center and Andrew (1956) and Erna Viterbi Professor of Engineering at MIT in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS).

“Our goal was to develop a method that is both rigorous and adaptable,” says Schlueter. “Because PORCELAN is modular, it can be applied to different data modalities, including lineage-resolved imaging data, by replacing the simpler tree-likeness score based on local autocorrelation, used for transcriptomic data, with a representation learning-based tree-likeness score. This flexibility makes it a powerful tool for studying how cellular identity is maintained and altered over time.”

As lineage tracing technologies continue to evolve, methods like PORCELAN highlight the critical role of applying statistical techniques to biological research. This approach, which merges computational tools with biological insights, is central to the work at the Schmidt Center. By developing methods that bridge computational models with biological questions, the Schmidt Center aims to drive discoveries that deepen our understanding of cellular biology, disease mechanisms, and potential therapeutic strategies. 

Climate change and machine learning — the good, bad, and unknown

Machine learning and climate change have a complicated relationship: Machine learning can enable climate-friendly actions, but it can also hurt sustainability goals, given its large demand on energy resources and its role in climate-adverse business models and trends.

Organizations need to continuously push the boundaries of diverse machine learning technologies to meet climate change challenges while considering their energy costs, according to MIT professor Priya Donti. Speaking at the 2024 MIT Sustainability Conference, Donti said that organizations must also remain pragmatic about how machine learning can upset efforts or create uncertainty around meeting broader societal sustainability goals. 

“There are lots of subtle but transformative effects machine learning has that we should be paying attention to in the context of climate,” said Donti, a co-founder and chair of Climate Change AI, a global nonprofit focused on the intersection of climate change and machine learning. 

The upside: machine learning advances sustainability goals

Machine learning can help with climate solutions across several fronts, including improving the efficiency of electricity systems and smart buildings and accelerating climate science research. Donti and her coauthors highlighted these innovations in a 2022 paper that details how machine learning applications can be applied to climate change in several broad categories:  

Turning raw data into actionable insights. It’s not always possible to gather on-the-ground data at the scale necessary to understand greenhouse gas emissions — for example, when collecting information from areas where deforestation is happening or capturing the energy efficiency characteristics of smart buildings throughout a city. Combining satellite and aerial imagery with machine learning can provide insights that can be extrapolated to a broader scale. For example, a coalition of nonprofits called Climate TRACE uses a combination of satellite imagery and on-the-ground data to gather independent emissions inventories for different sectors. 

Improved forecasting. Machine learning systems can process data to uncover relationships among variables for improved forecasting. For example, historical data about weather and solar power production can be used to forecast what solar power production might look like in the near future. This could facilitate better power grid management.

Automated decision-making. Machine learning programs can analyze real-time information to automatically calibrate the temperature of buildings, data centers, or refrigeration environments efficiently.

Predictive maintenance. Asset downtime is detrimental to business operations, so the ability to identify and address potential problems before they occur is a huge efficiency advantage. For example, German rail company Deutsche Bahn is using machine learning to identify faults in the railway switching infrastructure, which enables it to make proactive fixes that keep the trains running on time, Donti said. Another example is natural gas providers that are using sensor data and aerial and satellite imagery to detect anomalies that predict methane leaks before they occur. 

Science and engineering discovery. Machine learning can analyze the outcomes of experiments to accelerate discovery in areas like synthesizing molecules or carbon dioxide sorbents. Machine learning can also help approximate time-intensive simulations, which enables faster overall model performance and higher-resolution outputs. 

Data management for climate change workflows. Data is key to the modeling process. Machine learning can help organizations match and merge datasets from diverse sources. 
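The forecasting category above can be sketched in miniature: fit a model to historical weather and production data, then apply it to the next day’s forecast inputs. The toy example below uses a single feature and ordinary least squares; the numbers are invented, and real grid-forecasting systems use many features and far richer models:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Invented historical data: daily cloud-cover fraction vs. solar output (MWh).
cloud_cover = [0.1, 0.3, 0.5, 0.7, 0.9]
solar_output = [9.5, 8.0, 6.1, 4.2, 2.3]

a, b = fit_line(cloud_cover, solar_output)
forecast = a * 0.4 + b  # forecast for a day with 40% predicted cloud cover
print(f"forecast: {forecast:.1f} MWh")
```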

Given the heterogeneity of climate change challenges, these diverse approaches are needed. “We need to make sure we’re fostering a diverse ecosystem that can meet this set of challenges, rather than conflating one particular set of AI techniques with one particular AI paradigm,” Donti said. 

The bad: machine learning uses a lot of energy 

Machine learning has a significant impact on energy and water resources, given the computational horsepower required for processing and training large models and the water needed to cool data centers, Donti said. Hardware use consumes energy, and producing, transporting, and disposing of hardware also creates carbon emissions. Research shows that data centers and information and communications technology accounted for 1% to 2% of greenhouse gas emissions in 2020. Though it isn’t known how much AI contributes to those emissions, energy use related to developing, training, and running machine learning models is undoubtedly rising, Donti said.

Moreover, research shows that newer approaches tend to be more carbon-intensive. The carbon footprint of model training and model execution has historically been split roughly 50/50, Donti said, but using large language models has been found to be more carbon-intensive than training them. Similarly, task-specific models have given way to multipurpose models that are larger and more compute-intensive. Choosing the wrong approach or model can significantly increase energy use. 

Creating a greener power grid with renewables and energy workload management is essential but still insufficient to fully address these issues. “Understanding these dynamics and trends is key to understanding what to do about it,” Donti said. “This is where increased transparency and data collection becomes exceedingly important.”

The unknown: whether machine learning facilitates climate-adverse technologies

Widespread machine learning use also has nuanced climate implications. For instance, machine learning is being used to accelerate production levels for the oil and gas industry, a hugely carbon-intensive sector, Donti said. Similarly, machine learning paired with technologies such as internet-of-things devices can help farmers manage larger groups of livestock. While such innovations can increase profits and productivity for their users, their potential for boosting carbon emissions is significant. 

Autonomous vehicles, another promising innovation that relies on machine learning, could also have a negative impact on climate action goals, Donti said. While individual self-driving vehicles are more energy efficient than most vehicles on the road today, they may entice people to drive more, entrenching privatized transportation and slowing a transition to a more multimodal transportation model. 

Another area where machine learning innovations are in potential conflict with climate action is personalized advertising that encourages emissions-intense consumption and amplifies polarized views.

Donti made the following recommendations for countering those effects:

  • Invest in heterogeneous implementations of AI and machine learning so the organization isn’t limited to one approach or constrained by a single vendor’s strategies.
  • Commit to purposeful work on applications that have proved to do good.
  • Adopt practices that reduce Scope 1, 2, and 3 emissions.
  • Communicate the benefits and risks of AI in an appropriately nuanced and grounded way as opposed to engaging in hype and overpromising results.

“As we move AI forward, we must actively account for both AI’s and machine learning’s direct impact and the implications for different applications,” Donti said. “We must use it for good applications, not just for business as usual.”

AI model deciphers the code in proteins that tells them where to go

Proteins are the workhorses that keep our cells running, and there are many thousands of types of proteins in our cells, each performing a specialized function. Researchers have long known that the structure of a protein determines what it can do. More recently, researchers are coming to appreciate that a protein’s localization is also critical for its function. Cells are full of compartments that help to organize their many denizens. Along with the well-known organelles that adorn the pages of biology textbooks, these spaces also include a variety of dynamic, membrane-less compartments that concentrate certain molecules together to perform shared functions. Knowing where a given protein localizes, and which proteins it co-localizes with, can therefore be useful for better understanding that protein and its role in the healthy or diseased cell. However, researchers have lacked a systematic way to predict this information.

Meanwhile, protein structure has been studied for over half a century, culminating in the artificial intelligence tool AlphaFold, which can predict protein structure from a protein’s amino acid code, the linear string of building blocks within it that folds to create its structure. AlphaFold and models like it have become widely used tools in research.

Proteins also contain regions of amino acids that do not fold into a fixed structure, but are instead important for helping proteins join dynamic compartments in the cell. MIT Professor Richard Young and colleagues wondered whether the code in those regions could be used to predict protein localization in the same way that other regions are used to predict structure. Other researchers have discovered some protein sequences that code for protein localization, and some have begun developing predictive models for protein localization. However, researchers did not know whether a protein’s localization to any dynamic compartment could be predicted based on its sequence, nor did they have a comparable tool to AlphaFold for predicting localization. 

Now, Young, also a member of the Whitehead Institute for Biological Research; Young lab postdoc Henry Kilgore; Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in MIT’s Department of Electrical Engineering and Computer Science and principal investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL); and colleagues have built such a model, which they call ProtGPS. In a paper published on Feb. 6 in the journal Science, with first authors Kilgore and Barzilay lab graduate students Itamar Chinn, Peter Mikhael, and Ilan Mitnikov, the cross-disciplinary team debuts their model. The researchers show that ProtGPS can predict to which of 12 known types of compartments a protein will localize, as well as whether a disease-associated mutation will change that localization. Additionally, the research team developed a generative algorithm that can design novel proteins to localize to specific compartments.

“My hope is that this is a first step towards a powerful platform that enables people studying proteins to do their research,” Young says, “and that it helps us understand how humans develop into the complex organisms that they are, how mutations disrupt those natural processes, and how to generate therapeutic hypotheses and design drugs to treat dysfunction in a cell.”

The researchers also validated many of the model’s predictions with experimental tests in cells.

“It really excited me to be able to go from computational design all the way to trying these things in the lab,” Barzilay says. “There are a lot of exciting papers in this area of AI, but 99.9 percent of those never get tested in real systems. Thanks to our collaboration with the Young lab, we were able to test, and really learn how well our algorithm is doing.”

Developing the model

The researchers trained and tested ProtGPS on two batches of proteins with known localizations. They found that it could correctly predict where proteins end up with high accuracy. The researchers also tested how well ProtGPS could predict changes in protein localization based on disease-associated mutations within a protein. Many mutations — changes to the sequence for a gene and its corresponding protein — have been found to contribute to or cause disease based on association studies, but the ways in which the mutations lead to disease symptoms remain unknown.

Figuring out the mechanism for how a mutation contributes to disease is important because then researchers can develop therapies to fix that mechanism, preventing or treating the disease. Young and colleagues suspected that many disease-associated mutations might contribute to disease by changing protein localization. For example, a mutation could make a protein unable to join a compartment containing essential partners.

They tested this hypothesis by feeding ProtGPS more than 200,000 proteins with disease-associated mutations, then asking it both to predict where those mutated proteins would localize and to measure how much its prediction changed between the normal and mutated versions of each protein. A large shift in the prediction indicates a likely change in localization.
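The shift measurement can be sketched as follows. Assuming the model outputs a probability over the 12 compartment types, one simple way to quantify how much a mutation moves the prediction is a distance between the wild-type and mutant output distributions; the metric, threshold, and numbers below are illustrative choices, not necessarily those used in the paper:

```python
def total_variation(p, q):
    """Total-variation distance between two predicted distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def flag_relocalization(pred_wt, pred_mut, threshold=0.5):
    """Flag a mutation when the prediction shifts by more than a threshold.
    The metric and threshold here are illustrative, not from the paper."""
    shift = total_variation(pred_wt, pred_mut)
    return shift, shift > threshold

# Hypothetical model outputs over 12 compartments: wild-type strongly in
# compartment 0 (say, the nucleolus), mutant shifted toward compartment 1.
wt = [0.80] + [0.20 / 11] * 11
mut = [0.10, 0.75] + [0.15 / 10] * 10
shift, flagged = flag_relocalization(wt, mut)
print(f"prediction shift = {shift:.2f}, flagged = {flagged}")
```

Running this over every wild-type/mutant pair and ranking by shift would surface the candidate relocalizing mutations, which is the spirit of the screen described above.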

The researchers found many cases in which a disease-associated mutation appeared to change a protein’s localization. They tested 20 examples in cells, using fluorescence to compare where in the cell a normal protein and the mutated version of it ended up. The experiments confirmed ProtGPS’s predictions. Altogether, the findings support the researchers’ suspicion that mis-localization may be an underappreciated mechanism of disease, and demonstrate the value of ProtGPS as a tool for understanding disease and identifying new therapeutic avenues.

“The cell is such a complicated system, with so many components and complex networks of interactions,” Mitnikov says. “It’s super interesting to think that with this approach, we can perturb the system, see the outcome of that, and so drive discovery of mechanisms in the cell, or even develop therapeutics based on that.”

The researchers hope that others begin using ProtGPS in the same way that they use predictive structural models like AlphaFold, advancing various projects on protein function, dysfunction, and disease.

Moving beyond prediction to novel generation

The researchers were excited about the possible uses of their prediction model, but they also wanted their model to go beyond predicting localizations of existing proteins, and allow them to design completely new proteins. The goal was for the model to make up entirely new amino acid sequences that, when formed in a cell, would localize to a desired location. Generating a novel protein that can actually accomplish a function — in this case, the function of localizing to a specific cellular compartment — is incredibly difficult. In order to improve their model’s chances of success, the researchers constrained their algorithm to only design proteins like those found in nature. This is an approach commonly used in drug design, for logical reasons; nature has had billions of years to figure out which protein sequences work well and which do not.

Because of the collaboration with the Young lab, the machine learning team was able to test whether their protein generator worked. The model had good results. In one round, it generated 10 proteins intended to localize to the nucleolus. When the researchers tested these proteins in the cell, they found that four of them strongly localized to the nucleolus, and others may have had slight biases toward that location as well.

“The collaboration between our labs has been so generative for all of us,” Mikhael says. “We’ve learned how to speak each other’s languages, in our case learned a lot about how cells work, and by having the chance to experimentally test our model, we’ve been able to figure out what we need to do to actually make the model work, and then make it work better.”

Being able to generate functional proteins in this way could improve researchers’ ability to develop therapies. For example, if a drug must interact with a target that localizes within a certain compartment, then researchers could use this model to design a drug to also localize there. This should make the drug more effective and decrease side effects, since the drug will spend more time engaging with its target and less time interacting with other molecules, causing off-target effects.

The machine learning team members are enthused about the prospect of using what they have learned from this collaboration to design novel proteins with other functions beyond localization, which would expand the possibilities for therapeutic design and other applications.

“A lot of papers show they can design a protein that can be expressed in a cell, but not that the protein has a particular function,” Chinn says. “We actually had functional protein design, and a relatively huge success rate compared to other generative models. That’s really exciting to us, and something we would like to build on.”

All of the researchers involved see ProtGPS as an exciting beginning. They anticipate that their tool will be used to learn more about the roles of localization in protein function and mis-localization in disease. In addition, they are interested in expanding the model’s localization predictions to include more types of compartments, testing more therapeutic hypotheses, and designing increasingly functional proteins for therapies or other applications.

“Now that we know that this protein code for localization exists, and that machine learning models can make sense of that code and even create functional proteins using its logic, that opens up the door for so many potential studies and applications,” Kilgore says.