A new way to edit or generate images

AI image generation — which relies on neural networks to create new images from a variety of inputs, including text prompts — is projected to become a billion-dollar industry by the end of this decade. Even with today’s technology, if you wanted to make a fanciful picture of, say, a friend planting a flag on Mars or heedlessly flying into a black hole, it could take less than a second. However, before they can perform tasks like that, image generators are commonly trained on massive datasets containing millions of images that are often paired with associated text. Training these generative models can be an arduous chore that takes weeks or months, consuming vast computational resources in the process.

But what if it were possible to generate images through AI methods without using a generator at all? That real possibility, along with other intriguing ideas, was described in a research paper presented at the International Conference on Machine Learning (ICML 2025), which was held in Vancouver, British Columbia, earlier this summer. The paper, describing novel techniques for manipulating and generating images, was written by Lukas Lao Beyer, a graduate student researcher in MIT’s Laboratory for Information and Decision Systems (LIDS); Tianhong Li, a postdoc at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); Xinlei Chen of Facebook AI Research; Sertac Karaman, an MIT professor of aeronautics and astronautics and the director of LIDS; and Kaiming He, an MIT associate professor of electrical engineering and computer science.

This group effort had its origins in a class project for a graduate seminar on deep generative models that Lao Beyer took last fall. In conversations during the semester, it became apparent to both Lao Beyer and He, who taught the seminar, that this research had real potential, which went far beyond the confines of a typical homework assignment. Other collaborators were soon brought into the endeavor.

The starting point for Lao Beyer’s inquiry was a June 2024 paper, written by researchers from the Technical University of Munich and the Chinese company ByteDance, which introduced a new way of representing visual information called a one-dimensional tokenizer. With this device, which is also a kind of neural network, a 256×256-pixel image can be translated into a sequence of just 32 numbers, called tokens. “I wanted to understand how such a high level of compression could be achieved, and what the tokens themselves actually represented,” says Lao Beyer.

The previous generation of tokenizers would typically break up the same image into an array of 16×16 tokens — with each token encapsulating information, in highly condensed form, that corresponds to a specific portion of the original image. The new 1D tokenizers can encode an image more efficiently, using far fewer tokens overall, and these tokens are able to capture information about the entire image, not just a single quadrant. Each of these tokens, moreover, is a 12-digit number consisting of 1s and 0s, allowing for 2¹² (or about 4,000) possibilities altogether. “It’s like a vocabulary of 4,000 words that makes up an abstract, hidden language spoken by the computer,” He explains. “It’s not like a human language, but we can still try to find out what it means.”

That’s exactly what Lao Beyer had initially set out to explore — work that provided the seed for the ICML 2025 paper. The approach he took was pretty straightforward. If you want to find out what a particular token does, Lao Beyer says, “you can just take it out, swap in some random value, and see if there is a recognizable change in the output.” Replacing one token, he found, changes the image quality, turning a low-resolution image into a high-resolution image or vice versa. Another token affected the blurriness in the background, while another still influenced the brightness. He also found a token that’s related to the “pose,” meaning that, in the image of a robin, for instance, the bird’s head might shift from right to left.
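
To make that probing procedure concrete, here is a minimal Python sketch of what a token-swap experiment might look like. The tokenizer and detokenizer objects are hypothetical stand-ins for the pretrained 1D tokenizer and its decoder; the article does not specify their actual interfaces.

import random

VOCAB_SIZE = 2 ** 12   # roughly 4,000 possible values per token
NUM_TOKENS = 32        # a 256x256 image compresses to 32 tokens

def probe_token(image, position, tokenizer, detokenizer, trials=8):
    """Swap one token for random values and return the reconstructed images."""
    tokens = tokenizer.encode(image)            # hypothetical: returns 32 integers
    assert len(tokens) == NUM_TOKENS
    edited = []
    for _ in range(trials):
        mutated = list(tokens)
        mutated[position] = random.randrange(VOCAB_SIZE)   # random replacement value
        edited.append(detokenizer.decode(mutated))         # hypothetical reconstruction
    return edited   # inspect these for changes in pose, blur, brightness, etc.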

“This was a never-before-seen result, as no one had observed visually identifiable changes from manipulating tokens,” Lao Beyer says. The finding raised the possibility of a new approach to editing images. And the MIT group has shown, in fact, how this process can be streamlined and automated, so that tokens don’t have to be modified by hand, one at a time.

He and his colleagues achieved an even more consequential result involving image generation. A system capable of generating images normally requires a tokenizer, which compresses and encodes visual data, along with a generator that can combine and arrange these compact representations in order to create novel images. The MIT researchers found a way to create images without using a generator at all. Their new approach makes use of a 1D tokenizer and a so-called detokenizer (also known as a decoder), which can reconstruct an image from a string of tokens. However, with guidance provided by an off-the-shelf neural network called CLIP — which cannot generate images on its own, but can measure how well a given image matches a certain text prompt — the team was able to convert an image of a red panda, for example, into a tiger. In addition, they could create images of a tiger, or any other desired form, starting completely from scratch — from a situation in which all the tokens are initially assigned random values (and then iteratively tweaked so that the reconstructed image increasingly matches the desired text prompt).
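
The generation-from-scratch loop can be sketched roughly as follows, assuming a differentiable detokenizer and an off-the-shelf CLIP model loaded elsewhere. The paper’s exact optimization recipe is not described in this article, so this is only an illustration of the idea of starting from random token values and iteratively tweaking them to match a text prompt.

import torch

def generate_from_scratch(detokenizer, clip_model, clip_tokenizer,
                          prompt="a tiger", num_tokens=32, dim=256, steps=200):
    # Encode the target prompt once with CLIP (frozen, no gradients needed).
    text = clip_tokenizer([prompt])
    with torch.no_grad():
        text_feat = clip_model.encode_text(text)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    # Start from random token embeddings and iteratively tweak them.
    latents = torch.randn(1, num_tokens, dim, requires_grad=True)
    opt = torch.optim.Adam([latents], lr=0.05)
    for _ in range(steps):
        image = detokenizer(latents)                      # reconstruct an image from tokens
        img_feat = clip_model.encode_image(image)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = -(img_feat * text_feat).sum()              # maximize CLIP similarity to the prompt
        opt.zero_grad()
        loss.backward()
        opt.step()
    return detokenizer(latents).detach()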

The group demonstrated that with this same setup — relying on a tokenizer and detokenizer, but no generator — they could also do “inpainting,” which means filling in parts of images that had somehow been blotted out. Avoiding the use of a generator for certain tasks could lead to a significant reduction in computational costs because generators, as mentioned, normally require extensive training.

What might seem odd about this team’s contributions, He explains, “is that we didn’t invent anything new. We didn’t invent a 1D tokenizer, and we didn’t invent the CLIP model, either. But we did discover that new capabilities can arise when you put all these pieces together.”

“This work redefines the role of tokenizers,” comments Saining Xie, a computer scientist at New York University. “It shows that image tokenizers — tools usually used just to compress images — can actually do a lot more. The fact that a simple (but highly compressed) 1D tokenizer can handle tasks like inpainting or text-guided editing, without needing to train a full-blown generative model, is pretty surprising.”

Zhuang Liu of Princeton University agrees, saying that the work of the MIT group “shows that we can generate and manipulate the images in a way that is much easier than we previously thought. Basically, it demonstrates that image generation can be a byproduct of a very effective image compressor, potentially reducing the cost of generating images several-fold.”

There could be many applications outside the field of computer vision, Karaman suggests. “For instance, we could consider tokenizing the actions of robots or self-driving cars in the same way, which may rapidly broaden the impact of this work.”

Lao Beyer is thinking along similar lines, noting that the extreme amount of compression afforded by 1D tokenizers allows you to do “some amazing things,” which could be applied to other fields. For example, in the area of self-driving cars, which is one of his research interests, the tokens could represent, instead of images, the different routes that a vehicle might take.

Xie is also intrigued by the applications that may come from these innovative ideas. “There are some really cool use cases this could unlock,” he says. 

Simulation-based pipeline tailors training data for dexterous robots

When ChatGPT or Gemini gives what seems to be an expert response to your burning questions, you may not realize how much information it relies on to give that reply. Like other popular generative artificial intelligence (AI) models, these chatbots rely on backbone systems called foundation models that train on billions, or even trillions, of data points.

In a similar vein, engineers are hoping to build foundation models that train a range of robots on new skills like picking up, moving, and putting down objects in places like homes and factories. The problem is that it’s difficult to collect and transfer instructional data across robotic systems. You could teach your system by teleoperating the hardware step-by-step using technology like virtual reality (VR), but that can be time-consuming. Training on videos from the internet is less instructive, since the clips don’t provide a step-by-step, specialized task walk-through for particular robots.

A simulation-driven approach called “PhysicsGen” from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Robotics and AI Institute customizes robot training data to help robots find the most efficient movements for a task. The system can multiply a few dozen VR demonstrations into nearly 3,000 simulations per machine. These high-quality instructions are then mapped to the precise configurations of mechanical companions like robotic arms and hands. 

PhysicsGen creates data that generalize to specific robots and conditions via a three-step process. First, a VR headset tracks how humans manipulate objects like blocks using their hands. These interactions are mapped in a 3D physics simulator at the same time, visualizing the key points of our hands as small spheres that mirror our gestures. For example, if you flipped a toy over, you’d see 3D shapes representing different parts of your hands rotating a virtual version of that object.

The pipeline then remaps these points to a 3D model of the setup of a specific machine (like a robotic arm), moving them to the precise “joints” where a system twists and turns. Finally, PhysicsGen uses trajectory optimization — essentially simulating the most efficient motions to complete a task — so the robot knows the best ways to do things like repositioning a box.
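
As a rough illustration of that final step, the sketch below smooths a joint-space trajectory (already retargeted from hand keypoints to a specific robot) while keeping its start and goal fixed, by taking gradient steps on the sum of squared joint velocities. The actual PhysicsGen objective and solver are more sophisticated and are not described here; this is a generic trajectory-optimization toy.

import numpy as np

def smooth_trajectory(traj, iterations=500, step=0.1):
    """traj: (T, num_joints) joint angles over T timesteps; endpoints stay fixed."""
    traj = traj.copy()
    for _ in range(iterations):
        # Discrete Laplacian at interior timesteps; stepping along it performs
        # gradient descent on the sum of squared joint velocities.
        lap = traj[:-2] - 2 * traj[1:-1] + traj[2:]
        traj[1:-1] += step * lap
    return traj

# Example: a noisy two-joint reaching motion becomes a smoother, more efficient one.
t = np.linspace(0, 1, 50)[:, None]
noisy = np.hstack([t, t ** 2]) + 0.05 * np.random.randn(50, 2)
smooth = smooth_trajectory(noisy)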

Each simulation is a detailed training data point that walks a robot through potential ways to handle objects. When implemented into a policy (or the action plan that the robot follows), the machine has a variety of ways to approach a task, and can try out different motions if one doesn’t work.

“We’re creating robot-specific data without needing humans to re-record specialized demonstrations for each machine,” says Lujie Yang, an MIT PhD student in electrical engineering and computer science and CSAIL affiliate who is the lead author of a new paper introducing the project. “We’re scaling up the data in an autonomous and efficient way, making task instructions useful to a wider range of machines.”

Generating so many instructional trajectories for robots could eventually help engineers build a massive dataset to guide machines like robotic arms and dexterous hands. For example, the pipeline might help two robotic arms collaborate on picking up warehouse items and placing them in the right boxes for deliveries. The system may also guide two robots to work together in a household on tasks like putting away cups.

PhysicsGen’s potential also extends to converting data designed for older robots or different environments into useful instructions for new machines. “Despite being collected for a specific type of robot, we can revive these prior datasets to make them more generally useful,” adds Yang.

Addition by multiplication

PhysicsGen turned just 24 human demonstrations into thousands of simulated ones, helping both digital and real-world robots reorient objects.

Yang and her colleagues first tested their pipeline in a virtual experiment where a floating robotic hand needed to rotate a block into a target position. The digital robot executed the task with 81 percent accuracy by training on PhysicsGen’s massive dataset, a 60 percent improvement over a baseline that only learned from human demonstrations.

The researchers also found that PhysicsGen could improve how virtual robotic arms collaborate to manipulate objects. Their system created extra training data that helped two pairs of robots successfully accomplish tasks as much as 30 percent more often than a purely human-taught baseline.

In an experiment with a pair of real-world robotic arms, the researchers observed similar improvements as the machines teamed up to flip a large box into its designated position. When the robots deviated from the intended trajectory or mishandled the object, they were able to recover mid-task by referencing alternative trajectories from their library of instructional data.

Senior author Russ Tedrake, who is the Toyota Professor of Electrical Engineering and Computer Science, Aeronautics and Astronautics, and Mechanical Engineering at MIT, adds that this imitation-guided data generation technique combines the strengths of human demonstration with the power of robot motion planning algorithms.

“Even a single demonstration from a human can make the motion planning problem much easier,” says Tedrake, who is also a senior vice president of large behavior models at the Toyota Research Institute and CSAIL principal investigator. “In the future, perhaps the foundation models will be able to provide this information, and this type of data generation technique will provide a type of post-training recipe for that model.”

The future of PhysicsGen

Soon, PhysicsGen may be extended to a new frontier: diversifying the tasks a machine can execute.

“We’d like to use PhysicsGen to teach a robot to pour water when it’s only been trained to put away dishes, for example,” says Yang. “Our pipeline doesn’t just generate dynamically feasible motions for familiar tasks; it also has the potential of creating a diverse library of physical interactions that we believe can serve as building blocks for accomplishing entirely new tasks a human hasn’t demonstrated.”

Creating lots of widely applicable training data may eventually help build a foundation model for robots, though MIT researchers caution that this is a somewhat distant goal. The CSAIL-led team is investigating how PhysicsGen can harness vast, unstructured resources — like internet videos — as seeds for simulation. The goal: transform everyday visual content into rich, robot-ready data that could teach machines to perform tasks no one explicitly showed them.

Yang and her colleagues also aim to make PhysicsGen even more useful for robots with diverse shapes and configurations in the future. To make that happen, they plan to leverage datasets with demonstrations of real robots, capturing how robotic joints move instead of human ones.

The researchers also plan to incorporate reinforcement learning, where an AI system learns by trial and error, to make PhysicsGen expand its dataset beyond human-provided examples. They may augment their pipeline with advanced perception techniques to help a robot perceive and interpret its environment visually, allowing the machine to analyze and adapt to the complexities of the physical world.

For now, PhysicsGen shows how AI can help us teach different robots to manipulate objects within the same category, particularly rigid ones. The pipeline may soon help robots find the best ways to handle soft items (like fruits) and deformable ones (like clay), but those interactions aren’t easy to simulate yet.

Yang and Tedrake wrote the paper with two CSAIL colleagues: co-lead author and MIT PhD student Hyung Ju “Terry” Suh SM ’22 and MIT PhD student Bernhard Paus Græsdal. Robotics and AI Institute researchers Tong Zhao ’22, MEng ’23, Tarik Kelestemur, Jiuguang Wang, and Tao Pang PhD ’23 are also authors. Their work was supported by the Robotics and AI Institute and Amazon.

The researchers recently presented their work at the Robotics: Science and Systems conference.

The unique, mathematical shortcuts language models use to predict dynamic scenarios

Let’s say you’re reading a story, or playing a game of chess. You may not have noticed, but each step of the way, your mind kept track of how the situation (or “state of the world”) was changing. You can imagine this as a sort of running list of events, which we use to update our prediction of what will happen next.

Language models like ChatGPT also track changes inside their own “mind” when finishing off a block of code or anticipating what you’ll write next. They typically make educated guesses using transformers — internal architectures that help the models understand sequential data — but the systems are sometimes incorrect because of flawed thinking patterns. Identifying and tweaking these underlying mechanisms helps language models become more reliable prognosticators, especially with more dynamic tasks like forecasting weather and financial markets.

But do these AI systems process developing situations like we do? A new paper from researchers in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Department of Electrical Engineering and Computer Science shows that the models instead use clever mathematical shortcuts between each progressive step in a sequence, eventually making reasonable predictions. The team made this observation by going under the hood of language models, evaluating how closely they could keep track of objects that change position rapidly. Their findings show that engineers can control when language models use particular workarounds as a way to improve the systems’ predictive capabilities.

Shell games

The researchers analyzed the inner workings of these models using a clever experiment reminiscent of the classic shell game. Ever had to guess the final location of an object after it’s placed under a cup and shuffled with identical containers? The team used a similar test, where the model guessed the final arrangement of particular digits (also called a permutation). The models were given a starting sequence, such as “42135,” and instructions about when and where to move each digit, like moving the “4” to the third position and onward, without knowing the final result.

In these experiments, transformer-based models gradually learned to predict the correct final arrangements. Instead of shuffling the digits based on the instructions they were given, though, the systems aggregated information between successive states (or individual steps within the sequence) and calculated the final permutation.

One go-to pattern the team observed, called the “Associative Algorithm,” essentially organizes nearby steps into groups and then calculates a final guess. You can think of this process as being structured like a tree, where the initial numerical arrangement is the “root.” As you move up the tree, adjacent steps are grouped into different branches and multiplied together. At the top of the tree is the final combination of numbers, computed by multiplying each resulting sequence on the branches together.
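
The grouping idea can be illustrated with ordinary permutation arithmetic. In the toy Python sketch below, adjacent moves are composed in pairs, then pairs of pairs, until a single combined permutation remains, which is applied to the starting sequence in one step. This mirrors the hierarchy described above but is not a model of the transformer’s internal computation.

def compose(first, second):
    """One move equivalent to applying `first`, then `second`.
    A move m places the digit from position m[i] of the old state at position i."""
    return [first[j] for j in second]

def associative_scan(moves):
    """Combine a list of moves by pairwise (tree-structured) composition."""
    while len(moves) > 1:
        paired = [compose(moves[i], moves[i + 1]) for i in range(0, len(moves) - 1, 2)]
        if len(moves) % 2:
            paired.append(moves[-1])   # odd leftover carries up to the next level
        moves = paired
    return moves[0]

start = "42135"
moves = [[1, 0, 2, 3, 4],   # swap the first two digits
         [0, 2, 1, 3, 4],   # swap positions 1 and 2
         [4, 1, 2, 3, 0]]   # swap the first and last digits
net = associative_scan(moves)
print("".join(start[i] for i in net))   # "51432", the same as applying the moves one by one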

The other way language models guessed the final permutation was through a crafty mechanism called the “Parity-Associative Algorithm,” which essentially whittles down options before grouping them. It determines whether the final arrangement is the result of an even or odd number of rearrangements of individual digits. Then, the mechanism groups adjacent sequences from different steps before multiplying them, just like the Associative Algorithm.

“These behaviors tell us that transformers perform simulation by associative scan. Instead of following state changes step-by-step, the models organize them into hierarchies,” says MIT PhD student and CSAIL affiliate Belinda Li SM ’23, a lead author on the paper. “How do we encourage transformers to learn better state tracking? Instead of imposing that these systems form inferences about data in a human-like, sequential way, perhaps we should cater to the approaches they naturally use when tracking state changes.”

“One avenue of research has been to expand test-time computing along the depth dimension, rather than the token dimension — by increasing the number of transformer layers rather than the number of chain-of-thought tokens during test-time reasoning,” adds Li. “Our work suggests that this approach would allow transformers to build deeper reasoning trees.”

Through the looking glass

Li and her co-authors observed how the Associative and Parity-Associative algorithms worked using tools that allowed them to peer inside the “mind” of language models. 

They first used a method called “probing,” which shows what information flows through an AI system. Imagine you could look into a model’s brain to see its thoughts at a specific moment — in a similar way, the technique maps out the system’s mid-experiment predictions about the final arrangement of digits.

A tool called “activation patching” was then used to show where the language model processes changes to a situation. It involves meddling with some of the system’s “ideas,” injecting incorrect information into certain parts of the network while keeping other parts constant, and seeing how the system will adjust its predictions.
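
Both tools can be sketched schematically in a few lines of PyTorch, shown below on a toy network rather than the language models used in the study: a linear probe is trained on intermediate activations to read out what the system currently “believes,” and a forward hook overwrites part of an activation to see how the prediction shifts.

import torch
import torch.nn as nn

hidden = nn.Linear(16, 32)                 # stand-in for an internal transformer layer
head = nn.Linear(32, 5)                    # stand-in for the model's output head
model = nn.Sequential(hidden, nn.ReLU(), head)

x = torch.randn(64, 16)                    # toy inputs

# 1) Probing: train a small linear classifier on intermediate activations
#    to read out the model's mid-computation guess about the state.
with torch.no_grad():
    acts = torch.relu(hidden(x))
labels = torch.randint(0, 5, (64,))        # placeholder state labels
probe = nn.Linear(32, 5)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(100):
    loss = nn.functional.cross_entropy(probe(acts), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) Activation patching: overwrite part of an intermediate activation during
#    the forward pass and observe how the final prediction changes.
def patch(module, inputs, output):
    output = output.clone()
    output[:, :8] = 0.0                    # inject "incorrect information" into a slice
    return output

handle = hidden.register_forward_hook(patch)
patched_logits = model(x)
handle.remove()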

These tools revealed when the algorithms would make errors and when the systems “figured out” how to correctly guess the final permutations. They observed that the Associative Algorithm learned faster than the Parity-Associative Algorithm, while also performing better on longer sequences. Li attributes the latter’s difficulties with more elaborate instructions to an over-reliance on heuristics (or rules that allow us to compute a reasonable solution fast) to predict permutations.

“We’ve found that when language models use a heuristic early on in training, they’ll start to build these tricks into their mechanisms,” says Li. “However, those models tend to generalize worse than ones that don’t rely on heuristics. We found that certain pre-training objectives can deter or encourage these patterns, so in the future, we may look to design techniques that discourage models from picking up bad habits.”

The researchers note that their experiments were done on small-scale language models fine-tuned on synthetic data, but found the model size had little effect on the results. This suggests that fine-tuning larger language models, like GPT-4.1, would likely yield similar results. The team plans to examine their hypotheses more closely by testing language models of different sizes that haven’t been fine-tuned, evaluating their performance on dynamic real-world tasks such as tracking code and following how stories evolve.

Harvard University postdoc Keyon Vafa, who was not involved in the paper, says that the researchers’ findings could create opportunities to advance language models. “Many uses of large language models rely on tracking state: anything from providing recipes to writing code to keeping track of details in a conversation,” he says. “This paper makes significant progress in understanding how language models perform these tasks. This progress provides us with interesting insights into what language models are doing and offers promising new strategies for improving them.”

Li wrote the paper with MIT undergraduate student Zifan “Carl” Guo and senior author Jacob Andreas, who is an MIT associate professor of electrical engineering and computer science and CSAIL principal investigator. Their research was supported, in part, by Open Philanthropy, the MIT Quest for Intelligence, the National Science Foundation, the Clare Boothe Luce Program for Women in STEM, and a Sloan Research Fellowship.

The researchers presented their research at the International Conference on Machine Learning (ICML) this week.

Five MIT faculty elected to the National Academy of Sciences for 2025

The National Academy of Sciences (NAS) has elected 120 members and 30 international members, including five MIT faculty members and 13 MIT alumni. Professors Rodney Brooks, Parag Pathak, Scott Sheffield, Benjamin Weiss, and Yukiko Yamashita were elected in recognition of their “distinguished and continuing achievements in original research.” Membership to the National Academy of Sciences is one of the highest honors a scientist can receive in their career.

Elected MIT alumni include: David Altshuler ’86, Rafael Camerini-Otero ’66, Kathleen Collins PhD ’92, George Daley PhD ’89, Scott Doney PhD ’91, John Doyle PhD ’91, Jonathan Ellman ’84, Shanhui Fan PhD ’97, Julia Greer ’97, Greg Lemke ’78, Stanley Perlman PhD ’72, David Reichman PhD ’97, and Risa Wechsler ’96. 

Those elected this year bring the total number of active members to 2,662, with 556 international members. The NAS is a private, nonprofit institution that was established under a congressional charter signed by President Abraham Lincoln in 1863. It recognizes achievement in science by election to membership, and — with the National Academy of Engineering and the National Academy of Medicine — provides science, engineering, and health policy advice to the federal government and other organizations.

Rodney Brooks

Rodney A. Brooks is the Panasonic Professor of Robotics Emeritus at MIT and the chief technical officer and co-founder of Robust AI. Previously, he was founder, chair, and CTO of Rethink Robotics and founder and CTO of iRobot Corp. He is also the former director of the MIT Artificial Intelligence Laboratory and the MIT Computer Science and Artificial Intelligence Laboratory. Brooks received degrees in pure mathematics from the Flinders University of South Australia and a PhD in computer science from Stanford University in 1981. He held research positions at Carnegie Mellon University and MIT, and a faculty position at Stanford before joining the faculty of MIT in 1984.

Brooks’ research is concerned with both the engineering of intelligent robots to operate in unstructured environments, and with understanding human intelligence through building humanoid robots. He has published papers and books in model-based computer vision, path planning, uncertainty analysis, robot assembly, active vision, autonomous robots, micro-robots, micro-actuators, planetary exploration, representation, artificial life, humanoid robots, and compiler design.

Brooks is a member of the National Academy of Engineering, a founding fellow of the Association for the Advancement of Artificial Intelligence, a fellow of the American Academy of Arts and Sciences, the American Association for the Advancement of Science, the Association for Computing Machinery, a foreign fellow of The Australian Academy of Technological Sciences and Engineering, and a corresponding member of the Australian Academy of Science. He won the Computers and Thought Award at the 1991 International Joint Conference on Artificial Intelligence, and the IEEE Founders Medal in 2023.

Parag Pathak

Parag Pathak is the Class of 1922 Professor of Economics and a founder and director of MIT’s Blueprint Labs. He joined the MIT faculty in 2008 after completing his PhD in business economics and his master’s and bachelor’s degrees in applied mathematics, all at Harvard University.

Pathak is best known for his work on market design and education. His research has informed student placement and school choice mechanisms across the United States, including in Boston, New York City, Chicago, and Washington, and his recent work applies ideas from market design to the rationing of vital medical resources. Pathak has also authored leading studies on school quality, charter schools, and affirmative action. In urban economics, he has measured the effects of foreclosures on house prices and how the housing market reacted to the end of rent control in Cambridge, Massachusetts.

Pathak’s research on market design was recognized with the 2018 John Bates Clark Medal, given by the American Economic Association to the economist under 40 whose work is judged to have made the most significant contribution to the field. He is a fellow of the American Academy of Arts and Sciences, the Econometric Society, and the Society for the Advancement of Economic Theory. Pathak is also the founding co-director of the market design working group at the National Bureau of Economic Research, and a co-founder of Avela Education.

Scott Sheffield

Scott Sheffield, Leighton Family Professor of Mathematics, joined the MIT faculty in 2008 after a faculty appointment at the Courant Institute at New York University. He received a PhD in mathematics from Stanford University in 2003 under the supervision of Amir Dembo, and completed BA and MA degrees in mathematics from Harvard University in 1998.

Sheffield is a probability theorist, working on geometrical questions that arise in such areas as statistical physics, game theory, and metric spaces, as well as long-standing problems in percolation theory and the theory of random surfaces.

In 2017, Sheffield received the Clay Research Award with Jason Miller, “in recognition of their groundbreaking and conceptually novel work on the geometry of Gaussian free field and its application to the solution of open problems in the theory of two-dimensional random structures.” In 2023, he received the Leonard Eisenbud Prize with Jason Miller “for works on random two-dimensional geometries, and in particular on Liouville Quantum Gravity.” Later in 2023, Sheffield received the Frontiers of Science Award with Jason Miller for the paper “Liouville quantum gravity and the Brownian map I: the QLE(8/3,0) metric.” Sheffield is a fellow of the American Academy of Arts and Sciences.

Benjamin Weiss

Benjamin Weiss is the Robert R. Schrock Professor of Earth and Planetary Sciences. He studied physics at Amherst College as an undergraduate and went on to study planetary science and geology at Caltech, where he earned a master’s degree in 2001 and PhD in 2003. Weiss’ doctoral dissertation on Martian meteorite ALH 84001 revealed records of the ancient Martian climate and magnetic field, and provided evidence some meteorites could transfer materials from Mars to Earth without heat-sterilization. Weiss became a member of the Department of Earth, Atmospheric and Planetary Sciences faculty in 2004 and is currently chair of the Program in Planetary Science.

A specialist in magnetometry, Weiss seeks to understand the formation and evolution of the Earth, terrestrial planets, and small solar system bodies through laboratory analysis, spacecraft observations, and fieldwork. He is known for key insights into the history of our solar system, including discoveries about the early nebular magnetic field, the moon’s long-lived core dynamo, and asteroids that generated core dynamos in the past. In addition to leadership roles on current, active NASA missions — as deputy principal investigator for Psyche, and co-investigator for Mars Perseverance and Europa Clipper — Weiss has also been part of science teams for the SpaceIL Beresheet, JAXA Hayabusa 2, and ESA Rosetta spacecraft.

As principal investigator of the MIT Planetary Magnetism Laboratory, Weiss works to develop high-sensitivity, high-resolution techniques in magnetic microscopy to image the magnetic fields embedded in rock samples collected from meteorites, the lunar surface, and sites around the Earth. Studying these magnetic signatures can help answer questions about the conditions of the early solar system, past climates on Earth and Mars, and factors that promote habitability.

Yukiko Yamashita

Yukiko Yamashita is a professor of biology at MIT, a core member of the Whitehead Institute for Biomedical Research, and an investigator at the Howard Hughes Medical Institute (HHMI). Yamashita earned her BS in biology in 1994 and her PhD in biophysics in 1999 from Kyoto University. From 2001 to 2006, she did postdoctoral research at Stanford University. She was appointed to the University of Michigan faculty in 2007 and was named an HHMI Investigator in 2014. She became a member of the Whitehead Institute and a professor of biology at MIT in 2020.

Yukiko Yamashita studies two fundamental aspects of multicellular organisms: how cell fates are diversified via asymmetric cell division, and how genetic information is transmitted through generations via the germline.

Two remarkable feats of multicellular organisms are the generation of many distinct cell types via asymmetric cell division and the transmission of the germline genome to the next generation, essentially in perpetuity. Studying these processes using the Drosophila male germline as a model system has led Yamashita’s lab to venture into new areas of study, such as the functions of satellite DNA, often dismissed as “genomic junk,” and how it might be involved in speciation.

Yamashita is a member of the American Academy of Arts and Sciences, a fellow of the American Society for Cell Biology, and the winner of the Tsuneko and Reiji Okazaki Award in 2016. She was named a MacArthur Fellow in 2011.

How to more efficiently study complex treatment interactions

MIT researchers have developed a new theoretical framework for studying the mechanisms of treatment interactions. Their approach allows scientists to efficiently estimate how combinations of treatments will affect a group of units, such as cells, enabling a researcher to perform fewer costly experiments while gathering more accurate data.

As an example, to study how interconnected genes affect cancer cell growth, a biologist might need to use a combination of treatments to target multiple genes at once. But because there could be billions of potential combinations for each round of the experiment, choosing a subset of combinations to test might bias the data their experiment generates. 

In contrast, the new framework considers the scenario where the user can efficiently design an unbiased experiment by assigning all treatments in parallel, and can control the outcome by adjusting the rate of each treatment.

The MIT researchers theoretically derived a near-optimal strategy within this framework and performed a series of simulations to test it in a multiround experiment. Their method minimized the error rate in each instance.

This technique could someday help scientists better understand disease mechanisms and develop new medicines to treat cancer or genetic disorders.

“We’ve introduced a concept people can think more about as they study the optimal way to select combinatorial treatments at each round of an experiment. Our hope is this can someday be used to solve biologically relevant questions,” says graduate student Jiaqi Zhang, an Eric and Wendy Schmidt Center Fellow and co-lead author of a paper on this experimental design framework.

She is joined on the paper by co-lead author Divya Shyamal, an MIT undergraduate; and senior author Caroline Uhler, the Andrew and Erna Viterbi Professor of Engineering in EECS and the MIT Institute for Data, Systems, and Society (IDSS), who is also director of the Eric and Wendy Schmidt Center and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS). The research was recently presented at the International Conference on Machine Learning.

Simultaneous treatments

Treatments can interact with each other in complex ways. For instance, a scientist trying to determine whether a certain gene contributes to a particular disease symptom may have to target several genes simultaneously to study the effects.

To do this, scientists use what are known as combinatorial perturbations, where they apply multiple treatments at once to the same group of cells.

“Combinatorial perturbations will give you a high-level network of how different genes interact, which provides an understanding of how a cell functions,” Zhang explains.

Since genetic experiments are costly and time-consuming, the scientist aims to select the best subset of treatment combinations to test, which is a steep challenge due to the huge number of possibilities.

Picking a suboptimal subset can generate biased results by focusing only on combinations the user selected in advance.

The MIT researchers approached this problem differently by looking at a probabilistic framework. Instead of focusing on a selected subset, each unit randomly takes up combinations of treatments based on user-specified dosage levels for each treatment.

The user sets dosage levels based on the goal of their experiment — perhaps this scientist wants to study the effects of four different drugs on cell growth. The probabilistic approach generates less biased data because it does not restrict the experiment to a predetermined subset of treatments.

The dosage levels are like probabilities, and each cell receives a random combination of treatments. If the user sets a high dosage, it is more likely most of the cells will take up that treatment. A smaller subset of cells will take up that treatment if the dosage is low.
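
A toy sketch of this probabilistic assignment might look like the Python snippet below, where the dosages, the linear outcome model, and the noise are all invented for illustration rather than taken from the paper: each cell independently takes up each treatment with probability equal to its dosage, and the resulting data are used to estimate treatment and pairwise-interaction effects.

import numpy as np

rng = np.random.default_rng(0)
num_cells, num_treatments = 5000, 4
dosages = np.array([0.5, 0.3, 0.7, 0.2])          # user-chosen probabilities per treatment

# Each cell receives a random combination of treatments.
assignments = (rng.random((num_cells, num_treatments)) < dosages).astype(float)

# Simulated outcome: main effects, one pairwise interaction, plus noise (all made up).
true_main = np.array([1.0, -0.5, 0.3, 0.0])
outcome = assignments @ true_main \
    + 0.8 * assignments[:, 0] * assignments[:, 1] \
    + rng.normal(0, 0.1, num_cells)

# Estimate main and pairwise-interaction effects with least squares.
pairs = [(i, j) for i in range(num_treatments) for j in range(i + 1, num_treatments)]
features = np.column_stack(
    [assignments] + [assignments[:, i] * assignments[:, j] for i, j in pairs]
)
coef, *_ = np.linalg.lstsq(features, outcome, rcond=None)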

“From there, the question is how do we design the dosages so that we can estimate the outcomes as accurately as possible? This is where our theory comes in,” Shyamal adds.

Their theoretical framework shows the best way to design these dosages so one can learn the most about the characteristic or trait they are studying.

After each round of the experiment, the user collects the results and feeds those back into the experimental framework. It will output the ideal dosage strategy for the next round, and so on, actively adapting the strategy over multiple rounds.

Optimizing dosages, minimizing error

The researchers proved their theoretical approach generates optimal dosages, even when the dosage levels are affected by a limited supply of treatments or when noise in the experimental outcomes varies at each round.

In simulations, this new approach had the lowest error rate when comparing estimated and actual outcomes of multiround experiments, outperforming two baseline methods.

In the future, the researchers want to enhance their experimental framework to consider interference between units and the fact that certain treatments can lead to selection bias. They would also like to apply this technique in a real experimental setting.

“This is a new approach to a very interesting problem that is hard to solve. Now, with this new framework in hand, we can think more about the best way to design experiments for many different applications,” Zhang says.

This research is funded, in part, by the Advanced Undergraduate Research Opportunities Program at MIT, Apple, the National Institutes of Health, the Office of Naval Research, the Department of Energy, the Eric and Wendy Schmidt Center at the Broad Institute, and a Simons Investigator Award.

Can AI really code? Study maps the roadblocks to autonomous software engineering

Imagine a future where artificial intelligence quietly shoulders the drudgery of software development: refactoring tangled code, migrating legacy systems, and hunting down race conditions, so that human engineers can devote themselves to architecture, design, and the genuinely novel problems still beyond a machine’s reach. Recent advances appear to have nudged that future tantalizingly close, but a new paper by researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and several collaborating institutions argues that this potential future reality demands a hard look at present-day challenges. 

Titled “Challenges and Paths Towards AI for Software Engineering,” the work maps the many software-engineering tasks beyond code generation, identifies current bottlenecks, and highlights research directions to overcome them, aiming to let humans focus on high-level design while routine work is automated. 

“Everyone is talking about how we don’t need programmers anymore, and there’s all this automation now available,” says Armando Solar‑Lezama, MIT professor of electrical engineering and computer science, CSAIL principal investigator, and senior author of the study. “On the one hand, the field has made tremendous progress. We have tools that are way more powerful than any we’ve seen before. But there’s also a long way to go toward really getting the full promise of automation that we would expect.”

Solar-Lezama argues that popular narratives often shrink software engineering to “the undergrad programming part: someone hands you a spec for a little function and you implement it, or solving LeetCode-style programming interviews.” Real practice is far broader. It includes everyday refactors that polish design, plus sweeping migrations that move millions of lines from COBOL to Java and reshape entire businesses. It requires nonstop testing and analysis — fuzzing, property-based testing, and other methods — to catch concurrency bugs, or patch zero-day flaws. And it involves the maintenance grind: documenting decade-old code, summarizing change histories for new teammates, and reviewing pull requests for style, performance, and security.

Industry-scale code optimization — think re-tuning GPU kernels or the relentless, multi-layered refinements behind Chrome’s V8 engine — remains stubbornly hard to evaluate. Today’s headline metrics were designed for short, self-contained problems, and while multiple-choice tests still dominate natural-language research, they were never the norm in AI-for-code. The field’s de facto yardstick, SWE-Bench, simply asks a model to patch a GitHub issue: useful, but still akin to the “undergrad programming exercise” paradigm. It touches only a few hundred lines of code, risks data leakage from public repositories, and ignores other real-world contexts — AI-assisted refactors, human–AI pair programming, or performance-critical rewrites that span millions of lines. Until benchmarks expand to capture those higher-stakes scenarios, measuring progress — and thus accelerating it — will remain an open challenge.

If measurement is one obstacle, human‑machine communication is another. First author Alex Gu, an MIT graduate student in electrical engineering and computer science, sees today’s interaction as “a thin line of communication.” When he asks a system to generate code, he often receives a large, unstructured file and even a set of unit tests, yet those tests tend to be superficial. This gap extends to the AI’s ability to effectively use the wider suite of software engineering tools, from debuggers to static analyzers, that humans rely on for precise control and deeper understanding. “I don’t really have much control over what the model writes,” he says. “Without a channel for the AI to expose its own confidence — ‘this part’s correct … this part, maybe double‑check’ — developers risk blindly trusting hallucinated logic that compiles, but collapses in production. Another critical aspect is having the AI know when to defer to the user for clarification.”

Scale compounds these difficulties. Current AI models struggle profoundly with large code bases, often spanning millions of lines. Foundation models learn from public GitHub, but “every company’s code base is kind of different and unique,” Gu says, making proprietary coding conventions and specification requirements fundamentally out of distribution. The result is AI-generated code that “hallucinates”: it looks plausible yet calls non-existent functions, violates internal style rules, or fails continuous-integration pipelines, because it doesn’t align with the specific internal conventions, helper functions, or architectural patterns of a given company.

Models also often retrieve the wrong code, because standard retrieval surfaces code with a similar name (syntax) rather than similar functionality and logic, which is what a model actually needs in order to write the function. “Standard retrieval techniques are very easily fooled by pieces of code that are doing the same thing but look different,” says Solar‑Lezama.
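
A toy example makes the failure mode concrete: a plain lexical similarity score (standing in for name- or syntax-based retrieval, not any particular production system) ranks a look-alike snippet with different behavior above a differently named snippet that does the same thing.

from difflib import SequenceMatcher

query = "def total_price(items): return sum(i.price for i in items)"

look_alike = "def total_price(items): return max(i.price for i in items)"    # different behavior
true_match = "def order_cost(entries): return sum(e.price for e in entries)"  # same behavior

def lexical_similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

print(lexical_similarity(query, look_alike))   # higher score despite the wrong logic
print(lexical_similarity(query, true_match))   # lower score despite the right logic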

The authors note that since there is no silver bullet for these issues, they are calling instead for community-scale efforts: richer data that capture the process of developers writing code (for example, which code developers keep versus throw away, and how code gets refactored over time); shared evaluation suites that measure progress on refactor quality, bug-fix longevity, and migration correctness; and transparent tooling that lets models expose uncertainty and invite human steering rather than passive acceptance. Gu frames the agenda as a “call to action” for larger open-source collaborations that no single lab could muster alone. Solar-Lezama imagines incremental advances — “research results taking bites out of each one of these challenges separately” — that feed back into commercial tools and gradually move AI from autocomplete sidekick toward genuine engineering partner.

“Why does any of this matter? Software already underpins finance, transportation, health care, and the minutiae of daily life, and the human effort required to build and maintain it safely is becoming a bottleneck. An AI that can shoulder the grunt work — and do so without introducing hidden failures — would free developers to focus on creativity, strategy, and ethics,” says Gu. “But that future depends on acknowledging that code completion is the easy part; the hard part is everything else. Our goal isn’t to replace programmers. It’s to amplify them. When AI can tackle the tedious and the terrifying, human engineers can finally spend their time on what only humans can do.”

“With so many new works emerging in AI for coding, and the community often chasing the latest trends, it can be hard to step back and reflect on which problems are most important to tackle,” says Baptiste Rozière, an AI scientist at Mistral AI, who wasn’t involved in the paper. “I enjoyed reading this paper because it offers a clear overview of the key tasks and challenges in AI for software engineering. It also outlines promising directions for future research in the field.”

Gu and Solar-Lezama wrote the paper with University of California at Berkeley Professor Koushik Sen and PhD students Naman Jain and Manish Shetty, Cornell University Assistant Professor Kevin Ellis and PhD student Wen-Ding Li, Stanford University Assistant Professor Diyi Yang and PhD student Yijia Shao, and incoming Johns Hopkins University assistant professor Ziyang Li. Their work was supported, in part, by the National Science Foundation (NSF), SKY Lab industrial sponsors and affiliates, Intel Corp. through an NSF grant, and the Office of Naval Research.

The researchers are presenting their work at the International Conference on Machine Learning (ICML). 

AI shapes autonomous underwater “gliders”

Marine scientists have long marveled at how animals like fish and seals swim so efficiently despite having different shapes. Their bodies are optimized for efficient, hydrodynamic aquatic navigation so they can exert minimal energy when traveling long distances.

Autonomous vehicles can drift through the ocean in a similar way, collecting data about vast underwater environments. However, the shapes of these gliding machines are less diverse than what we find in marine life — go-to designs often resemble tubes or torpedoes, since they’re fairly hydrodynamic as well. Plus, testing new builds requires lots of real-world trial-and-error.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the University of Wisconsin at Madison propose that AI could help us explore uncharted glider designs more conveniently. Their method uses machine learning to test different 3D designs in a physics simulator, then molds them into more hydrodynamic shapes. The resulting designs can be fabricated via a 3D printer, and the gliders they produce use significantly less energy than hand-made ones.

The MIT scientists say that this design pipeline could create new, more efficient machines that help oceanographers measure water temperature and salt levels, gather more detailed insights about currents, and monitor the impacts of climate change. The team demonstrated this potential by producing two gliders roughly the size of a boogie board: a two-winged machine resembling an airplane, and a unique, four-winged object resembling a flat fish with four fins.

Peter Yichen Chen, MIT CSAIL postdoc and co-lead researcher on the project, notes that these designs are just a few of the novel shapes his team’s approach can generate. “We’ve developed a semi-automated process that can help us test unconventional designs that would be very taxing for humans to design,” he says. “This level of shape diversity hasn’t been explored previously, so most of these designs haven’t been tested in the real world.”

But how did AI come up with these ideas in the first place? First, the researchers found 3D models of over 20 conventional sea exploration shapes, such as submarines, whales, manta rays, and sharks. Then, they enclosed these models in “deformation cages” that map out different articulation points that the researchers pulled around to create new shapes.

The CSAIL-led team built a dataset of conventional and deformed shapes before simulating how they would perform at different “angles-of-attack” — the direction a vessel will tilt as it glides through the water. For example, a swimmer may want to dive at a -30 degree angle to retrieve an item from a pool.

These diverse shapes and angles of attack were then used as inputs for a neural network that essentially anticipates how efficiently a glider shape will perform at particular angles and optimizes it as needed.

Giving gliding robots a lift

The team’s neural network simulates how a particular glider would react to underwater physics, aiming to capture how it moves forward and the force that drags against it. The goal: find the best lift-to-drag ratio, representing how much the glider is being held up compared to how much it’s being held back. The higher the ratio, the more efficiently the vehicle travels; the lower it is, the more the glider will slow down during its voyage.
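
As a small numerical illustration of the quantity being optimized, the snippet below computes lift-to-drag ratios at several candidate angles of attack and picks the best one. The force values are invented for illustration; in the actual pipeline they come from the learned surrogate of the underwater physics rather than a lookup table.

import numpy as np

angles_deg = np.array([-30, -15, 0, 9, 15, 30])
lift = np.array([0.4, 0.9, 1.1, 1.6, 1.5, 1.2])    # hypothetical lift forces (N)
drag = np.array([0.5, 0.4, 0.35, 0.3, 0.45, 0.7])  # hypothetical drag forces (N)

ratios = lift / drag
best = np.argmax(ratios)
print(f"Best angle of attack: {angles_deg[best]} deg, L/D = {ratios[best]:.2f}")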

Lift-to-drag ratios are key for flying planes: at takeoff, you want enough lift to get the plane airborne and keep it gliding against wind currents, and when landing, you need enough drag to bring it to a full stop.

Niklas Hagemann, an MIT graduate student in architecture and CSAIL affiliate, notes that this ratio is just as useful if you want a similar gliding motion in the ocean.

“Our pipeline modifies glider shapes to find the best lift-to-drag ratio, optimizing its performance underwater,” says Hagemann, who is also a co-lead author on a paper that was presented at the International Conference on Robotics and Automation in June. “You can then export the top-performing designs so they can be 3D-printed.”

Going for a quick glide

While the AI pipeline’s simulations looked promising, the researchers needed to verify that its predictions about glider performance were accurate by experimenting in more lifelike environments.

They first fabricated their two-wing design as a scaled-down vehicle resembling a paper airplane. This glider was taken to MIT’s Wright Brothers Wind Tunnel, an indoor space with fans that simulate wind flow. Placed at different angles, the glider’s predicted lift-to-drag ratio was only about 5 percent higher on average than the ones recorded in the wind experiments — a small difference between simulation and reality.

A digital evaluation involving a visual, more complex physics simulator also supported the notion that the AI pipeline made fairly accurate predictions about how the gliders would move. It visualized how these machines would descend in 3D.

To truly evaluate these gliders in the real world, though, the team needed to see how their devices would fare underwater. They printed the two designs that performed best at specific angles-of-attack for this test: a jet-like device at 9 degrees and the four-wing vehicle at 30 degrees.

Both shapes were fabricated in a 3D printer as hollow shells with small holes that flood when fully submerged. This lightweight design makes the vehicle easier to handle outside of the water and requires less material to be fabricated. The researchers placed a tube-like device inside these shell coverings, which housed a range of hardware, including a pump to change the glider’s buoyancy, a mass shifter (a device that controls the machine’s angle-of-attack), and electronic components.

Each design outperformed a handmade torpedo-shaped glider by moving more efficiently across a pool. With higher lift-to-drag ratios than their counterpart, both AI-driven machines exerted less energy, similar to the effortless ways marine animals navigate the oceans.

As much as the project is an encouraging step forward for glider design, the researchers are looking to narrow the gap between simulation and real-world performance. They are also hoping to develop machines that can react to sudden changes in currents, making the gliders more adaptable to seas and oceans.

Chen adds that the team is looking to explore new types of shapes, particularly thinner glider designs. They intend to make their framework faster, perhaps bolstering it with new features that enable more customization, maneuverability, or even the creation of miniature vehicles.

Chen and Hagemann co-led research on this project with OpenAI researcher Pingchuan Ma SM ’23, PhD ’25. They authored the paper with Wei Wang, a University of Wisconsin at Madison assistant professor and recent CSAIL postdoc; John Romanishin ’12, SM ’18, PhD ’23; and two MIT professors and CSAIL members: lab director Daniela Rus and senior author Wojciech Matusik. Their work was supported, in part, by a Defense Advanced Research Projects Agency (DARPA) grant and the MIT-GIST Program.

Confronting the AI/energy conundrum

The explosive growth of AI-powered computing centers is creating an unprecedented surge in electricity demand that threatens to overwhelm power grids and derail climate goals. At the same time, artificial intelligence technologies could revolutionize energy systems, accelerating the transition to clean power.

“We’re at a cusp of potentially gigantic change throughout the economy,” said William H. Green, director of the MIT Energy Initiative (MITEI) and Hoyt C. Hottel Professor in the MIT Department of Chemical Engineering, at MITEI’s Spring Symposium, “AI and energy: Peril and promise,” held on May 13. The event brought together experts from industry, academia, and government to explore solutions to what Green described as both “local problems with electric supply and meeting our clean energy targets” while seeking to “reap the benefits of AI without some of the harms.” The challenge of data center energy demand and potential benefits of AI to the energy transition is a research priority for MITEI.

AI’s startling energy demands

From the start, the symposium highlighted sobering statistics about AI’s appetite for electricity. After decades of flat electricity demand in the United States, computing centers now consume approximately 4 percent of the nation’s electricity. Although there is great uncertainty, some projections suggest this demand could rise to 12-15 percent by 2030, largely driven by artificial intelligence applications.

Vijay Gadepally, senior scientist at MIT’s Lincoln Laboratory, emphasized the scale of AI’s consumption. “The power required for sustaining some of these large models is doubling almost every three months,” he noted. “A single ChatGPT conversation uses as much electricity as charging your phone, and generating an image consumes about a bottle of water for cooling.”

Facilities requiring 50 to 100 megawatts of power are emerging rapidly across the United States and globally, driven by both casual and institutional needs that rely on large language models such as ChatGPT and Gemini. Gadepally cited congressional testimony by Sam Altman, CEO of OpenAI, highlighting how fundamental this relationship has become: “The cost of intelligence, the cost of AI, will converge to the cost of energy.”

“The energy demands of AI are a significant challenge, but we also have an opportunity to harness these vast computational capabilities to contribute to climate change solutions,” said Evelyn Wang, MIT vice president for energy and climate and the former director at the Advanced Research Projects Agency-Energy (ARPA-E) at the U.S. Department of Energy.

Wang also noted that innovations developed for AI and data centers — such as efficiency, cooling technologies, and clean-power solutions — could have broad applications beyond computing facilities themselves.


Strategies for clean energy solutions

The symposium explored multiple pathways to address the AI-energy challenge. Some panelists presented models suggesting that while artificial intelligence may increase emissions in the short term, its optimization capabilities could enable substantial emissions reductions after 2030 through more efficient power systems and accelerated clean technology development.

Research shows regional variations in the cost of powering computing centers with clean electricity, according to Emre Gençer, co-founder and CEO of Sesame Sustainability and former MITEI principal research scientist. Gençer’s analysis revealed that the central United States offers considerably lower costs due to complementary solar and wind resources. However, achieving zero-emission power would require massive battery deployments — five to 10 times more than moderate carbon scenarios — driving costs two to three times higher.

“If we want to do zero emissions with reliable power, we need technologies other than renewables and batteries, which will be too expensive,” Gençer said. He pointed to “long-duration storage technologies, small modular reactors, geothermal, or hybrid approaches” as necessary complements.

Because of data center energy demand, there is renewed interest in nuclear power, noted Kathryn Biegel, manager of R&D and corporate strategy at Constellation Energy, adding that her company is restarting the reactor at the former Three Mile Island site, now called the “Crane Clean Energy Center,” to meet this demand. “The data center space has become a major, major priority for Constellation,” she said, emphasizing how data centers’ needs for both reliability and carbon-free electricity are reshaping the power industry.

Can AI accelerate the energy transition?

Artificial intelligence could dramatically improve power systems, according to Priya Donti, assistant professor and the Silverman Family Career Development Professor in MIT’s Department of Electrical Engineering and Computer Science and the Laboratory for Information and Decision Systems. She showcased how AI can accelerate power grid optimization by embedding physics-based constraints into neural networks, potentially solving complex power flow problems at “10 times, or even greater, speed compared to your traditional models.”
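As a purely illustrative sketch of that general idea (and not Donti’s actual method), the toy example below trains a small network whose loss penalizes violations of a simplified DC power-flow balance; it assumes PyTorch, and the architecture, susceptance-like matrix, and demand data are all made up.

```python
# Illustrative only: a tiny physics-informed training loop (assumes PyTorch).
# The "grid" here is a made-up 4-bus DC power-flow model, not a real network.
import torch
import torch.nn as nn

n_buses = 4
B = torch.randn(n_buses, n_buses)
B = (B + B.T) / 2                              # symmetric, susceptance-like matrix

model = nn.Sequential(nn.Linear(n_buses, 64), nn.ReLU(), nn.Linear(64, n_buses))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loads = torch.rand(256, n_buses)               # synthetic demand scenarios
for step in range(200):
    theta = model(loads)                       # predicted voltage angles
    injections = theta @ B                     # DC approximation: P = B * theta
    loss = ((injections - loads) ** 2).mean()  # penalize power-balance violations
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```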

AI is already reducing carbon emissions, according to examples shared by Antonia Gawel, global director of sustainability and partnerships at Google. Google Maps’ fuel-efficient routing feature has “helped to prevent more than 2.9 million metric tons of GHG [greenhouse gas] emissions since launch, which is the equivalent of taking 650,000 fuel-based cars off the road for a year,” she said. Another Google research project uses artificial intelligence to help pilots avoid creating contrails, which represent about 1 percent of global warming impact.

AI’s potential to speed materials discovery for power applications was highlighted by Rafael Gómez-Bombarelli, the Paul M. Cook Career Development Associate Professor in the MIT Department of Materials Science and Engineering. “AI-supervised models can be trained to go from structure to property,” he noted, enabling the development of materials crucial for both computing and efficiency.

Securing growth with sustainability

Throughout the symposium, participants grappled with balancing rapid AI deployment against environmental impacts. While AI training receives most attention, Dustin Demetriou, senior technical staff member in sustainability and data center innovation at IBM, quoted a World Economic Forum article that suggested that “80 percent of the environmental footprint is estimated to be due to inferencing.” Demetriou emphasized the need for efficiency across all artificial intelligence applications.

Jevons’ paradox, where “efficiency gains tend to increase overall resource consumption rather than decrease it,” is another factor to consider, cautioned Emma Strubell, the Raj Reddy Assistant Professor in the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. Strubell advocated for viewing computing center electricity as a limited resource requiring thoughtful allocation across different applications.

Several presenters discussed novel approaches for integrating renewable sources with existing grid infrastructure, including potential hybrid solutions that combine clean installations with existing natural gas plants that have valuable grid connections already in place. These approaches could provide substantial clean capacity across the United States at reasonable costs while minimizing reliability impacts.

Navigating the AI-energy paradox

The symposium highlighted MIT’s central role in developing solutions to the AI-electricity challenge.

Green spoke of a new MITEI program on computing centers, power, and computation that will operate alongside the broader portfolio of MIT Climate Project research. “We’re going to try to tackle a very complicated problem all the way from the power sources through the actual algorithms that deliver value to the customers — in a way that’s going to be acceptable to all the stakeholders and really meet all the needs,” Green said.

Participants in the symposium were polled about priorities for MIT’s research by Randall Field, MITEI director of research. The real-time results ranked “data center and grid integration issues” as the top priority, followed by “AI for accelerated discovery of advanced materials for energy.”

In addition, polling revealed that most attendees view AI’s potential impact on power as a “promise” rather than a “peril,” although a considerable portion remain uncertain about the ultimate impact. When asked about priorities in power supply for computing facilities, half of the respondents selected carbon intensity as their top concern, with reliability and cost following.

Study could lead to LLMs that are better at complex reasoning

For all their impressive capabilities, large language models (LLMs) often fall short when given challenging new tasks that require complex reasoning skills.

While an accounting firm’s LLM might excel at summarizing financial reports, that same model could fail unexpectedly if tasked with predicting market trends or identifying fraudulent transactions.

To make LLMs more adaptable, MIT researchers investigated how a certain training technique can be strategically deployed to boost a model’s performance on unfamiliar, difficult problems.

They show that test-time training, a method that involves temporarily updating some of a model’s inner workings during deployment, can lead to a sixfold improvement in accuracy. The researchers developed a framework for implementing a test-time training strategy that uses examples of the new task to maximize these gains.

Their work could improve a model’s flexibility, enabling an off-the-shelf LLM to adapt to complex tasks that require planning or abstraction. This could lead to LLMs that would be more accurate in many applications that require logical deduction, from medical diagnostics to supply chain management.

“Genuine learning — what we did here with test-time training — is something these models can’t do on their own after they are shipped. They can’t gain new skills or get better at a task. But we have shown that if you push the model a little bit to do actual learning, you see that huge improvements in performance can happen,” says Ekin Akyürek PhD ’25, lead author of the study.

Akyürek is joined on the paper by graduate students Mehul Damani, Linlu Qiu, Han Guo, and Jyothish Pari; undergraduate Adam Zweiger; and senior authors Yoon Kim, an assistant professor of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Jacob Andreas, an associate professor in EECS and a member of CSAIL. The research will be presented at the International Conference on Machine Learning.

Tackling hard domains

LLM users often try to improve the performance of their model on a new task using a technique called in-context learning. They feed the model a few examples of the new task as text prompts, which guide the model’s outputs.
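In practice, in-context learning amounts to formatting a handful of demonstrations into the prompt itself; the model’s parameters never change. A minimal sketch, with entirely hypothetical examples:

```python
# In-context learning: the demonstrations go straight into the prompt text,
# and the model's weights are never touched. Examples are hypothetical.
demos = [("2 4 6", "8"), ("1 3 5", "7")]
query = "3 6 9"

prompt = "\n".join(f"Input: {x} -> Output: {y}" for x, y in demos)
prompt += f"\nInput: {query} -> Output:"
# `prompt` is then sent to the LLM as ordinary text; no parameters are updated.
```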

But in-context learning doesn’t always work for problems that require logic and reasoning.

The MIT researchers investigated how test-time training can be used in conjunction with in-context learning to boost performance on these challenging tasks. Test-time training involves updating some model parameters — the internal variables it uses to make predictions — using a small amount of new data specific to the task at hand.
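As a rough sketch of that workflow (not the authors’ released code), the snippet below takes a generic Hugging Face causal language model, runs a few gradient steps on hypothetical task demonstrations, and then restores the original weights so the update does not persist:

```python
# Rough sketch of test-time training (assumes the transformers library).
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                             # stand-in; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

demos = ["Input: 2 4 6 -> Output: 8",           # hypothetical task demonstrations
         "Input: 1 3 5 -> Output: 7"]

original_weights = copy.deepcopy(model.state_dict())   # keep a copy to revert to
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for _ in range(3):                              # a few quick gradient steps at test time
    for text in demos:
        batch = tok(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# ... answer the actual query with the temporarily adapted model here ...
model.load_state_dict(original_weights)         # the update does not persist
```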

The researchers explored how test-time training interacts with in-context learning. They studied design choices that maximize the performance improvements one can coax out of a general-purpose LLM.

“We find that test-time training is a much stronger form of learning. While simply providing examples can modestly boost accuracy, actually updating the model with those examples can lead to significantly better performance, particularly in challenging domains,” Damani says.

In-context learning requires a small set of task examples, including problems and their solutions. The researchers use these examples to create a task-specific dataset needed for test-time training.

To expand the size of this dataset, they create new inputs by slightly changing the problems and solutions in the examples, such as by horizontally flipping some input data. They find that training the model on the outputs of this new dataset leads to the best performance.
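A minimal sketch of that augmentation step, using hypothetical grid-style examples in which a horizontal flip is applied consistently to inputs and outputs:

```python
# Hypothetical augmentation: perturb the given demonstrations (here, a horizontal
# flip applied consistently to inputs and outputs) to enlarge the training set.
def hflip(grid):
    return [row[::-1] for row in grid]

example = {"input": [[0, 1, 2], [3, 4, 5]], "output": [[2, 1, 0], [5, 4, 3]]}
augmented = [
    example,
    {"input": hflip(example["input"]), "output": hflip(example["output"])},
]
# `augmented` then serves as the small dataset used for the test-time updates.
```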

In addition, the researchers only update a small number of model parameters using a technique called low-rank adaptation (LoRA), which improves the efficiency of the test-time training process.
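A sketch of what such a restriction might look like using the peft library; the base model, rank, and target modules below are illustrative choices for the sketch, not the paper’s configuration:

```python
# Illustrative use of low-rank adapters via the peft library; rank, target modules,
# and base model are arbitrary choices for the sketch, not the paper's settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], lora_dropout=0.0)
model = get_peft_model(base, config)    # only the small adapter matrices get gradients
model.print_trainable_parameters()      # typically well under 1% of all parameters
# Train the adapters on the augmented examples, then discard them after the prediction.
```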

“This is important because our method needs to be efficient if it is going to be deployed in the real world. We find that you can get huge improvements in accuracy with a very small amount of parameter training,” Akyürek says.

Developing new skills

Streamlining the process is key, since test-time training is employed on a per-instance basis, meaning a user would need to do this for each individual task. The updates to the model are only temporary, and the model reverts to its original form after making a prediction.

A model that usually takes less than a minute to answer a query might take five or 10 minutes to provide an answer with test-time training, Akyürek adds.

“We wouldn’t want to do this for all user queries, but it is useful if you have a very hard task that you want the model to solve well. There also might be tasks that are too challenging for an LLM to solve without this method,” he says.

The researchers tested their approach on two benchmark datasets of extremely complex problems, such as IQ puzzles. It boosted accuracy as much as sixfold over techniques that use only in-context learning.

Tasks that involved structured patterns, or that used completely unfamiliar types of data, showed the largest performance improvements.

“For simpler tasks, in-context learning might be OK. But updating the parameters themselves might develop a new skill in the model,” Damani says.

In the future, the researchers want to use these insights toward the development of models that continually learn.

The long-term goal is an LLM that, given a query, can automatically determine if it needs to use test-time training to update parameters or if it can solve the task using in-context learning, and then implement the best test-time training strategy without the need for human intervention.

This work is supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.

The high-tech wizardry of integrated photonics

Inspired by the “Harry Potter” stories and the Disney Channel show “Wizards of Waverly Place,” 7-year-old Sabrina Corsetti emphatically declared to her parents one afternoon that she was, in fact, a wizard.

“My dad turned to me and said that, if I really wanted to be a wizard, then I should become a physicist. Physicists are the real wizards of the world,” she recalls.

That conversation stuck with Corsetti throughout her childhood, all the way up to her decision to double-major in physics and math in college, which set her on a path to MIT, where she is now a graduate student in the Department of Electrical Engineering and Computer Science.

While her work may not involve incantations or magic wands, Corsetti’s research centers on an area that often produces astonishing results: integrated photonics. A relatively young field, integrated photonics involves building computer chips that route light instead of electricity, enabling compact and scalable solutions for applications ranging from communications to sensing.

Corsetti and her collaborators in the Photonics and Electronics Research Group, led by Professor Jelena Notaros, develop chip-sized devices which enable innovative applications that push the boundaries of what is possible in optics.

For instance, Corsetti and the team developed a chip-based 3D printer, small enough to sit in the palm of one’s hand, that emits a reconfigurable beam of light into resin to create solid shapes. Such a device could someday enable a user to rapidly fabricate customized, low-cost objects on the go.

She also contributed to creating a miniature “tractor beam” that uses a beam of light to capture and manipulate biological particles using a chip. This could help biologists study DNA or investigate the mechanisms of disease without contaminating tissue samples.

More recently, Corsetti has been working on a project in collaboration with MIT Lincoln Laboratory, focused on trapped-ion quantum computing, which involves the manipulation of ions to store and process quantum information.

“Our team has a strong focus on designing devices and systems that interact with the environment. The opportunity to join a new research group, led by a supportive and engaged advisor, that works on projects with a lot of real-world impacts, is primarily what drew me to MIT,” Corsetti says.


Embracing challenges

Years before she set foot in a research lab, Corsetti was a science- and math-focused kid growing up with her parents and younger brother in the suburbs of Chicago, where her family operates a structural steelwork company.

Throughout her childhood, her teachers fostered her love of learning, from her early years in the Frankfort 157-C school district through her time at Lincoln-Way East High School.

She enjoyed working on science experiments outside the classroom and relished the chance to tackle complex conundrums during independent study projects curated by her teachers (like working out the math behind the brachistochrone curve, the path of fastest descent between two points, a problem famously solved by Isaac Newton).

Corsetti decided to double-major in physics and math at the University of Michigan after graduating from high school a year early.

“When I went to the University of Michigan, I couldn’t wait to get started. I enrolled in the toughest math and physics track right off the bat,” she recalls.

But Corsetti soon found that she had bitten off a bit more than she could chew. A lot of her tough undergraduate courses assumed students had prior knowledge from AP physics and math classes, which Corsetti hadn’t taken because she graduated early.

She met with professors, attended office hours, and tried to pick up the lessons she had missed, but felt so discouraged she contemplated switching majors. Before she made the switch, Corsetti decided to try working in a physics lab to see if she liked a day in the life of a researcher.

After joining Professor Wolfgang Lorenzon’s lab at Michigan, Corsetti spent hours working with grad students and postdocs on a hands-on project to build cells that would hold liquid hydrogen for a particle physics experiment.

As they collaborated for hours at a time to roll material into tubes, she peppered the older students with questions about their experiences in the field.

“Being in the lab made me fall in love with physics. I really enjoyed that environment, working with my hands, and working with people as part of a bigger team,” she says.

Her affinity for hands-on lab work was amplified a few years later when she met Professor Tom Schwarz, her research advisor for the rest of her time at Michigan.

Following a chance conversation with Schwarz, she applied to a research abroad program at CERN in Switzerland, where she was mentored by Siyuan Sun. There, she had the opportunity to join thousands of physicists and engineers on the ATLAS project, writing code and optimizing circuits for new particle-detector technologies.

“That was one of the most transformative experiences of my life. After I came back to Michigan, I was ready to spend my career focusing on research,” she says.

Hooked on photonics

Corsetti began applying to graduate schools but decided to shift focus from the more theoretical particle physics to electrical engineering, with an interest in conducting hands-on chip-design and testing research.

She applied to MIT with a focus on standard electronic-chip design, so it came as a surprise when Notaros reached out to her to schedule a Zoom call. At the time, Corsetti was completely unfamiliar with integrated photonics. However, after one conversation with the new professor, she was hooked.

“Jelena has an infectious enthusiasm for integrated photonics,” she recalls. “After those initial conversations, I took a leap of faith.”

Corsetti joined Notaros’ team as it was just getting started. Closely mentored by a senior student, Milica Notaros, she and her cohort grew immersed in integrated photonics.

Over the years, she’s particularly enjoyed the collaborative and close-knit nature of the lab and how the work involves so many different aspects of the experimental process, from design to simulation to analysis to hardware testing.

“An exciting challenge that we’re always running up against is new chip-fabrication requirements. There is a lot of back-and-forth between new application areas that demand new fabrication technologies, followed by improved fabrication technologies motivating additional application areas. That cycle is constantly pushing the field forward,” she says.

Corsetti plans to stay at the cutting edge of the field after graduation as an integrated-photonics researcher in industry or at a national lab. She would like to focus on trapped-ion quantum computing, which scientists are rapidly scaling up toward commercially viable systems, or other high-performance computing applications.

“You really need accelerated computing for any modern research area. It would be exciting and rewarding to contribute to high-performance computing that can enable a lot of other interesting research areas,” she says.

Paying it forward

In addition to making an impact with research, Corsetti is focused on making a personal impact in the lives of others. Through her involvement in MIT Graduate Hillel, she joined the Jewish Big Brothers Big Sisters of Boston, where she volunteers for the friend-to-friend program.

Participating in the program, which pairs adults who have disabilities with friends in the community for fun activities like watching movies or painting, has been an especially uplifting and gratifying experience for Corsetti.

She’s also enjoyed the opportunity to support, mentor, and bond with her fellow MIT EECS students, drawing on the advice she’s received throughout her own academic journey.

“Don’t trust feelings of imposter syndrome,” she advises others. “Keep moving forward, ask for feedback and help, and be confident that you will reach a point where you can make meaningful contributions to a team.”

Outside the lab, she enjoys playing classical music on the clarinet (her favorite piece is Leonard Bernstein’s famous overture to “Candide”), reading, and caring for a family of fish in her aquarium.