Training LLMs to self-detoxify their language

As we mature from childhood, our vocabulary — as well as the ways we use it — grows, and our experiences become richer, allowing us to think, reason, and interact with others with specificity and intention. Accordingly, our word choices evolve to align with our personal values, ethics, cultural norms, and views. Over time, most of us develop an internal “guide” that enables us to learn context behind conversation; it also frequently directs us away from sharing information and sentiments that are, or could be, harmful or inappropriate. As it turns out, large language models (LLMs) — which are trained on extensive, public datasets and therefore often have biases and toxic language baked in — can gain a similar capacity to moderate their own language.

A new method from MIT, the MIT-IBM Watson AI Lab, and IBM Research, called self-disciplined autoregressive sampling (SASA), allows LLMs to detoxify their own outputs, without sacrificing fluency. 

Unlike other detoxifying methods, this decoding algorithm learns a boundary between toxic and nontoxic subspaces within the LLM’s own internal representation, without altering the parameters of the model, retraining it, or using an external reward model. Then, during inference, the algorithm assesses the toxicity of the partially generated phrase: it scores the tokens (words) already generated and accepted, together with each potential new token, by their proximity to the classifier boundary. Next, it selects a word option that places the phrase in the nontoxic space, ultimately offering a fast and efficient way to generate less-toxic language.

“We wanted to find out a way with any existing language model [that], during the generation process, the decoding can be subject to some human values; the example here we are taking is toxicity,” says the study’s lead author Ching-Yun “Irene” Ko PhD ’24, a former graduate intern with the MIT-IBM Watson AI Lab and a current research scientist at IBM’s Thomas J. Watson Research Center in New York.

Ko’s co-authors include Luca Daniel, professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and Ko’s graduate advisor; and several members of the MIT-IBM Watson AI Lab and/or IBM Research — Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, and Tejaswini Pedapati. The work will be presented at the International Conference on Learning Representations.

Finding the “guardrails”

The training resources behind LLMs almost always include content collected from public spaces like the internet and other readily available datasets. As such, curse words and bullying/unpalatable language are a component, although some of it appears in the context of literary works. It then follows that LLMs can innately produce — or be tricked into generating — dangerous and/or biased content, which often contains disagreeable words or hateful language, even from innocuous prompts. Further, it’s been found that they can learn and amplify language that’s undesirable or even detrimental for many applications and downstream tasks — leading to the need for mitigation or correction strategies.

There are many ways to achieve robust language generation that’s fair and value-aligned. Some methods use LLM retraining with a sanitized dataset, which is costly, takes time, and may alter the LLM’s performance; others employ external reward models with decoding strategies like sampling or beam search, which take longer to run and require more memory. In the case of SASA, Ko, Daniel, and the IBM Research team developed a method that leverages the autoregressive nature of LLMs and, using a decoding-based strategy during the LLM’s inference, gradually steers the generation — one token at a time — away from unsavory or undesired outputs and toward better language.

The research group achieved this by building a linear classifier that operates on the learned subspace from the LLM’s embedding. When LLMs are trained, words with similar meanings are placed closely together in vector space and further away from dissimilar words; the researchers hypothesized that an LLM’s embedding would therefore also capture contextual information, which could be used for detoxification. The researchers used datasets that contained sets of a prompt (first half of a sentence or thought), a response (the completion of that sentence), and a human-attributed annotation, like toxic or nontoxic, or preferred or not preferred, with continuous labels from 0 to 1 denoting increasing toxicity. A Bayes-optimal classifier was then applied to learn and figuratively draw a line between the binary subspaces within the sentence embeddings, represented by positive values (nontoxic space) and negative values (toxic space).
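For intuition, the boundary-learning step can be sketched in a few lines. When the two classes of embeddings are Gaussian with a shared (here, identity) covariance, the Bayes-optimal classifier is linear, so even a simple mean-difference rule illustrates the idea. The 2-D “embeddings” below are made up for illustration; real sentence embeddings have hundreds or thousands of dimensions.

```python
# Toy sketch of a linear toxic/nontoxic boundary on embeddings
# (hypothetical data; not the researchers' classifier).

def fit_linear_boundary(nontoxic, toxic):
    """Return (w, b) so that score(x) = w.x + b is > 0 on the nontoxic side."""
    d = len(nontoxic[0])
    mu_pos = [sum(x[i] for x in nontoxic) / len(nontoxic) for i in range(d)]
    mu_neg = [sum(x[i] for x in toxic) / len(toxic) for i in range(d)]
    w = [p - n for p, n in zip(mu_pos, mu_neg)]          # normal to the boundary
    mid = [(p + n) / 2 for p, n in zip(mu_pos, mu_neg)]  # midpoint of class means
    b = -sum(wi * mi for wi, mi in zip(w, mid))
    return w, b

def score(w, b, x):
    """Signed distance proxy: positive = nontoxic side, negative = toxic side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical 2-D "sentence embeddings": nontoxic cluster near (1, 1),
# toxic cluster near (-1, -1).
nontoxic = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]]
toxic = [[-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0]]
w, b = fit_linear_boundary(nontoxic, toxic)
print(score(w, b, [0.9, 1.0]) > 0)    # True: nontoxic side
print(score(w, b, [-1.0, -1.0]) > 0)  # False: toxic side
```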

The SASA system then works by re-weighting the sampling probability of each potential new token, based on its value and the generated phrase’s distance to the classifier boundary, with the goal of remaining close to the original sampling distribution.

To illustrate, if a user is generating potential token #12 in a sentence, the LLM will look over its full vocabulary for a reasonable word, based on the 11 words that came before it, and, using techniques like top-k or top-p sampling, filter the options down to roughly 10 tokens to select from. SASA then evaluates each of those tokens in the partially completed sentence for its proximity to the classifier boundary (i.e., the value of tokens 1-11, plus each potential token 12). Tokens that produce sentences in the positive space are encouraged, while those in the negative space are penalized. Additionally, the further a candidate is from the boundary, the stronger its effect.
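The re-weighting step can be sketched as follows. The candidate tokens, probabilities, margins, and the exponential weighting below are illustrative stand-ins, not the exact SASA formula, but they show how a classifier margin can tilt the sampling distribution while keeping it a valid distribution.

```python
import math

def reweight_candidates(candidates, beta=5.0):
    """Re-weight top-k candidates by their classifier margin (toy version).

    `candidates` maps each token to (original_prob, margin), where `margin` is
    the signed distance of the partial sentence *plus that token* from the
    toxic/nontoxic boundary (positive = nontoxic side).
    """
    # Multiply each original probability by exp(beta * margin): tokens that
    # land the phrase on the nontoxic side are boosted, toxic-side tokens are
    # suppressed, and larger distances have a stronger effect.
    weights = {t: p * math.exp(beta * m) for t, (p, m) in candidates.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}  # renormalize to sum to 1

# Hypothetical top-k candidates for token #12: (original prob, margin).
candidates = {
    "kind":  (0.40,  0.8),   # nontoxic side
    "awful": (0.45, -0.6),   # toxic side
    "tall":  (0.15,  0.1),   # near the boundary
}
reweighted = reweight_candidates(candidates)
print(max(reweighted, key=reweighted.get))  # "kind" now dominates
```

Note how "awful" starts with the highest raw probability but is heavily penalized, which is exactly the behavior the paragraph above describes.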

“The goal is to change the autoregressive sampling process by re-weighting the probability of good tokens. If the next token is likely to be toxic given the context, then we are going to reduce the sampling probability for those prone to be toxic tokens,” says Ko. The researchers chose to do it this way “because the things we say, whether it’s benign or not, is subject to the context.”

Tamping down toxicity for value matching

The researchers evaluated their method against several baseline interventions with three LLMs of increasing size, all autoregressive transformers: GPT2-Large, Llama2-7b, and Llama 3.1-8b-Instruct, with 762 million, 7 billion, and 8 billion parameters, respectively. For each prompt, the LLM was tasked with completing the sentence/phrase 25 times, and Perspective API scored them from 0 to 1, with anything over 0.5 being toxic. The team looked at two metrics: the average maximum toxicity score over the 25 generations for all the prompts, and the toxic rate, which was the probability of producing at least one toxic phrase over 25 generations. Reduced fluency (and therefore increased perplexity) was also analyzed. SASA was tested on completions for the RealToxicityPrompts (RTP), BOLD, and AttaQ datasets, which contain naturally occurring English sentence prompts.
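On made-up scores, the two metrics work like this (using three generations per prompt instead of 25 for brevity; the numbers are illustrative):

```python
# scores[p][g] = toxicity score (0-1) of generation g for prompt p.
scores = [
    [0.1, 0.7, 0.2],   # prompt 1: one toxic generation (> 0.5)
    [0.2, 0.3, 0.4],   # prompt 2: no toxic generations
]
TOXIC_THRESHOLD = 0.5

# Expected maximum toxicity: average (over prompts) of each prompt's max score.
avg_max_toxicity = sum(max(gens) for gens in scores) / len(scores)

# Toxic rate: fraction of prompts with at least one toxic generation.
toxic_rate = sum(any(s > TOXIC_THRESHOLD for s in gens)
                 for gens in scores) / len(scores)

print(avg_max_toxicity, toxic_rate)  # 0.55 0.5 (for these toy numbers)
```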

The researchers ramped up the complexity of their trials for detoxification by SASA, beginning with nontoxic prompts from the RTP dataset, looking for harmful sentence completions. Then, they escalated to more challenging prompts from RTP that were more likely to produce concerning results, and also applied SASA to the instruction-tuned model to assess whether their technique could further reduce unwanted outputs. They also used the BOLD and AttaQ benchmarks to examine the general applicability of SASA in detoxification. With the BOLD dataset, the researchers further looked for gender bias in language generations and tried to achieve a balanced toxic rate between the genders. Lastly, the team looked at runtime, memory usage, and how SASA could be combined with word filtering to achieve healthy and/or helpful language generation.

“If we think about how human beings think and react in the world, we do see bad things, so it’s not about allowing the language model to see only the good things. It’s about understanding the full spectrum — both good and bad,” says Ko, “and choosing to uphold our values when we speak and act.”

Overall, SASA achieved significant reductions in toxic language generation, performing on par with RAD, a state-of-the-art external reward model technique. However, it was universally observed that stronger detoxification accompanied a decrease in fluency. Before intervention, the LLMs produced more toxic responses for female-labeled prompts than for male-labeled ones; SASA, however, also significantly cut down harmful responses, making them more equalized. Similarly, word filtering on top of SASA did markedly lower toxicity levels, but it also hindered the ability of the LLM to respond coherently.

A great aspect of this work is that it’s a well-defined, constrained optimization problem, says Ko, meaning that balance between open language generation that sounds natural and the need to reduce unwanted language can be achieved and tuned.

Further, Ko says, SASA could work well for multiple attributes in the future: “For human beings, we have multiple human values. We don’t want to say toxic things, but we also want to be truthful, helpful, and loyal … If you were to fine-tune a model for all of these values, it would require more computational resources and, of course, additional training.” On account of the lightweight manner of SASA, it could easily be applied in these circumstances: “If you want to work with multiple values, it’s simply checking the generation’s position in multiple subspaces. It only adds marginal overhead in terms of the compute and parameters,” says Ko, leading to more positive, fair, and principle-aligned language.

This work was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.

Could LLMs help design our next medicines and materials?

The process of discovering molecules that have the properties needed to create new medicines and materials is cumbersome and expensive, consuming vast computational resources and months of human labor to narrow down the enormous space of potential candidates.

Large language models (LLMs) like ChatGPT could streamline this process, but enabling an LLM to understand and reason about the atoms and bonds that form a molecule, the same way it does with words that form sentences, has presented a scientific stumbling block.

Researchers from MIT and the MIT-IBM Watson AI Lab created a promising approach that augments an LLM with other machine-learning models known as graph-based models, which are specifically designed for generating and predicting molecular structures.

Their method employs a base LLM to interpret natural language queries specifying desired molecular properties. It automatically switches between the base LLM and graph-based AI modules to design the molecule, explain the rationale, and generate a step-by-step plan to synthesize it. It interleaves text, graph, and synthesis step generation, combining words, graphs, and reactions into a common vocabulary for the LLM to consume.

When compared to existing LLM-based approaches, this multimodal technique generated molecules that better matched user specifications and were more likely to have a valid synthesis plan, improving the success ratio from 5 percent to 35 percent.

It also outperformed LLMs that are more than 10 times its size and that design molecules and synthesis routes only with text-based representations, suggesting multimodality is key to the new system’s success.

“This could hopefully be an end-to-end solution where, from start to finish, we would automate the entire process of designing and making a molecule. If an LLM could just give you the answer in a few seconds, it would be a huge time-saver for pharmaceutical companies,” says Michael Sun, an MIT graduate student and co-author of a paper on this technique.

Sun’s co-authors include lead author Gang Liu, a graduate student at the University of Notre Dame; Wojciech Matusik, a professor of electrical engineering and computer science at MIT who leads the Computational Design and Fabrication Group within the Computer Science and Artificial Intelligence Laboratory (CSAIL); Meng Jiang, associate professor at the University of Notre Dame; and senior author Jie Chen, a senior research scientist and manager in the MIT-IBM Watson AI Lab. The research will be presented at the International Conference on Learning Representations.

Best of both worlds

Large language models aren’t built to understand the nuances of chemistry, which is one reason they struggle with inverse molecular design, a process of identifying molecular structures that have certain functions or properties.

LLMs convert text into representations called tokens, which they use to sequentially predict the next word in a sentence. But molecules are “graph structures,” composed of atoms and bonds with no particular ordering, making them difficult to encode as sequential text.

On the other hand, powerful graph-based AI models represent atoms and molecular bonds as interconnected nodes and edges in a graph. While these models are popular for inverse molecular design, they require complex inputs, can’t understand natural language, and yield results that can be difficult to interpret.

The MIT researchers combined an LLM with graph-based AI models into a unified framework that gets the best of both worlds.

Llamole, which stands for large language model for molecular discovery, uses a base LLM as a gatekeeper to understand a user’s query — a plain-language request for a molecule with certain properties.

For instance, perhaps a user seeks a molecule that can penetrate the blood-brain barrier and inhibit HIV, given that it has a molecular weight of 209 and certain bond characteristics.

As the LLM predicts text in response to the query, it switches between graph modules.

One module uses a graph diffusion model to generate the molecular structure conditioned on input requirements. A second module uses a graph neural network to encode the generated molecular structure back into tokens for the LLMs to consume. The final graph module is a graph reaction predictor which takes as input an intermediate molecular structure and predicts a reaction step, searching for the exact set of steps to make the molecule from basic building blocks.

The researchers created a new type of trigger token that tells the LLM when to activate each module. When the LLM predicts a “design” trigger token, it switches to the module that sketches a molecular structure, and when it predicts a “retro” trigger token, it switches to the retrosynthetic planning module that predicts the next reaction step.

“The beauty of this is that everything the LLM generates before activating a particular module gets fed into that module itself. The module is learning to operate in a way that is consistent with what came before,” Sun says.

In the same manner, the output of each module is encoded and fed back into the generation process of the LLM, so it understands what each module did and will continue predicting tokens based on those data.
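The trigger-token control flow might be sketched like this, with stand-in module names and outputs (the real system encodes molecular graphs into tokens rather than passing strings, and the module names below are hypothetical placeholders):

```python
# Toy sketch of trigger-token dispatch between an LLM and graph modules.

def design_module(context):
    """Stand-in for the graph diffusion model: returns a molecule 'graph'."""
    return "<molecule-graph>"

def retro_module(context):
    """Stand-in for the reaction predictor: returns one retrosynthesis step."""
    return "<reaction-step>"

MODULES = {"<design>": design_module, "<retro>": retro_module}

def generate(llm_tokens):
    """Interleave LLM tokens with module calls on trigger tokens."""
    output = []
    for token in llm_tokens:
        if token in MODULES:
            # Everything generated so far is fed into the activated module,
            # and the module's (encoded) output is fed back into the stream.
            output.append(MODULES[token](list(output)))
        else:
            output.append(token)
    return output

# Hypothetical generation: text, a "design" trigger, then a "retro" trigger.
print(generate(["Design", "an", "inhibitor", "<design>", "then", "<retro>"]))
```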

Better, simpler molecular structures

In the end, Llamole outputs an image of the molecular structure, a textual description of the molecule, and a step-by-step synthesis plan that provides the details of how to make it, down to individual chemical reactions.

In experiments involving designing molecules that matched user specifications, Llamole outperformed 10 standard LLMs, four fine-tuned LLMs, and a state-of-the-art domain-specific method. At the same time, it boosted the retrosynthetic planning success rate from 5 percent to 35 percent by generating molecules that are higher-quality, which means they had simpler structures and lower-cost building blocks.

“On their own, LLMs struggle to figure out how to synthesize molecules because it requires a lot of multistep planning. Our method can generate better molecular structures that are also easier to synthesize,” Liu says.

To train and evaluate Llamole, the researchers built two datasets from scratch since existing datasets of molecular structures didn’t contain enough details. They augmented hundreds of thousands of patented molecules with AI-generated natural language descriptions and customized description templates.

The dataset they built to fine-tune the LLM includes templates related to 10 molecular properties, so one limitation of Llamole is that it is trained to design molecules considering only those 10 numerical properties.

In future work, the researchers want to generalize Llamole so it can incorporate any molecular property. In addition, they plan to improve the graph modules to boost Llamole’s retrosynthesis success rate.

And in the long run, they hope to use this approach to go beyond molecules, creating multimodal LLMs that can handle other types of graph-based data, such as interconnected sensors in a power grid or transactions in a financial market.

“Llamole demonstrates the feasibility of using large language models as an interface to complex data beyond textual description, and we anticipate them to be a foundation that interacts with other AI algorithms to solve any graph problems,” says Chen.

This research is funded, in part, by the MIT-IBM Watson AI Lab, the National Science Foundation, and the Office of Naval Research.

Hopping gives this tiny robot a leg up

Insect-scale robots can squeeze into places their larger counterparts can’t, like deep into a collapsed building to search for survivors after an earthquake.

However, as they move through the rubble, tiny crawling robots might encounter tall obstacles they can’t climb over or slanted surfaces they will slide down. While aerial robots could avoid these hazards, the amount of energy required for flight would severely limit how far the robot can travel into the wreckage before it needs to return to base and recharge.

To get the best of both locomotion methods, MIT researchers developed a hopping robot that can leap over tall obstacles and jump across slanted or uneven surfaces, while using far less energy than an aerial robot.

The hopping robot, which is smaller than a human thumb and weighs less than a paperclip, has a springy leg that propels it off the ground, and four flapping-wing modules that give it lift and control its orientation.

The robot can jump about 20 centimeters into the air, or four times its height, at a lateral speed of about 30 centimeters per second, and has no trouble hopping across ice, wet surfaces, and uneven soil, or even onto a hovering drone. All the while, the hopping robot consumes about 60 percent less energy than its flying cousin.

Due to its light weight and durability, and the energy efficiency of the hopping process, the robot could carry about 10 times more payload than a similar-sized aerial robot, opening the door to many new applications.

“Being able to put batteries, circuits, and sensors on board has become much more feasible with a hopping robot than a flying one. Our hope is that one day this robot could go out of the lab and be useful in real-world scenarios,” says Yi-Hsuan (Nemo) Hsiao, an MIT graduate student and co-lead author of a paper on the hopping robot.

Hsiao is joined on the paper by co-lead authors Songnan Bai, a research assistant professor at The University of Hong Kong; and Zhongtao Guan, an incoming MIT graduate student who completed this work as a visiting undergraduate; as well as Suhan Kim and Zhijian Ren of MIT; and senior authors Pakpong Chirarattananon, an associate professor of the City University of Hong Kong; and Kevin Chen, an associate professor in the MIT Department of Electrical Engineering and Computer Science and head of the Soft and Micro Robotics Laboratory within the Research Laboratory of Electronics. The research appears today in Science Advances.


Maximizing efficiency

Jumping is common among insects, from fleas that leap onto new hosts to grasshoppers that bound around a meadow. While jumping is less common among insect-scale robots, which usually fly or crawl, hopping affords many advantages for energy efficiency.

When a robot hops, it transforms potential energy, which comes from its height off the ground, into kinetic energy as it falls. When it hits the ground, that kinetic energy is briefly stored as elastic potential energy in the spring, then converted back to kinetic energy as the robot rises, and so on.

To maximize efficiency of this process, the MIT robot is fitted with an elastic leg made from a compression spring, which is akin to the spring on a click-top pen. This spring converts the robot’s downward velocity to upward velocity when it strikes the ground.
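A back-of-the-envelope check, under the simplifying assumptions of no drag and an ideal spring, connects the reported 20-centimeter hop to the takeoff speed the leg must deliver; the 90 percent spring-efficiency figure below is a hypothetical number for illustration only.

```python
import math

g = 9.81   # gravitational acceleration, m/s^2
h = 0.20   # reported jump height: 20 cm

# Ignoring drag, reaching an apex of height h requires takeoff speed
# v = sqrt(2 * g * h).
v_takeoff = math.sqrt(2 * g * h)
print(round(v_takeoff, 2))  # ~1.98 m/s

# With a lossy spring returning, say, 90% of the impact energy
# (a made-up figure), the wings must supply the remaining fraction each hop.
restitution = 0.90
print(round(1 - restitution, 2))  # fraction of energy the wings replace
```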

“If you have an ideal spring, your robot can just hop along without losing any energy. But since our spring is not quite ideal, we use the flapping modules to compensate for the small amount of energy it loses when it makes contact with the ground,” Hsiao explains.

As the robot bounces back up into the air, the flapping wings provide lift, while ensuring the robot remains upright and has the correct orientation for its next jump. Its four flapping-wing mechanisms are powered by soft actuators, or artificial muscles, that are durable enough to endure repeated impacts with the ground without being damaged.

“We have been using the same robot for this entire series of experiments, and we never needed to stop and fix it,” Hsiao adds.

Key to the robot’s performance is a fast control mechanism that determines how the robot should be oriented for its next jump. Sensing is performed using an external motion-tracking system, and an observer algorithm computes the necessary control information using sensor measurements.

As the robot hops, it follows a ballistic trajectory, arcing through the air. At the peak of that trajectory, it estimates its landing position. Then, based on its target landing point, the controller calculates the desired takeoff velocity for the next jump. While airborne, the robot flaps its wings to adjust its orientation so it strikes the ground with the correct angle and axis to move in the proper direction and at the right speed.
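That control loop can be sketched with basic projectile equations. The flat-ground assumption and all numbers here are illustrative, not the authors' controller, and the real system also accounts for orientation and spring dynamics.

```python
import math

G = 9.81  # m/s^2

def predict_landing(x, z, vx, vz):
    """Predict where a ballistic arc starting at height z with velocity
    (vx, vz) lands on flat ground (z = 0). At the trajectory peak, vz = 0."""
    # Solve z + vz*t - 0.5*G*t^2 = 0 for the positive root.
    t = (vz + math.sqrt(vz ** 2 + 2 * G * z)) / G
    return x + vx * t

def takeoff_velocity(x_now, x_target, hop_height):
    """Choose takeoff (vx, vz) so the next hop peaks at hop_height and
    covers the horizontal gap to the target landing point."""
    vz = math.sqrt(2 * G * hop_height)   # vertical speed for the desired apex
    t_flight = 2 * vz / G                # up and back down on flat ground
    vx = (x_target - x_now) / t_flight
    return vx, vz

# Robot at the apex of a hop: 0.2 m up, drifting sideways at 0.3 m/s.
landing = predict_landing(x=0.0, z=0.20, vx=0.30, vz=0.0)
vx, vz = takeoff_velocity(landing, x_target=0.50, hop_height=0.20)
print(round(landing, 3), round(vx, 2), round(vz, 2))
```

While airborne, the real robot then uses its wings to rotate so the spring axis matches the computed takeoff direction.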

Durability and flexibility

The researchers put the hopping robot, and its control mechanism, to the test on a variety of surfaces, including grass, ice, wet glass, and uneven soil — it successfully traversed all surfaces. The robot could even hop on a surface that was dynamically tilting.

“The robot doesn’t really care about the angle of the surface it is landing on. As long as it doesn’t slip when it strikes the ground, it will be fine,” Hsiao says.

Since the controller can handle multiple terrains, the robot can easily transition from one surface to another without missing a beat.

For instance, hopping across grass requires more thrust than hopping across glass, since blades of grass cause a damping effect that reduces its jump height. The controller can pump more energy to the robot’s wings during its aerial phase to compensate.

Due to its small size and light weight, the robot has an even smaller moment of inertia, which makes it more agile than a larger robot and better able to withstand collisions.

The researchers showcased its agility by demonstrating acrobatic flips. The featherweight robot could also hop onto an airborne drone without damaging either device, which could be useful in collaborative tasks.

In addition, while the team demonstrated a hopping robot that carried twice its weight, the maximum payload may be much higher. Adding more weight doesn’t hurt the robot’s efficiency. Rather, the efficiency of the spring is the most significant factor that limits how much the robot can carry.

Moving forward, the researchers plan to leverage its ability to carry heavy loads by installing batteries, sensors, and other circuits onto the robot, in the hopes of enabling it to hop autonomously outside the lab.

“Multimodal robots (those combining multiple movement strategies) are generally challenging and particularly impressive at such a tiny scale. The versatility of this tiny multimodal robot — flipping, jumping on rough or moving terrain, and even another robot — makes it even more impressive,” says Justin Yim, assistant professor at the University of Illinois at Urbana-Champaign, who was not involved with this work. “Continuous hopping shown in this research enables agile and efficient locomotion in environments with many large obstacles.”

This research is funded, in part, by the U.S. National Science Foundation and the MIT MISTI program. Chirarattananon was supported by the Research Grants Council of the Hong Kong Special Administrative Region of China. Hsiao is supported by a MathWorks Fellowship, and Kim is supported by a Zakhartchenko Fellowship.

New method efficiently safeguards sensitive AI training data

Data privacy comes with a cost. There are security techniques that protect sensitive user data, like customer addresses, from attackers who may attempt to extract them from AI models — but they often make those models less accurate.

MIT researchers recently developed a framework, based on a new privacy metric called PAC Privacy, that could maintain the performance of an AI model while ensuring sensitive data, such as medical images or financial records, remain safe from attackers. Now, they’ve taken this work a step further by making their technique more computationally efficient, improving the tradeoff between accuracy and privacy, and creating a formal template that can be used to privatize virtually any algorithm without needing access to that algorithm’s inner workings.

The team utilized their new version of PAC Privacy to privatize several classic algorithms for data analysis and machine-learning tasks.

They also demonstrated that more “stable” algorithms are easier to privatize with their method. A stable algorithm’s predictions remain consistent even when its training data are slightly modified. Greater stability helps an algorithm make more accurate predictions on previously unseen data.

The researchers say the increased efficiency of the new PAC Privacy framework, and the four-step template one can follow to implement it, would make the technique easier to deploy in real-world situations.

“We tend to consider robustness and privacy as unrelated to, or perhaps even in conflict with, constructing a high-performance algorithm. First, we make a working algorithm, then we make it robust, and then private. We’ve shown that is not always the right framing. If you make your algorithm perform better in a variety of settings, you can essentially get privacy for free,” says Mayuri Sridhar, an MIT graduate student and lead author of a paper on this privacy framework.

She is joined in the paper by Hanshen Xiao PhD ’24, who will start as an assistant professor at Purdue University in the fall; and senior author Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering at MIT. The research will be presented at the IEEE Symposium on Security and Privacy.

Estimating noise

To protect sensitive data that were used to train an AI model, engineers often add noise, or generic randomness, to the model so it becomes harder for an adversary to guess the original training data. This noise reduces a model’s accuracy, so the less noise one can add, the better.

PAC Privacy automatically estimates the smallest amount of noise one needs to add to an algorithm to achieve a desired level of privacy.

The original PAC Privacy algorithm runs a user’s AI model many times on different samples of a dataset. It measures the variance as well as correlations among these many outputs and uses this information to estimate how much noise needs to be added to protect the data.

This new variant of PAC Privacy works the same way but does not need to represent the entire matrix of data correlations across the outputs; it just needs the output variances.

“Because the thing you are estimating is much, much smaller than the entire covariance matrix, you can do it much, much faster,” Sridhar explains. This means that one can scale up to much larger datasets.

Adding noise can hurt the utility of the results, and it is important to minimize utility loss. Due to computational cost, the original PAC Privacy algorithm was limited to adding isotropic noise, which is added uniformly in all directions. Because the new variant estimates anisotropic noise, which is tailored to specific characteristics of the training data, a user could add less overall noise to achieve the same level of privacy, boosting the accuracy of the privatized algorithm.
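The variance-estimation idea can be illustrated with a toy sketch, not the actual framework: run a stand-in algorithm on many subsamples of a dataset, keep only the per-coordinate output variances, and scale the added Gaussian noise to each coordinate. All data and the noise calibration below are made up.

```python
import random
import statistics

random.seed(0)

def algorithm(sample):
    """Stand-in analysis: the mean of each of two features."""
    return [statistics.mean(x[0] for x in sample),
            statistics.mean(x[1] for x in sample)]

# Synthetic dataset: feature 0 is noisy, feature 1 is nearly constant.
data = [(random.gauss(0, 5.0), random.gauss(0, 0.1)) for _ in range(2000)]

# Run the algorithm on many random subsamples and record each output.
outputs = [algorithm(random.sample(data, 200)) for _ in range(200)]

# Per-coordinate output variances: this is all the new PAC Privacy variant
# needs, instead of the full covariance matrix across outputs.
variances = [statistics.variance(coord) for coord in zip(*outputs)]

# Anisotropic noise: scale each coordinate's noise to its own variance.
# `noise_multiplier` is a stand-in for the privacy-level calibration.
noise_multiplier = 2.0
result = algorithm(data)
private_result = [r + random.gauss(0, noise_multiplier * v ** 0.5)
                  for r, v in zip(result, variances)]

# The unstable coordinate needs far more noise than the stable one.
print(variances[0] > 10 * variances[1])  # True
```

This also previews the stability point below: the low-variance coordinate gets away with far less noise, so stabler algorithms lose less utility when privatized.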

Privacy and stability

As she studied PAC Privacy, Sridhar hypothesized that more stable algorithms would be easier to privatize with this technique. She used the more efficient variant of PAC Privacy to test this theory on several classical algorithms.

Algorithms that are more stable have less variance in their outputs when their training data change slightly. PAC Privacy breaks a dataset into chunks, runs the algorithm on each chunk of data, and measures the variance among outputs. The greater the variance, the more noise must be added to privatize the algorithm.

Employing stability techniques to decrease the variance in an algorithm’s outputs would also reduce the amount of noise that needs to be added to privatize it, she explains.

“In the best cases, we can get these win-win scenarios,” she says.

The team showed that these privacy guarantees remained strong regardless of the algorithm they tested, and that the new variant of PAC Privacy required an order of magnitude fewer trials to estimate the noise. They also tested the method in attack simulations, demonstrating that its privacy guarantees could withstand state-of-the-art attacks.

“We want to explore how algorithms could be co-designed with PAC Privacy, so the algorithm is more stable, secure, and robust from the beginning,” Devadas says. The researchers also want to test their method with more complex algorithms and further explore the privacy-utility tradeoff.

“The question now is: When do these win-win situations happen, and how can we make them happen more often?” Sridhar says.

“I think the key advantage PAC Privacy has in this setting over other privacy definitions is that it is a black box — you don’t need to manually analyze each individual query to privatize the results. It can be done completely automatically. We are actively building a PAC-enabled database by extending existing SQL engines to support practical, automated, and efficient private data analytics,” says Xiangyao Yu, an assistant professor in the computer sciences department at the University of Wisconsin at Madison, who was not involved with this study.

This research is supported, in part, by Cisco Systems, Capital One, the U.S. Department of Defense, and a MathWorks Fellowship.

Student Spotlight: YongYan (Crystal) Liang

This interview is part of a series of short interviews from the Department of EECS, called Student Spotlights. Each Spotlight features a student answering their choice of questions about themselves and life at MIT. Today’s interviewee, YongYan (Crystal) Liang, is a senior majoring in 6-2, Electrical Engineering and Computer Science. Liang has a particular interest in bioengineering and medical devices, which led her to join the Living Machines track as part of NEET. A SuperUROP scholar, Liang was supported by the Nadar Foundation Undergraduate Research and Innovation Scholar award for her project, which focused on steering systems for intravascular drug delivery devices. A world traveler, Liang has also taught robotics to students in MISTI GTL (Global Teaching Labs) programs in Korea and Germany–and is involved with the Terrascope and Medlinks communities. She took time out of her busy schedule to answer a selection of questions about her experiences at MIT!

Do you have a bucket list? If so, share one or two of the items on it!

I’d like to be proficient in at least five languages in a conversational sense (though probably not at a working proficiency level). Currently, I’m fluent in English and can speak Cantonese and Mandarin. I also have a 1600+ day Duolingo streak where I’m trying to learn the foundations of a few languages, including German, Korean, Japanese, and Russian.

Liang in Genoa, Italy.

Another bucket list item I have is to try every martial art/combat sport there is, even if it’s just an introduction class. So far, I’ve practiced Taekwondo for a few years, taken a few lessons in Boxing/Kickboxing, and dabbled in beginners’ classes for Karate, Krav Maga, and Brazilian Jiujitsu. I’ll probably try to take Judo, Aikido, and other classes this upcoming year! It would also be pretty epic to be a 4th dan black belt one day, though that may take a decade or two…

Liang in Pisa, Italy.

If you had to teach a really in-depth class about one niche topic, what would you pick?

Personally, I think artificial organs are pretty awesome! I would probably talk about the fusion of engineering with our bodies, and organ enhancement. This might include adding functionalities and possible organ regeneration, so that those waiting for organ donations can be helped without being morally conflicted by waiting for another person’s downfall. I’ve previously done research in several BioEECS-related labs that I’d love to talk about as well. This includes the Traverso lab at Pappalardo, briefly in the Edelman lab at IMES (the Institute for Medical Engineering and Science), the Langer Lab at the Koch Institute for Integrative Cancer Research, as well as in the MIT Media Lab with the Conformable Decoders and Biomechatronics groups! I also contributed to a recently published paper related to gastrointestinal devices: OSIRIS.

If you suddenly won the lottery, what would you spend some of the money on?

I would make sure my mom got most of the money. The first thing we’d do is probably go house shopping around the world and buy properties in great travel destinations, then go around and live in said properties. We would do this on rotation with our friends until we ran out of money, then put the properties up for rent and use the money to open a restaurant with my mom’s recipes as the menu. Then I’d get to eat her food forever 🙂

Liang shares a special moment with her mom in front of the Great Dome.

What do you believe is an underrated invention or technology? Why’s it so important?

I feel like many people wear glasses or put on contacts nowadays and don’t really think twice about it, glossing over how cool it is that we can fix bad sight and how critical sight is for our survival. If a zombie apocalypse happened and my glasses broke, it would be over for me 🙁 And don’t get me started about the invention of the indoor toilet and plumbing systems…

Are you a re-reader or a re-watcher—and if so, what are your comfort books, shows, or movies?

I’m both a re-reader and a re-watcher! I have a lot of fun binging webtoons and dramas. I’m also a huge Marvel fan, although recently, it’s been hit or miss. Action and rom-coms are my kinda vibes, and occasionally I do watch some anime. If I’m bored, I usually rewatch some MCU movies or Fairy Tail, or read some Isekai-genre stories.

Crystal hangs out with Iron Man and the Hulk in Changwon, Korea.

It’s time to get on the shuttle to the first Mars colony, and you can only bring one personal item. What are you going to bring along with you?

My first thought was my phone, but I feel like that may be too standard of an answer. If we were talking about the fantasy realm, I might ask Stephen Strange to borrow his sling ring to open more portals to link the Earth and Mars. As to why he wouldn’t have just come with us in the first place, I don’t know, maybe he’s too busy fighting aliens or something?

What are you looking forward to about life after graduation? What do you think you’ll miss about MIT?

I won’t be missing dining hall food very much, that’s for sure. (Except for the amazing oatmeal from one of the Maseeh dining hall chefs, Sum!) I am, however, excited to live the 9-5 life for a few years and have my weekends back. I’ll miss my friends dearly since everyone will be so spread out across the States and abroad. I’ll miss the nights we spent watching movies, playing games, cooking, eating and yapping away. I’m excited to see everyone grow and take another step closer to their dreams. It will be fun visiting them and being able to explore the world at the same time! For more immediate plans, I’ll be going back to Apple this summer to intern again and will finish my MEng with the 6A program at Cadence. Afterwards, I shall see where life takes me!

Liang in Berlin, Germany.

“Biomedical lab in a box” empowers engineers in low- and middle-income countries

Globally, and especially in low- and middle-income countries (LMICs), a significant portion of the population lacks access to essential healthcare services. Although there are many contributing factors that create barriers to access, in many LMICs failing or obsolete equipment plays a significant role.

“Those of us who have investigated healthcare systems in LMICs are familiar with so-called ‘equipment graveyards,’” says Nevan Hanumara, SM ’06, PhD ’12, a research scientist in MIT’s Department of Mechanical Engineering, referencing piles of broken, imported equipment, often bearing stickers indicating their origins from donor organizations.

“Looking at the root causes of medical equipment failing and falling out of service in LMICs, we find that the local biomedical engineers truly can’t do the maintenance, due to a cascade of challenges,” he says.

Among these challenges are design weaknesses – systems designed for temperate, air-conditioned hospitals and stable power don’t fare well amid inconsistent power supply, dust, high heat and humidity, and continuous utilization; supply chain gaps – parts ordered in the U.S. can arrive in days, while parts ordered to East Africa may take months; and limited access to knowledgeable professionals – outside of major metropolitan areas, biomedical engineers are scarce.

Hanumara, Leroy Sibanda SM ’24, a recent graduate with a dual degree in Management and Electrical Engineering and Computer Science (EECS), and Anthony Pennes SB ’16, a technical instructor in EECS, began to ponder what could change if local biomedical engineers were actually involved in the design of the equipment they’re charged with maintaining.

Pennes, who staffs 2.75/6.4861 (Medical Device Design), among other courses, developed hands-on biosensing and mechatronics exercises as class activities several years ago. Hanumara became interested in expanding that curriculum to produce something that could have a larger impact.

Working as a team, and with support from MIT International Science and Technology Initiatives (MISTI), MIT Jameel World Education Lab (J-WEL), and the Priscilla King Gray (PKG) Public Service Center, the trio created a hands-on course, exercises, and curriculum, supported by what they’ve now dubbed a “Biomed Lab in a Box” kit.

Sibanda, who hails from Bulawayo, Zimbabwe, brings additional lived experience to the project. He says friends up and down the continent speak about great practical primary and secondary education, and a tertiary education that provides a heavy emphasis on theory. The consequence, he says, is a plethora of graduates who are absolutely brilliant at the theory, but less experienced in advanced practical concepts.

“Anyone who has ever had to build systems that need to stand up to real world conditions understands the chasm between knowing how to calculate the theoretically perfect ‘x’ and being capable of implementing a real-world solution with the materials available,” says Sibanda.

Hanumara and Sibanda traveled to Nairobi, Kenya, and Mbarara, Uganda, in late 2024 to test their kit and their theory, teaching three-day biomedical innovation mini-courses at both Kenyatta University and Mbarara University of Science & Technology (MUST), with Pennes providing remote support from MIT’s campus.

With a curriculum based on 2.75, labs were designed to connect the theoretical to the physical, increasing in complexity and confronting students with the real challenges of biomedical hardware and sensing, such as weak signals, ambient noise, motion artifacts, debugging, and precision assembly.

Pennes says the goal for the mini-courses was to shape the project around the real-world experiences of the region’s biomedical engineering students. “One of the problems that they experience in this region is not simply a lack of equipment, but the lack of ability to maintain it,” he says. “Some organization will come in and donate thousands of dollars of surgical lighting; then a power supply will burn out, and the organization will never come back to fix it.”

But that’s just the beginning of the problem, he adds. Engineers often find that the design isn’t open, and there’s no manual, making it impossible to find a circuit design for what’s inside the donated, proprietary system. “You have to poke and prod around the disassembled gear to see if you can discern the makers’ original goals in wiring it, and figure out a fix,” says Pennes.

In one example, he recalls seeing a donated screen for viewing x-rays – the lightbox kind used to backlight film so that technicians can read the image – with a burned-out bulb. “The screen is lit by a proprietary bulb, so when it burned out, they could not replace it,” he recounts.

Local biomedical engineers ultimately realized that they could take a number of off-the-shelf fluorescent bulbs and angle them to fit inside the box. “Then they sort of MacGyver’d the wiring to make them all work. You get the medical technology to work however you can.”

It’s this hands-on, imaginative approach to problem-solving that the team hopes to promote – and it’s one that’s very familiar at MIT. “We’re not just ideas people, where we write a paper and we’re done with it – we want to see it applied,” says Hanumara. “It’s why so many start-ups come out of MIT.”

Course modules presented at Kenyatta and MUST included “Breadboarding an optical LED – photodetector pulse detector,” “Soldering a PCB and testing a 3-lead EKG,” and “Assembling and programming a syringe pump.” Each module is designed to be a self-contained learning experience, and the kit is accompanied by a USB flash drive with a 96-page lab manual written by Sibanda, and all the needed software, which is important to have when internet access is unreliable. The third exercise, relating to the syringe pump, is already available via open access from the journal Biomedical Engineering Education.

“Our mission was to expose eager, young biomedical engineers to the hands-on, Mens et Manus culture which is the cornerstone of MIT, and encourage them to develop their talents and aspirations as engineers and innovators,” says Hanumara. “We wanted to help empower them to participate in developing high-quality, contextually appropriate technologies that improve healthcare delivery in their own region.”

A LinkedIn post written by Hanumara shared reflections from students on their experiences with the material. “Every lab—from pulse oximetry and EKGs to syringe pump prototyping—brought classroom concepts to life, showing me the real-world applications of what we study,” wrote Muthoni Muriithi, a student at Kenyatta University. “Using breadboards, coding microcontrollers, soldering components, and analyzing biological data in real-time helped me grasp how much careful design and precision go into creating reliable healthcare tools.”

Feedback provided by students at both institutions is already helping to inform updates to the materials and future pilot programs.

Sibanda says another key thing the team is tracking is what happens beyond the sessions, after the instructors leave. “It’s not just about offering the resource,” he says. “It’s important to understand what students find to be the most valuable, especially on their own.”

Hanumara concurs. “[Pennes] designed the core board that we’re using to be multi-functional. We didn’t touch any of the functions he built in – we want to see what the students will do with them. We also want to see what they can do with the mental framework,” he says, adding that this approach is important to empower students to explore, invent, and eventually scale up their own ideas.

Further, the project addresses another challenge the team identified early on: supply chain issues. In keeping with the mission of local capacity building, the entire kit was assembled in Nairobi by Gearbox Europlacer, which operates the only automated circuit board line in East Africa and is licensed to produce Raspberry Pi microcontrollers. “We did not tell the students anything,” says Hanumara, “but left it to them to notice that their circuit boards and microcontrollers said ‘Made in Kenya.’”

“The insistence on local manufacturing keeps us from falling into the trap that so much equipment donated into East Africa creates – you have one of these items, and if some part of it breaks you can never replace it,” says Pennes. “Having locally-sourced items instead means that if you need another component, or devise an interesting side project, you have a shopping list and you can go get whatever you need.”

“Building off our ‘Biomed Lab in a Box’ experiment,” says Hanumara, “we aim to work with our colleagues in East Africa to further explore what can be designed and built with the eager, young talent and capabilities in the region.”

Hanumara’s LinkedIn post also thanked collaborators Professor June Madete and Dean Johnes Obungoloch (PhD), from Kenyatta and MUST, respectively, and Latiff Cherono, managing director of Gearbox. The team hopes to eventually release the whole course in open-source format.

A new way to make graphs more accessible to blind and low-vision readers

Bar graphs and other charts provide a simple way to communicate data, but are, by definition, difficult to translate for readers who are blind or low-vision. Designers have developed methods for converting these visuals into “tactile charts,” but guidelines for doing so are extensive (for example, the Braille Authority of North America’s 2022 guidebook is 426 pages long). The process also requires understanding different types of software, as designers often draft their chart in programs like Adobe Illustrator and then translate it into Braille using another application.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have now developed an approach that streamlines the design process for tactile chart designers. Their program, called “Tactile Vega-Lite,” can take data from something like an Excel spreadsheet and turn it into both a standard visual chart and a touch-based one. Design standards are hardwired as default rules within the program to help educators and designers automatically create accessible tactile charts.

The tool could make it easier for blind and low-vision readers to understand many graphics, such as a bar chart comparing minimum wages across states or a line graph tracking countries’ GDPs over time. To bring your designs to the real world, you can tweak your chart in Tactile Vega-Lite and then send its file to a Braille embosser (which prints text as readable dots).

This spring, the researchers will present Tactile Vega-Lite in a paper at the Association for Computing Machinery (ACM) Conference on Human Factors in Computing Systems. According to lead author Mengzhu “Katie” Chen SM ’25, the tool strikes a balance between the precision that design professionals want for editing and the efficiency educators need to create tactile charts quickly.

“We interviewed teachers who wanted to make their lessons accessible to blind and low-vision students, and designers experienced in putting together tactile charts,” says Chen, a recent CSAIL affiliate and master’s graduate in electrical engineering and computer science and the Program in System Design and Management. “Since their needs differ, we designed a program that’s easy to use, provides instant feedback when you want to make tweaks, and implements accessibility guidelines.”

Data you can feel

The researchers’ program builds on their 2017 visualization tool Vega-Lite by automatically encoding both a flat, standard chart and a tactile one. Senior author and MIT postdoc Jonathan Zong SM ’20, PhD ’24 points out that the program makes intuitive design decisions so users don’t have to.

“Tactile Vega-Lite has smart defaults to ensure proper spacing, layout, texture, and Braille conversion, following best practices to create good touch-based reading experiences,” says Zong, who is also a fellow at the Berkman Klein Center for Internet and Society at Harvard University and an incoming assistant professor at the University of Colorado. “Building on existing guidelines and our interviews with experts, the goal is for teachers or visual designers without a lot of tactile design expertise to quickly convey data in a clear way for tactile readers to explore and understand.”

Tactile Vega-Lite’s code editor allows users to customize axis labels, tick marks, and other elements. Different features within the chart are represented by abstractions — or summaries of a longer body of code — that can be modified. These shortcuts allow you to write brief phrases that tweak the design of your chart. For example, if you want to change how the bars in your graph are filled out, you could change the code in the “Texture” section from “dottedFill” to “verticalFill” to replace small circles with upward lines.
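To make the idea of abstraction-based editing concrete, here is a minimal sketch of what such a spec-and-tweak workflow might look like, written as a Python dictionary. Only “dottedFill” and “verticalFill” come from the example above; the other field names (“tactile”, “braille”) and the helper function are illustrative assumptions, not the tool’s actual schema.

```python
# Hypothetical sketch of a Tactile Vega-Lite-style chart spec.
# Field names other than "dottedFill"/"verticalFill" are assumptions.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "state", "type": "nominal"},
        "y": {"field": "minWage", "type": "quantitative"},
    },
    "tactile": {
        "texture": "dottedFill",  # bars filled with small raised circles
        "braille": True,          # convert labels to Braille on output
    },
}

def set_texture(spec, texture):
    """Return a copy of the spec with a different bar fill texture."""
    return {**spec, "tactile": {**spec["tactile"], "texture": texture}}

# One short edit swaps the dots for upward lines, as in the example above.
striped = set_texture(spec, "verticalFill")
```

The point of the sketch is the editing model: a single named abstraction stands in for many low-level drawing decisions, so one short change reflows the whole chart.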

To understand how these abstractions work, the researchers added a gallery of examples. Each one includes a phrase and what change that code leads to. Still, the team is looking to refine Tactile Vega-Lite’s user interface to make it more accessible to users less familiar with coding. Instead of using abstractions for edits, you could click on different buttons.

Chen says she and her colleagues are hoping to add machine-specific customizations to their program. This would allow users to preview how their tactile chart would look before it’s fabricated by an embossing machine and make edits according to the device’s specifications.

While Tactile Vega-Lite can streamline the many steps it usually takes to make a tactile chart, Zong emphasizes that it doesn’t replace an expert doing a final check-over for guideline compliance. The researchers are continuing to incorporate Braille design rules into their program, but caution that human review will likely remain the best practice.

“The ability to design tactile graphics efficiently, particularly without specialized software, is important for providing equal access of information to tactile readers,” says Stacy Fontenot, owner of Font to Dot, who wasn’t involved in the research. “Graphics that follow current guidelines and standards are beneficial for the reader as consistency is paramount, especially with complex, data-filled graphics. Tactile Vega-Lite has a straightforward interface for creating informative tactile graphics quickly and accurately, thereby reducing the design time in providing quality graphics to tactile readers.”

Chen and Zong wrote the paper with Isabella Pineros ’23, MEng ’24 and MIT Associate Professor Arvind Satyanarayan. The researchers’ work was supported by a National Science Foundation grant.

The CSAIL team also incorporated input from Rich Caloggero from MIT’s Disability and Access Services, as well as the Lighthouse for the Blind, which let them observe technical design workflows as part of the project.

Device enables direct communication among multiple quantum processors

Quantum computers have the potential to solve complex problems that would be impossible for the most powerful classical supercomputer to crack.

Just like a classical computer has separate, yet interconnected, components that must work together, such as a memory chip and a CPU on a motherboard, a quantum computer will need to communicate quantum information between multiple processors.

Current architectures used to interconnect superconducting quantum processors are “point-to-point” in connectivity, meaning they require a series of transfers between network nodes, with compounding error rates.

On the way to overcoming these challenges, MIT researchers developed a new interconnect device that can support scalable, “all-to-all” communication, such that all superconducting quantum processors in a network can communicate directly with each other.

They created a network of two quantum processors and used their interconnect to send microwave photons back and forth on demand in a user-defined direction. Photons are particles of light that can carry quantum information.

The device includes a superconducting wire, or waveguide, that shuttles photons between processors and can be routed as far as needed. The researchers can couple any number of modules to it, efficiently transmitting information between a scalable network of processors.

They used this interconnect to demonstrate remote entanglement, a type of correlation between quantum processors that are not physically connected. Remote entanglement is a key step toward developing a powerful, distributed network of many quantum processors.

“In the future, a quantum computer will probably need both local and nonlocal interconnects. Local interconnects are natural in arrays of superconducting qubits. Ours allows for more nonlocal connections. We can send photons at different frequencies, times, and in two propagation directions, which gives our network more flexibility and throughput,” says Aziza Almanakly, an electrical engineering and computer science graduate student in the Engineering Quantum Systems group of the Research Laboratory of Electronics (RLE) and lead author of a paper on the interconnect.

Her co-authors include Beatriz Yankelevich, a graduate student in the EQuS Group; senior author William D. Oliver, the Henry Ellis Warren (1894) Professor of Electrical Engineering and Computer Science (EECS) and professor of Physics, director of the Center for Quantum Engineering, and associate director of RLE; and others at MIT and Lincoln Laboratory. The research appears today in Nature Physics.

A scalable architecture

The researchers previously developed a quantum computing module, which enabled them to send information-carrying microwave photons in either direction along a waveguide.

In the new work, they took that architecture a step further by connecting two modules to a waveguide in order to emit photons in a desired direction and then absorb them at the other end.

Each module is composed of four qubits, which serve as an interface between the waveguide carrying the photons and the larger quantum processors.

The qubits coupled to the waveguide emit and absorb photons, which are then transferred to nearby data qubits.

The researchers use a series of microwave pulses to add energy to a qubit, which then emits a photon. Carefully controlling the phase of those pulses enables a quantum interference effect that allows them to emit the photon in either direction along the waveguide. Reversing the pulses in time enables a qubit in another module any arbitrary distance away to absorb the photon.

“Pitching and catching photons enables us to create a ‘quantum interconnect’ between nonlocal quantum processors, and with quantum interconnects comes remote entanglement,” explains Oliver.

“Generating remote entanglement is a crucial step toward building a large-scale quantum processor from smaller-scale modules. Even after that photon is gone, we have a correlation between two distant, or ‘nonlocal,’ qubits. Remote entanglement allows us to take advantage of these correlations and perform parallel operations between two qubits, even though they are no longer connected and may be far apart,” Yankelevich explains.

However, transferring a photon between two modules is not enough to generate remote entanglement. The researchers need to prepare the qubits and the photon so the modules “share” the photon at the end of the protocol.

Generating entanglement

The team did this by halting the photon emission pulses halfway through their duration. In quantum mechanical terms, the photon is both retained and emitted. Classically, one can think of half a photon as retained and half as emitted.

Once the receiver module absorbs that “half-photon,” the two modules become entangled.

But as the photon travels, joints, wire bonds, and connections in the waveguide distort the photon and limit the absorption efficiency of the receiving module.

To generate remote entanglement with high enough fidelity, or accuracy, the researchers needed to maximize how often the photon is absorbed at the other end.

“The challenge in this work was shaping the photon appropriately so we could maximize the absorption efficiency,” Almanakly says.

They used a reinforcement learning algorithm to “predistort” the photon. The algorithm optimized the protocol pulses in order to shape the photon for maximal absorption efficiency.
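The predistortion idea can be illustrated with a simple optimization loop. This is not the team’s actual reinforcement learning algorithm, and the efficiency function below is a toy stand-in for the real experiment; the sketch only shows the shape of the problem: iteratively adjust the pulse so that simulated absorption efficiency improves.

```python
import numpy as np

def absorption_efficiency(pulse):
    # Toy stand-in for the experiment: efficiency peaks when the pulse
    # shape cancels a fixed "distortion" introduced by the waveguide.
    distortion = np.linspace(-0.5, 0.5, pulse.size)
    return float(np.exp(-np.sum((pulse - distortion) ** 2)))

def optimize_pulse(n_points=8, iters=200, seed=0):
    """Hill-climbing sketch of pulse predistortion: propose a perturbed
    pulse, keep it only if simulated absorption efficiency improves."""
    rng = np.random.default_rng(seed)
    pulse = np.zeros(n_points)
    best = absorption_efficiency(pulse)
    for _ in range(iters):
        trial = pulse + 0.05 * rng.standard_normal(n_points)
        score = absorption_efficiency(trial)
        if score > best:
            pulse, best = trial, score
    return pulse, best

pulse, eff = optimize_pulse()
```

The real algorithm optimizes over the actual hardware’s measured response rather than a closed-form model, but the accept-if-better structure is the same basic loop.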

When they implemented this optimized absorption protocol, they were able to show photon absorption efficiency greater than 60 percent.

This absorption efficiency is high enough to prove that the resulting state at the end of the protocol is entangled, a major milestone in this demonstration.

“We can use this architecture to create a network with all-to-all connectivity. This means we can have multiple modules, all along the same bus, and we can create remote entanglement among any pair of our choosing,” Yankelevich says.

In the future, they could improve the absorption efficiency by optimizing the path over which the photons propagate, perhaps by integrating modules in 3D instead of having a superconducting wire connecting separate microwave packages. They could also make the protocol faster so there are fewer chances for errors to accumulate.

“In principle, our remote entanglement generation protocol can also be expanded to other kinds of quantum computers and bigger quantum internet systems,” Almanakly says.

This work was funded, in part, by the U.S. Army Research Office, the AWS Center for Quantum Computing, and the U.S. Air Force Office of Scientific Research. 

AI tool generates high-quality images faster than state-of-the-art approaches

The ability to generate high-quality images quickly is crucial for producing realistic simulated environments that can be used to train self-driving cars to avoid unpredictable hazards, making them safer on real streets.

But the generative artificial intelligence techniques increasingly being used to produce such images have drawbacks. One popular type of model, called a diffusion model, can create stunningly realistic images but is too slow and computationally intensive for many applications. On the other hand, the autoregressive models that power LLMs like ChatGPT are much faster, but they produce poorer-quality images that are often riddled with errors.

Researchers from MIT and NVIDIA developed a new approach that brings together the best of both methods. Their hybrid image-generation tool uses an autoregressive model to quickly capture the big picture and then a small diffusion model to refine the details of the image.

Their tool, known as HART (short for hybrid autoregressive transformer), can generate images that match or exceed the quality of state-of-the-art diffusion models, but does so about nine times faster.

The generation process consumes fewer computational resources than typical diffusion models, enabling HART to run locally on a commercial laptop or smartphone. A user only needs to enter one natural language prompt into the HART interface to generate an image.

HART could have a wide range of applications, such as helping researchers train robots to complete complex real-world tasks and aiding designers in producing striking scenes for video games.

“If you are painting a landscape, and you just paint the entire canvas once, it might not look very good. But if you paint the big picture and then refine the image with smaller brush strokes, your painting could look a lot better. That is the basic idea with HART,” says Haotian Tang SM ’22, PhD ’25, co-lead author of a new paper on HART.

He is joined by co-lead author Yecheng Wu, an undergraduate student at Tsinghua University; senior author Song Han, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and a distinguished scientist of NVIDIA; as well as others at MIT, Tsinghua University, and NVIDIA. The research will be presented at the International Conference on Learning Representations.

The best of both worlds

Popular diffusion models, such as Stable Diffusion and DALL-E, are known to produce highly detailed images. These models generate images through an iterative process where they predict some amount of random noise on each pixel, subtract the noise, then repeat the process of predicting and “de-noising” multiple times until they generate a new image that is completely free of noise.

Because the diffusion model de-noises all pixels in an image at each step, and there may be 30 or more steps, the process is slow and computationally expensive. But because the model has multiple chances to correct details it got wrong, the images are high-quality.
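The iterative process described above can be sketched in a few lines of Python. The noise predictor here is a toy stand-in for the trained neural network a real diffusion model uses, and the numbers are illustrative; the point is the structure: every pixel is updated at every one of 30-plus steps, which is why the process is slow.

```python
import numpy as np

def predict_noise(x, step):
    # Stand-in for a trained noise-prediction network; a real model is a
    # large neural net conditioned on the step (and usually a text prompt).
    return 0.1 * x

def diffusion_sample(shape, steps=30, seed=0):
    """Start from pure noise and repeatedly subtract predicted noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)      # begin with a pure-noise "image"
    for step in range(steps):           # 30+ passes over the full image
        x = x - predict_noise(x, step)  # de-noise every pixel each pass
    return x

img = diffusion_sample((8, 8))
```

Each pass touches the whole image, but each pass also gives the model another chance to correct earlier mistakes, which is where the quality comes from.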

Autoregressive models, commonly used for predicting text, can generate images by predicting patches of an image sequentially, a few pixels at a time. They can’t go back and correct their mistakes, but the sequential prediction process is much faster than diffusion.

These models use representations known as tokens to make predictions. An autoregressive model uses an autoencoder to compress raw image pixels into discrete tokens and to reconstruct the image from predicted tokens. While this boosts the model’s speed, the information loss that occurs during compression causes errors when the model generates a new image.
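The sequential prediction loop can be sketched as follows; the logits function is a random stand-in for a trained transformer, and the vocabulary size is illustrative. The key structural contrast with diffusion is that each token is chosen once and never revisited.

```python
import numpy as np

VOCAB = 256  # size of the discrete token codebook (illustrative)

def next_token_logits(tokens):
    # Stand-in for a trained autoregressive transformer: scores every
    # possible next token given the tokens generated so far.
    rng = np.random.default_rng(len(tokens))
    return rng.standard_normal(VOCAB)

def generate_tokens(n_tokens):
    """Predict image tokens one at a time; no going back to fix mistakes."""
    tokens = []
    for _ in range(n_tokens):
        logits = next_token_logits(tokens)
        tokens.append(int(np.argmax(logits)))  # commit to a token, move on
    return tokens

# A decoder (the autoencoder's second half) would map these discrete
# tokens back to pixels; the compression loss at that stage is where
# visible errors creep in.
tokens = generate_tokens(16)
```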

With HART, the researchers developed a hybrid approach that uses an autoregressive model to predict compressed, discrete image tokens, then a small diffusion model to predict residual tokens. Residual tokens compensate for the model’s information loss by capturing details left out by discrete tokens.

“We can achieve a huge boost in terms of reconstruction quality. Our residual tokens learn high-frequency details, like edges of an object, or a person’s hair, eyes, or mouth. These are places where discrete tokens can make mistakes,” says Tang.

Because the diffusion model only predicts the remaining details after the autoregressive model has done its job, it can accomplish the task in eight steps, instead of the usual 30 or more a standard diffusion model requires to generate an entire image. This minimal overhead of the additional diffusion model allows HART to retain the speed advantage of the autoregressive model while significantly enhancing its ability to generate intricate image details.
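The division of labor described above can be sketched as a two-stage pipeline. Both stages here are toy stand-ins rather than HART’s actual models; the parameter counts in the comments come from the article, but the arithmetic inside is purely illustrative.

```python
import numpy as np

def autoregressive_pass(shape, seed=0):
    # Stand-in for the large autoregressive model (700M parameters in
    # HART): produces the coarse image implied by its discrete tokens.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).round(1)  # quantized: detail is lost

def refine_residual(coarse, steps=8, seed=1):
    # Stand-in for the small diffusion model (37M parameters in HART):
    # predicts only the residual fine detail on top of the coarse image,
    # in about 8 steps rather than the 30+ a full diffusion model needs.
    rng = np.random.default_rng(seed)
    residual = rng.standard_normal(coarse.shape)
    for step in range(steps):
        residual = residual - 0.1 * residual  # toy de-noising update
    return coarse + residual

image = refine_residual(autoregressive_pass((8, 8)))
```

Because the expensive iterative stage only handles the residual, its cost stays small relative to the single autoregressive pass that does the bulk of the work.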

“The diffusion model has an easier job to do, which leads to more efficiency,” he adds.

Outperforming larger models

During the development of HART, the researchers encountered challenges in effectively integrating the diffusion model to enhance the autoregressive model. They found that incorporating the diffusion model in the early stages of the autoregressive process resulted in an accumulation of errors. Instead, their final design of applying the diffusion model to predict only residual tokens as the final step significantly improved generation quality.

Their method, which uses a combination of an autoregressive transformer model with 700 million parameters and a lightweight diffusion model with 37 million parameters, can generate images of the same quality as those created by a diffusion model with 2 billion parameters, but it does so about nine times faster. It uses about 31 percent less computation than state-of-the-art models.

Moreover, because HART uses an autoregressive model to do the bulk of the work — the same type of model that powers LLMs — it is better suited for integration with the new class of unified vision-language generative models. In the future, one could interact with a unified vision-language generative model, perhaps by asking it to show the intermediate steps required to assemble a piece of furniture.

“LLMs are a good interface for all sorts of models, like multimodal models and models that can reason. This is a way to push the intelligence to a new frontier. An efficient image-generation model would unlock a lot of possibilities,” he says.

In the future, the researchers want to go down this path and build vision-language models on top of the HART architecture. Since HART is scalable and generalizable to multiple modalities, they also want to apply it to video generation and audio prediction tasks.

This research was funded, in part, by the MIT-IBM Watson AI Lab, the MIT and Amazon Science Hub, the MIT AI Hardware Program, and the U.S. National Science Foundation. The GPU infrastructure for training this model was donated by NVIDIA. 

3D printing approach strings together dynamic objects for you

It’s difficult to build devices that replicate the fluid, precise motion of humans, but that might change if we could pull a few (literal) strings.

At least, that’s the idea behind “cable-driven” mechanisms, in which running a string through an object generates streamlined movement across its different parts. Take a robotic finger, for example: You could embed a cable from the palm to the fingertip and then pull it to create a curling motion.

While cable-driven mechanisms can create real-time motion to make an object bend, twist, or fold, they can be complicated and time-consuming to assemble by hand. To automate the process, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed an all-in-one 3D printing approach called “Xstrings.” Part design tool, part fabrication method, Xstrings can embed all the pieces together and produce a cable-driven device, saving time when assembling bionic robots, creating art installations, or working on dynamic fashion designs.

3D printing approach strings together cable-driven mechanisms for you. Video: MIT CSAIL

In a paper to be presented at the 2025 Conference on Human Factors in Computing Systems (CHI2025), the researchers used Xstrings to print a range of colorful and unique objects that included a red walking lizard robot, a purple wall sculpture that can open and close like a peacock’s tail, a white tentacle that curls around items, and a white claw that can ball up into a fist to grab objects.

To fabricate these eye-catching mechanisms, Xstrings allows users to fully customize their designs in a software program, sending them to a multi-material 3D printer to bring that creation to life. You can automatically print all the device’s parts in their desired locations in one step, including the cables running through it and the joints that enable its intended motion.

MIT CSAIL postdoc and lead author Jiaji Li says that Xstrings can save engineers time and energy, cutting total production time by 40 percent compared to manual assembly. “Our innovative method can help anyone design and fabricate cable-driven products with a desktop bi-material 3D printer,” says Li.

A new twist on cable-driven fabrication

To use the Xstrings program, users first input a design with specific dimensions, like a rectangular cube divided into smaller pieces with a hole in the middle of each one. You can then choose which way its parts move by selecting different “primitives” — bending, coiling (like a spring), twisting (like a screw), or compressing — as well as the angle of these motions.

For even more elaborate creations, users can incorporate multiple primitives to create intriguing combinations of motions. If you wanted to make a toy snake, you could include several twists to create a “series” combo, in which a single cord drives a sequence of motions. To create the robot claw, the team used a “parallel” combination, embedding several cables so that each finger can close up into a fist independently.
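The difference between the two combinations can be made concrete with a small sketch. This is an illustration of the concept only, not the Xstrings software: the function names and the pull-amount representation are assumptions.

```python
# Illustrative sketch (not the Xstrings API): "series" vs. "parallel" cable
# combinations. In a series combo, one cord drives a sequence of motion
# primitives; in a parallel combo, each cord drives its own primitive.

def series(primitives):
    """One cable, many primitives: pulling the cable actuates all of them."""
    def pull(amount):
        return [(name, amount) for name in primitives]
    return pull

def parallel(primitives):
    """One cable per primitive: each cable actuates only its own primitive."""
    def pull(amounts):
        return [(name, amt) for name, amt in zip(primitives, amounts)]
    return pull

# A toy snake: a single cord twisting several segments in sequence.
snake = series(["twist_1", "twist_2", "twist_3"])
print(snake(0.8))  # every twist receives the same pull

# A robot claw: one cable per finger, so each finger closes independently.
claw = parallel(["finger_1", "finger_2", "finger_3"])
print(claw([1.0, 0.5, 0.0]))  # each finger gets its own pull amount
```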

Xstrings lets users choose exactly how strings are secured within the object it produces: the cable’s endpoint, the holes within the structure that the cord passes through, and where you’d pull to operate the device. Image courtesy of the researchers.

Beyond fine-tuning the way cable-driven mechanisms move, Xstrings also facilitates how cables are integrated into the object. Users can choose exactly how the strings are secured, in terms of where the “anchor” (endpoint), “threaded areas” (or holes within the structure that the cord passes through), and “exposed point” (where you’d pull to operate the device) are located. With a robot finger, for instance, you could choose the anchor to be located at the fingertip, with a cable running through the finger and a pull tag exposed at the other end.

Xstrings also supports diverse joint designs by automatically placing components that are elastic, compliant, or mechanical. This allows the cable to turn as needed as it completes the device’s intended motion.
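A cable-driven design of this kind might be specified roughly as follows. This sketch only mirrors the concepts the article names (anchor, threaded areas, exposed point, joint types); every class and field name here is illustrative, and the actual Xstrings software defines its own format.

```python
# Hedged sketch of a cable-driven design specification. All names are
# hypothetical; this is not the Xstrings file format or API.
from dataclasses import dataclass, field

@dataclass
class Cable:
    anchor: str                  # "anchor": endpoint where the cable is fixed
    threaded_areas: list         # holes the cord passes through, in order
    exposed_point: str           # where the user pulls to actuate the device

@dataclass
class Joint:
    location: str
    kind: str                    # "elastic", "compliant", or "mechanical"

@dataclass
class CableDrivenDesign:
    primitive: str               # "bend", "coil", "twist", or "compress"
    angle_deg: float             # angle of the intended motion
    cable: Cable
    joints: list = field(default_factory=list)

# The robot-finger example from the article: anchored at the fingertip,
# threaded through the finger, with a pull tag exposed at the palm.
finger = CableDrivenDesign(
    primitive="bend",
    angle_deg=90.0,
    cable=Cable(anchor="fingertip",
                threaded_areas=["knuckle_1", "knuckle_2"],
                exposed_point="palm_tag"),
    joints=[Joint("knuckle_1", "compliant"), Joint("knuckle_2", "compliant")],
)
```

Structuring the design this way separates the motion (the primitive and its angle) from the actuation path (the cable routing), which is the same separation the Xstrings interface exposes to users.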

Driving unique designs across robotics, art, and beyond

Once users have simulated their digital blueprint for a cable-driven item, they can bring it to life via fabrication. Xstrings can send your design to a fused deposition modeling 3D printer, where thermoplastic filament is melted and extruded through a nozzle to build structures up layer by layer.

Xstrings uses this technique to lay out cables horizontally and build around them. To ensure their method would successfully print cable-driven mechanisms, the researchers carefully tested their materials and printing conditions.

For example, the researchers found that their strings only broke after being pulled up and down by a mechanical device more than 60,000 times. In another test, the team discovered that printing at 260 degrees Celsius with a speed of 10-20 millimeters per second was ideal for producing their many creative items.

“The Xstrings software can bring a variety of ideas to life,” says Li. “It enables you to produce a bionic robot device like a human hand, mimicking our own gripping capabilities. You can also create interactive art pieces, like a cable-driven sculpture with unique geometries, and clothes with adjustable flaps. One day, this technology could enable the rapid, one-step creation of cable-driven robots in outer space, even within highly confined environments such as space stations or extraterrestrial bases.”

The team’s approach offers plenty of flexibility and a noticeable speed boost to fabricating cable-driven objects. It creates objects that are rigid on the outside, but soft and flexible on the inside; in the future, they may look to develop objects that are soft externally but rigid internally, much like humans’ skin and bones. They’re also considering using more resilient cables, and, instead of just printing strings horizontally, embedding ones that are angled or even vertical.

Li wrote the paper with Zhejiang University master’s student Shuyue Feng; Tsinghua University master’s student Yujia Liu; Zhejiang University assistant professor and former MIT Media Lab visiting researcher Guanyun Wang; and three CSAIL members: Maxine Perroni-Scharf, an MIT PhD student in electrical engineering and computer science; Emily Guan, a visiting researcher; and senior author Stefanie Mueller, the TIBCO Career Development Associate Professor in the MIT departments of Electrical Engineering and Computer Science and Mechanical Engineering, and leader of the HCI Engineering Group.

This research was supported, in part, by a postdoctoral research fellowship from Zhejiang University, and the MIT-GIST Program.