Interactive mouthpiece advances opportunities for health data, assistive technology, and hands-free interactions

When you think about hands-free devices, you might picture Alexa and other voice-activated in-home assistants, Bluetooth earpieces, or asking Siri to make a phone call in your car. You might not imagine using your mouth to communicate remotely with other devices, like a computer or a phone.

Thinking outside the box, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and Aarhus University researchers have now engineered “MouthIO,” a dental brace that can be fabricated with sensors and feedback components to capture in-mouth interactions and data. This interactive wearable could eventually assist dentists and other doctors with collecting health data and help motor-impaired individuals interact with a phone, computer, or fitness tracker using their mouths.

Resembling an electronic retainer, MouthIO is a see-through brace fitted to a scan of your upper or lower set of teeth. The researchers created a plugin for the modeling software Blender that helps users tailor the device to a dental scan; the resulting design can then be 3D printed in dental resin. This computer-aided design tool allows users to digitally customize a panel (called the PCB housing) on the side to integrate electronic components like batteries, sensors (including detectors for temperature and acceleration, as well as tongue-touch sensors), and actuators (like vibration motors and LEDs for feedback). Users can also place small electronics outside the PCB housing on individual teeth.

Research by others at MIT has also led to another mouth-based touchpad, based on technology initially developed in the Media Lab. That device is available via Augmental, a startup deploying technology that lets people with movement impairments seamlessly interact with their personal computational devices.

The active mouth

“The mouth is a really interesting place for an interactive wearable,” says Michael Wessely, senior author of a paper about MouthIO and a former CSAIL postdoc who is now an assistant professor at Aarhus University. “This compact, humid environment has elaborate geometries, making it hard to build a wearable interface to place inside. With MouthIO, though, we’ve developed an open-source device that’s comfortable, safe, and almost invisible to others. Dentists and other doctors are eager about MouthIO for its potential to provide new health insights, tracking things like teeth grinding and potentially bacteria in your saliva.”

The excitement about MouthIO’s potential for health monitoring stems from initial experiments. The team found that their device could track bruxism (the habit of grinding teeth) by embedding an accelerometer within the brace to monitor jaw movements. When attached to the lower set of teeth, MouthIO detected when users ground or bit down, and the data were charted to show how often users did each.
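The paper’s signal-processing details aren’t reproduced here, but a rough sketch illustrates the idea of turning raw accelerometer readings into grind-versus-bite counts. The window length and thresholds below are invented for illustration, not taken from the study.

```python
import numpy as np

# Hypothetical illustration: classify jaw activity from in-mouth accelerometer
# samples. Thresholds, window size, and sampling rate are made up for this
# sketch; MouthIO's actual signal processing is not described in the article.
def classify_jaw_activity(accel_xyz, fs=100, window_s=1.0,
                          bite_thresh=2.0, grind_thresh=0.5):
    """accel_xyz: (N, 3) array of accelerations in g; returns per-window labels."""
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    win = int(fs * window_s)
    labels = []
    for start in range(0, len(magnitude) - win + 1, win):
        chunk = magnitude[start:start + win]
        if chunk.max() > bite_thresh:        # short, sharp impulses look like bites
            labels.append("bite")
        elif chunk.std() > grind_thresh:     # sustained oscillation looks like grinding
            labels.append("grind")
        else:
            labels.append("rest")
    return labels

# Example: one minute of simulated data at 100 Hz
data = np.random.normal(0, 0.1, size=(6000, 3))
print(classify_jaw_activity(data)[:5])
```

In practice, the thresholds would need to be tuned per user and per sensor placement.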

Wessely and his colleagues’ customizable brace could one day help users with motor impairments, too. The team connected small touchpads to MouthIO to detect when a user’s tongue taps their teeth. These taps could be sent via Bluetooth to scroll across a webpage, for example, allowing the tongue to act as a “third hand” that enables hands-free interaction.

“MouthIO is a great example of how miniature electronics now allow us to integrate sensing into a broad range of everyday interactions,” says study co-author Stefanie Mueller, the TIBCO Career Development Associate Professor in the MIT departments of Electrical Engineering and Computer Science and Mechanical Engineering and leader of the HCI Engineering Group at CSAIL. “I’m especially excited about the potential to help improve accessibility and track potential health issues among users.”

Molding and making MouthIO

To get a 3D model of your teeth, you can first create a physical impression and fill it with plaster. You can then scan your mold with a mobile app like Polycam and upload that to Blender. Using the researchers’ plugin within this program, you can clean up your dental scan to outline a precise brace design. Finally, you 3D print your digital creation in clear dental resin, after which the electronic components can be soldered on. Users can create a standard brace that covers their teeth, or opt for an “open-bite” design within the Blender plugin. The latter fits more like open-finger gloves, exposing the tips of your teeth, which helps users avoid lisping and speak naturally.

This “do-it-yourself” method costs roughly $15 to produce and takes two hours to 3D print. MouthIO can also be fabricated with a more expensive, professional-level teeth scanner similar to what dentists and orthodontists use, which is faster and less labor-intensive.

Compared to its closed counterpart, which fully covers your teeth, the researchers view the open-bite design as a more comfortable option. The team preferred it for beverage-monitoring experiments, in which they fabricated a brace capable of alerting users when a drink was too hot. This iteration of MouthIO had a temperature sensor and a vibration motor embedded within the PCB housing, which buzzed when a drink exceeded 65 degrees Celsius (or 149 degrees Fahrenheit). This could help individuals with mouth numbness better understand what they’re consuming.

In a user study, participants also preferred the open-bite version of MouthIO. “We found that our device could be suitable for everyday use in the future,” says study lead author and Aarhus University PhD student Yijing Jiang. “Since the tongue can touch the front teeth in our open-bite design, users don’t have a lisp. This made users feel more comfortable wearing the device during extended periods with breaks, similar to how people use retainers.”

The team’s initial findings indicate that MouthIO is a cost-effective, accessible, and customizable interface, and the team is working on a longer-term study to evaluate its viability further. They’re looking to improve its design, including experimenting with more flexible materials and placing it in other parts of the mouth, like the cheek and the palate. Among these ideas, the researchers have already prototyped two new designs for MouthIO: a single-sided brace that is even more comfortable to wear and fully invisible to others, and another that is fully capable of wireless charging and communication.

Jiang, Mueller, and Wessely’s co-authors include PhD student Julia Kleinau, master’s student Till Max Eckroth, and associate professor Eve Hoggan, all of Aarhus University. Their work was supported by a Novo Nordisk Foundation grant and was presented at ACM’s Symposium on User Interface Software and Technology.

Twenty Years of New Women in EECS

Every Friday morning, the sound of laughter and the smell of coffee greet the tenacious few who’ve risen early, bundled up, and braved the Boston cold to appear in Building 34 well before class. They’re here for the New Women in EECS Seminar, a weekly departmental tradition in which women PhD candidates gather over breakfast food to chat, socialize, and learn more about the skills and tools they’ll need in graduate school. 

The seminar began in 2005, and its success has inspired a spin-off: the Networking T seminar, offered weekly for any interested PhD student within the department to attend.

The seminar is now entering its twentieth year, yet it owes its entire existence to one visionary member of the faculty, Leslie Kolodziejski, the Joseph F. and Nancy P. Keithley Professor in EE. In 2005, when Kolodziejski started the group, women were a small percentage of the faculty and student body of EECS, a percentage mirrored in the overall world of STEM at the time. Women made up only 24% of the undergraduate population; 19% of the graduate student population; and 13% of the faculty in the department. Many felt alone and isolated, and struggled to find mentors who could understand their experience. Still more dealt with imposter syndrome, wondering if they truly had what it takes to succeed in the highly competitive MIT environment. The seminar was designed to change the students’ experience and narrative to one of mutual support and caring.

“I really appreciated the sense of community that Leslie established for us first-year students during that seminar,” says Rachel Owens, currently in the third year of her PhD. “It was actually my undergrad advisor Elizabeth Basha, who was an MIT alumna (and the first female science or engineering professor I ever had), who recommended the seminar to me. I understand her fond memories of the breakfasts better now, even though it’s still hard to put into words why they’re important.” 

Janet Fischer, a member of the Graduate Office who joined the seminars in 2011, shared the sense of wonder at the effects of the weekly gathering. “Being part of the seminar on autumn Friday mornings felt like sacred space, and I enjoyed every year that I was involved in it.”

Janet Fischer (wearing coral, standing in the center of the picture) joined Leslie Kolodziejski (wearing green, standing on the left) as a co-host of the breakfasts in 2011. Fischer remembers, “From the first meeting each September, through the last in December, you could feel the trust build in the group, and watch the friendships grow amongst the attendees. It was a beautiful thing!”

Linlu Qiu, a third-year PhD student who also attended the 2022 seminar with Rachel Owens, discovered many similarities with her fellow seminar attendees over the course of the year: “This is a safe space where people feel comfortable sharing their feelings. I learned that many of my fellow students and even the invited guests have had similar experiences to mine.” Those common experiences transcended research area; Qiu noted that many of the women she got to know during the seminar did not share her major or research interests (she is currently studying natural language processing) and rarely crossed paths with her in class, but they still formed tight bonds. “[These commonalities] made me feel less alone, giving me the courage to pursue whatever I want to accomplish.”

By design, the seminar features regular visits from speakers across campus who can share resources and advice, but whose stories also offer models for out-of-the-box career choices. Over the years, the seminar has hosted visitors ranging from an MIT campus police sergeant, to the Head of MIT Health, to representatives of the MIT Libraries, to the heads of all three faculties within EECS, to a 50-year employee who experienced the tumultuous campus times of the 1970s. Recently, Associate Director of the Teaching + Learning Lab Lourdes Alemán dropped by to share resources; TA Hope Dargan SB ’21 MEng ’23, a second-year PhD student, credits that particular visit with giving her the impetus to reach out and make contact with the Lab, which “improved my teaching and helped me find more teaching-oriented people at MIT.”

Seminar alum Margherita Firenze, now in the second year of her PhD, remembers, “One of my favorite sessions was the ‘Importance of Networking and Mentoring’ session. Through funny cartoons and personal stories, Leslie explained the differences between networking and mentoring and gave us tips to identify a possible mentor. She encouraged us to reach out to people and know that both the mentor and mentee get something out of the relationship. Her advice has helped me reach out to possible mentors and seek networking events at conferences.”

The success of the New Women in EECS seminar has been so marked that it has inspired a spin-off: the Networking T seminar. As the name implies, the T seminar features afternoon tea (and other goodies), and is also offered weekly for any interested PhD student within the department to attend. “I have also offered this seminar every year, even in the pandemic, and we are in the 13th year,” explains Kolodziejski, who has scheduled both seminars in parallel to make sure every EECS first-year graduate student can reap the benefits of small group conversation and camaraderie. 

Another tradition to emerge from the seminar is the annual Erin M. Aylward Memorial Dinner, now in its twelfth year, which honors a late graduate student beloved by her peers. “I attended the dinner, and we filled the entire restaurant,” remembers Lizzy Ann Salata, who will be graduating with both her SM and MBA degrees in spring of this year. “I have never been in a STEM environment where the amount of women in the organization could fill an entire restaurant. It was an unreal experience and just goes to show how much dedication and intentionality MIT puts into diversifying their student body.” 

At the 2024 Erin M. Aylward Memorial Community Dinner, Leslie Kolodziejski awarded the traditional thank-you gift (a “gurgle pot”) to the leaders of the GW6 student organization. Photo credit: Veera Panova.

The seminar’s effects, after two decades, have been profound. Kolodziejski estimates that over 400 women have participated in the Friday seminar cumulatively, finding connection and companionship where they might otherwise have felt isolated, with many additional students dropping in for the Networking T seminars. “Leslie managed to foster a wonderful support network for female graduate students to share advice with through the seminar series,” says Michelle Sander, Associate Professor of Electrical and Computer Engineering at Boston University. “This built the foundation for friendships that lasted beyond the first-year seminar and provided support throughout the PhD study duration.”

The effects have not been confined solely to EECS, either, as Lizzy Ann Salata can testify. The frequent seminar attendee is currently in the Leaders for Global Operations Master’s Program at MIT, a dual degree between the MIT Sloan School of Management and the School of Engineering. “It is great to be at the intersection of business and technology, however, sometimes students feel as though they don’t truly belong at either school,” she says. “Attending these seminars helped me get over this imposter syndrome. The women EECS students were always so welcoming. In fact, during the last seminar, there was a ‘quiz’ to see how much we had learned about each other. I was included as one of the quiz questions, which made me feel so special!”

Echoes of the seminar are also evident in the Department’s Thriving Stars initiative, a holistic effort aiming to improve gender representation at the graduate level within EECS. Combining community-building events like the seminar, annual Aylward dinner, and sunset cruise with mentorship/buddy programs, career panels, and fireside chats with notable leaders in EECS, the program, developed by Kolodziejski and Department Head Asu Ozdaglar, is now closing out its fourth year. Within those four years, graduate program applications from women have increased by 30%. Additionally, the percentage of women in the graduate program has grown sharply since Kolodziejski founded the weekly seminar: from 134 women in 2005 to 252 (or 30% of the PhD student body) today.

For Kolodziejski, seeing the department’s transformation into a more welcoming community for women has been a profoundly rewarding experience. “It is so special to watch these women thrive and go on to do amazing things,” she says. “I remember them all so fondly. We really bonded!” Hope Dargan agrees, adding, “10/10 would recommend, and hope the seminar continues for 20+ more years!”

Department of EECS Announces Promotions

The Department of EECS is proud to announce the following promotions to Associate Professor with tenure, all effective July 1, 2025: 

Connor Coley is being promoted to Associate Professor with tenure in the Department of Chemical Engineering and the Department of Electrical Engineering and Computer Science. Coley received his B.S. and Ph.D. in Chemical Engineering from Caltech and MIT, respectively, and did his postdoctoral training at the Broad Institute. His research group at MIT develops computational strategies for small molecule drug discovery, chemical synthesis, and structure elucidation. Key research areas in the group include the design of new neural models for representation learning on molecules, data-driven synthesis planning, in silico strategies for predicting the outcomes of organic reactions, model-guided Bayesian optimization, de novo molecular generation, and computational metabolomics.

Among other honors, Coley has received the AI2050 Early Career Fellowship; was recognized as a Samsung AI Researcher of the Year; is a recipient of C&EN’s “Talented Twelve” award; was named to Forbes Magazine’s “30 Under 30” for Healthcare and MIT Technology Review’s Innovators Under 35; has received the NSF CAREER award and the Bayer Early Excellence in Science Award; and was most recently named a Camille Dreyfus Teacher-Scholar. Additionally, Coley has distinguished himself as a committed undergraduate mentor, receiving the 2024 Outstanding UROP Mentor Award, and as a thoughtful curriculum developer, creating 3.C01[J] “Machine Learning for Molecular Engineering” alongside Rafael Gomez-Bombarelli (Materials Science and Engineering) and Ernest Fraenkel (Biological Engineering), a course for which all three were recognized with the Schwarzman College of Computing’s 2023 Common Ground Award for Excellence in Teaching.

Mohsen Ghaffari is being promoted to Associate Professor with tenure. Ghaffari received his BSc from the Sharif University of Technology in 2010, and his MSc and PhD in EECS from MIT in 2013 and 2016, respectively, before joining the faculty of ETH Zurich. He joined MIT EECS in July of 2022. His research explores the theory of distributed and parallel computation, and he has had influential work on a range of algorithmic problems, including generic derandomization methods for distributed computing and parallel computing (which resolved several decades-old open problems), improved distributed algorithms for graph problems, sublinear algorithms derived via distributed techniques, and algorithmic and impossibility results for massively parallel computation.

Ghaffari’s work has received several best paper awards, including from the IEEE Symposium on Foundations of Computer Science (FOCS) 2024, the ACM Symposium on Parallel Algorithms and Architectures (SPAA) 2023, the ACM-SIAM Symposium on Discrete Algorithms (SODA) 2016, and the International Symposium on DIStributed Computing (DISC) 2017 and 2013. While at ETH, he received a prestigious European Research Council Starting Grant and a Google Faculty Research Award. In 2025, he was named a Sloan Research Fellow. During his time at ETH, he developed graduate courses in Advanced Algorithms, Distributed Algorithms, and Massively Parallel Algorithms, plus modules on parallel and distributed graph algorithms for undergraduate courses. At MIT, Ghaffari has done a major revision of the graduate-level course 6.5250, Distributed Algorithms, and also lectures in the undergraduate Introduction to Algorithms class.

Song Han is being promoted to Associate Professor with tenure. He earned his PhD from Stanford, pioneering efficient AI computing techniques such as “Deep Compression” (pruning, quantization) and the “Efficient Inference Engine,” which first introduced weight sparsity to modern AI chips, making it one of the top-5 most cited papers in the 50-year history of ISCA (1973-2023). His innovations, including TinyML and hardware-aware neural architecture search (Once-for-All Network), have advanced AI model deployment on resource-constrained devices. His recent work on LLM quantization/acceleration (SmoothQuant, AWQ, StreamingLLM) has improved efficiency in LLM inference, and was adopted by NVIDIA TensorRT-LLM.

Han has received best paper awards at ICLR ’16, FPGA ’17, and MLSys ’24, the NSF CAREER Award in 2020, MIT Technology Review’s “35 Innovators Under 35,” IEEE “AI’s 10 to Watch,” and the 2023 Sloan Research Fellowship. He developed the open lecture series EfficientML.ai to share advances in efficient ML research. Within the department, Han has developed a class on tiny machine learning, 6.5940 TinyML and Efficient Deep Learning Computing; has taught and developed original content for 6.191 Computation Structures (the department’s foundational computer architecture course); and has chaired the graduate admissions subcommittee for “machine learning for systems and systems for machine learning”. 


Kaiming He is being promoted to Associate Professor with tenure. He earned his BS from Tsinghua University in 2007 and his PhD from the Chinese University of Hong Kong in 2011 before joining Microsoft Research Asia (MSRA) as a Researcher and then Facebook AI Research (FAIR) as a Research Scientist. He joined the Department of EECS as an associate professor in February 2024, and is affiliated with the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research areas include deep learning and computer vision. He is best known for his work on Deep Residual Networks (ResNets), which have made a significant impact on computer vision and broader artificial intelligence; on visual object detection and segmentation, including Faster R-CNN and Mask R-CNN; and on visual self-supervised learning.

Kaiming He’s awards include the PAMI Young Researcher Award in 2018; three best paper awards, at CVPR 2009, CVPR 2016, and ICCV 2017; two best paper honorable mentions (at ECCV 2018 and CVPR 2021); and, alongside the team behind Detectron, an Everingham Prize for selfless contributions to computer vision. He has taught 6.8300 Advances in Computer Vision, as well as a specialized seminar on Deep Generative Models, and is currently serving as a member of the faculty search committee in AI+D.


Phillip Isola is being promoted to Associate Professor with tenure. Isola received his Ph.D. in 2015 from the Brain and Cognitive Sciences (BCS) Department at MIT before taking on a postdoctoral position at Berkeley, followed by a visiting research scientist position at OpenAI. He joined MIT in July 2018. Isola’s research explores learning representations that capture the commonalities between disparate domains, and thereby achieve generality; directly linking experiences via visual translation; and designing representations that can adapt fast. A leader in the use of machine learning to analyze and create images, Isola introduced a solution to the problem of image translation in a series of 2017 papers. His most recent work addresses another fundamental computer vision problem: the requirement of large amounts of labeled, or supervised, training data, which limits most learning-based approaches to computer vision. Among other awards, Isola has won a Packard Fellowship (2021), the IEEE PAMI Young Researcher Award (2021), a Sloan Fellowship, Samsung AI Researcher of the Year, and the CoRL 2023 Best Paper Award.

Within the Department, Isola is a member of the 6-4 curriculum committee; has co-designed a new course 6.882 Embodied Intelligence; has updated and redesigned lectures for 6.819 Advances in Computer Vision; and has taught 6.036 Machine Learning. Alongside Stefanie Jegelka, he designed a deep-learning class pilot which has now become a graduate-level course. Additionally, Isola co-authored the textbook Foundations of Computer Vision alongside Antonio Torralba and William Freeman; and has served on multiple faculty search, steering, and admissions committees.


Jonathan Ragan-Kelley is being promoted to Associate Professor with tenure. He obtained his PhD from MIT in 2014, and after spending time as a postdoctoral researcher at Stanford (2014-2016), a visiting scientist at Google (2016-2017), and an Assistant Professor at Berkeley (2017-2019), Ragan-Kelley joined MIT in January of 2020. A pioneer in the development of high-performance domain-specific languages (especially for computer graphics), Ragan-Kelley has repeatedly identified important domains that require significant expert effort to deliver the necessary performance before developing new high-level programming languages that capture this expertise and can deliver high performance with less effort and lower risk of bugs. His computer graphics language Halide has become the industry standard for image processing, and is used in both Google phones and Adobe Photoshop, while his exocompiler, Exo, is used by developers at Apple and Intel, and powers core features on the iPhone. His earlier work, on the system Lightspeed, was used to produce movies for several years at Industrial Light and Magic and was a finalist for a technical Oscar award. 

Ragan-Kelley’s work earned him the 2021 ACM SIGGRAPH Significant New Researcher Award, the highest award given by the community to young researchers. His work has also been featured in CACM Research Highlights in 2018 and 2019. His original Halide publication from 2013 received the PLDI Test of Time Award in 2023; in the same year, he was named a Sloan Research Fellow. While at Berkeley, he developed a graduate class on compilers; while at MIT, Ragan-Kelley has taught 6.172 Software Performance Engineering multiple times; lectured in the fundamentals of programming class; and has headed the CSAIL Visual Computing Community of Research. In 2023, his contributions were acknowledged by the Department with the EECS Outstanding Educator Award.


Arvind Satyanarayan is being promoted to Associate Professor with tenure. Satyanarayan earned his MS and PhD in Computer Science at Stanford in 2014 and 2017, respectively, before spending a year as a postdoctoral research scientist on the Google Brain team. Satyanarayan joined the Department of EECS in July 2018. Within the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), he leads the Visualization Group, which focuses on visualization to study intelligence augmentation, specifically tools for interactive visualization, sociotechnical impacts of visualization, and machine learning interpretability. His PhD work on Reactive Vega and Vega-Lite has been widely adopted in data science (e.g., via the Vega-Altair Python package), in industry (e.g., at Apple, Google, and The LA Times), and in academic research. Among other awards, he has received an NSF CAREER award; a Sloan Research Fellowship; a National Academy of Sciences Kavli Fellowship; the IEEE VGTC Visualization Significant New Researcher Award; and paper awards at ACM CHI, IUI, IEEE VIS, EuroVis, and ACL.

Within the Department of EECS, Satyanarayan has developed a new course on interactive data visualization & society (6.C35/C85) as part of the College of Computing’s Common Ground subjects. He has repeatedly served on the program committees of several major conferences in his area, including the ACM Conference on Human Factors in Computing Systems (CHI), the ACM Symposium on User Interface Software and Technology (UIST), and the IEEE Visualization Conference (VIS), and has served as diversity and inclusion chair for IEEE VIS and on the IEEE Ad Hoc Committee on Diversity and Inclusion. His excellence in teaching has been recognized by the department with the 2020 Kolokotrones Education Award and the 2021 Seth J. Teller Award for Excellence, Inclusion, and Diversity.

The sweet taste of a new idea

Behavioral economist Sendhil Mullainathan has never forgotten the pleasure he felt the first time he tasted a delicious, crisp yet gooey Levain cookie. He compares the experience to encountering new ideas.

“That hedonic pleasure is pretty much the same pleasure I get hearing a new idea, discovering a new way of looking at a situation, or thinking about something, getting stuck and then having a breakthrough. You get this kind of core basic reward,” says Mullainathan, the Peter de Florez Professor with dual appointments in the MIT departments of Economics and Electrical Engineering and Computer Science, and a principal investigator at the MIT Laboratory for Information and Decision Systems (LIDS).

Mullainathan’s love of new ideas, and by extension of going beyond the usual interpretation of a situation or problem by looking at it from many different angles, seems to have started very early. As a child in school, he says, the multiple-choice answers on tests all seemed to offer possibilities for being correct.

“They would say, ‘Here are three things. Which of these choices is the fourth?’ Well, I was like, ‘I don’t know.’ There are good explanations for all of them,” Mullainathan says. “While there’s a simple explanation that most people would pick, natively, I just saw things quite differently.”

Mullainathan says the way his mind works, and has always worked, is “out of phase” — that is, not in sync with how most people would readily pick the one correct answer on a test. He compares the way he thinks to “one of those videos where an army’s marching and one guy’s not in step, and everyone is thinking, what’s wrong with this guy?”

Luckily, Mullainathan says, “being out of phase is kind of helpful in research.”

And apparently so. Mullainathan has received a MacArthur “Genius Grant,” has been designated a “Young Global Leader” by the World Economic Forum, was named a “Top 100 thinker” by Foreign Policy magazine, was included in the “Smart List: 50 people who will change the world” by Wired magazine, and won the Infosys Prize, the largest monetary award in India recognizing excellence in science and research.

Another key aspect of who Mullainathan is as a researcher — his focus on financial scarcity — also dates back to his childhood. When he was about 10, just a few years after his family moved to the Los Angeles area from India, his father lost his job as an aerospace engineer because of a change in security clearance laws regarding immigrants. When his mother told him that without work, the family would have no money, he says he was incredulous.

“At first I thought, that can’t be right. It didn’t quite process,” he says. “So that was the first time I thought, there’s no floor. Anything can happen. It was the first time I really appreciated economic precarity.”

His family got by running a video store and then other small businesses, and Mullainathan made it to Cornell University, where he studied computer science, economics, and mathematics. Although he was doing a lot of math, he found himself drawn not to standard economics, but to the behavioral economics of an early pioneer in the field, Richard Thaler, who later won the Nobel Memorial Prize in Economic Sciences for his work. Behavioral economics brings the psychological, and often irrational, aspects of human behavior into the study of economic decision-making.

“It’s the non-math part of this field that’s fascinating,” says Mullainathan. “What makes it intriguing is that the math in economics isn’t working. The math is elegant, the theorems. But it’s not working because people are weird and complicated and interesting.”

Behavioral economics was so new as Mullainathan was graduating that he says Thaler advised him to study standard economics in graduate school and make a name for himself before concentrating on behavioral economics, “because it was so marginalized. It was considered super risky because it didn’t even fit a field,” Mullainathan says.

Unable to resist thinking about humanity’s quirks and complications, however, Mullainathan focused on behavioral economics, got his PhD at Harvard University, and says he then spent about 10 years studying people.

“I wanted to get the intuition that a good academic psychologist has about people. I was committed to understanding people,” he says.

As Mullainathan was formulating theories about why people make certain economic choices, he wanted to test these theories empirically.

In 2013, he published a paper in Science titled “Poverty Impedes Cognitive Function.” The research measured sugarcane farmers’ performance on intelligence tests in the days before their yearly harvest, when they were out of money, sometimes nearly to the point of starvation. In the controlled study, the same farmers took tests after their harvest was in and they had been paid for a successful crop — and they scored significantly higher.

Mullainathan says he is gratified that the research had far-reaching impact, and that those who make policy often take its premise into account.

“Policies as a whole are kind of hard to change,” he says, “but I do think it has created sensitivity at every level of the design process, that people realize that, for example, if I make a program for people living in economic precarity hard to sign up for, that’s really going to be a massive tax.”

To Mullainathan, the most important effect of the research was on individuals, an impact he saw in reader comments that appeared after the research was covered in The Guardian.

“Ninety percent of the people who wrote those comments said things like, ‘I was economically insecure at one point. This perfectly reflects what it felt like to be poor.’”

Such insights into the way outside influences affect personal lives could be among important advances made possible by algorithms, Mullainathan says.

“I think in the past era of science, science was done in big labs, and it was actioned into big things. I think the next age of science will be just as much about allowing individuals to rethink who they are and what their lives are like.”

Last year, Mullainathan came back to MIT (after having previously taught at MIT from 1998 to 2004) to focus on artificial intelligence and machine learning.

“I wanted to be in a place where I could have one foot in computer science and one foot in a top-notch behavioral economic department,” he says. “And really, if you just objectively said ‘what are the places that are A-plus in both,’ MIT is at the top of that list.”

While AI can automate tasks and systems, such automation of abilities humans already possess is “hard to get excited about,” he says. Computer science can be used to expand human abilities, a notion only limited by our creativity in asking questions.

“We should be asking, what capacity do you want expanded? How could we build an algorithm to help you expand that capacity? Computer science as a discipline has always been so fantastic at taking hard problems and building solutions,” he says. “If you have a capacity that you’d like to expand, that seems like a very hard computing challenge. Let’s figure out how to take that on.”

The sciences that “are very far from having hit the frontier that physics has hit,” like psychology and economics, could be on the verge of huge developments, Mullainathan says. “I fundamentally believe that the next generation of breakthroughs is going to come from the intersection of understanding of people and understanding of algorithms.”

He describes a possible use of AI in which a decision-maker, for example a judge or doctor, could see what their average decision would be for a particular set of circumstances. Such an average would be potentially freer of day-to-day influences, such as a bad mood, indigestion, slow traffic on the way to work, or a fight with a spouse.

Mullainathan sums the idea up as “average-you is better than you. Imagine an algorithm that made it easy to see what you would normally do. And that’s not what you’re doing in the moment. You may have a good reason to be doing something different, but asking that question is immensely helpful.”
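As a toy illustration of that idea (not a system described in the article), one could estimate a decision-maker’s “average” decision by averaging their own past rulings on the most similar past cases:

```python
import numpy as np

# Toy illustration of the "average-you" idea (not from any published system):
# estimate what a judge or doctor would typically decide for a new case by
# averaging their own past decisions on the k most similar past cases.
def average_you(case_features, past_features, past_decisions, k=25):
    """case_features: (d,); past_features: (n, d); past_decisions: (n,) in {0, 1}."""
    dists = np.linalg.norm(past_features - case_features, axis=1)
    nearest = np.argsort(dists)[:k]
    return past_decisions[nearest].mean()   # fraction of similar cases decided "yes"

rng = np.random.default_rng(0)
history_x = rng.normal(size=(1000, 5))          # hypothetical past case features
history_y = (history_x[:, 0] > 0).astype(int)   # hypothetical past rulings
today = rng.normal(size=5)
print(f"Average-you says yes with probability {average_you(today, history_x, history_y):.2f}")
```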

Going forward, Mullainathan will keep working toward such new ideas, because to him, they offer such a delicious reward.

With AI, researchers predict the location of virtually any protein within a human cell

A protein located in the wrong part of a cell can contribute to several diseases, such as Alzheimer’s, cystic fibrosis, and cancer. But there are about 70,000 different proteins and protein variants in a single human cell, and since scientists can typically only test for a handful in one experiment, it is extremely costly and time-consuming to identify proteins’ locations manually.

A new generation of computational techniques seeks to streamline the process using machine-learning models that often leverage datasets containing thousands of proteins and their locations, measured across multiple cell lines. One of the largest such datasets is the Human Protein Atlas, which catalogs the subcellular behavior of over 13,000 proteins in more than 40 cell lines. But as enormous as it is, the Human Protein Atlas has only explored about 0.25 percent of all possible pairings of proteins and cell lines within the database.

Now, researchers from MIT, Harvard University, and the Broad Institute of MIT and Harvard have developed a new computational approach that can efficiently explore the remaining uncharted space. Their method can predict the location of any protein in any human cell line, even when both protein and cell have never been tested before.

Their technique goes one step further than many AI-based methods by localizing a protein at the single-cell level, rather than as an averaged estimate across all the cells of a specific type. This single-cell localization could pinpoint a protein’s location in a specific cancer cell after treatment, for instance.

The researchers combined a protein language model with a special type of computer vision model to capture rich details about a protein and cell. In the end, the user receives an image of a cell with a highlighted portion indicating the model’s prediction of where the protein is located. Since a protein’s localization is indicative of its functional status, this technique could help researchers and clinicians more efficiently diagnose diseases or identify drug targets, while also enabling biologists to better understand how complex biological processes are related to protein localization.

“You could do these protein-localization experiments on a computer without having to touch any lab bench, hopefully saving yourself months of effort. While you would still need to verify the prediction, this technique could act like an initial screening of what to test for experimentally,” says Yitong Tseo, a graduate student in MIT’s Computational and Systems Biology program and co-lead author of a paper on this research.

Tseo is joined on the paper by co-lead author Xinyi Zhang, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and the Eric and Wendy Schmidt Center at the Broad Institute; Yunhao Bai of the Broad Institute; and senior authors Fei Chen, an assistant professor at Harvard and a member of the Broad Institute, and Caroline Uhler, the Andrew and Erna Viterbi Professor of Engineering in EECS and the MIT Institute for Data, Systems, and Society (IDSS), who is also director of the Eric and Wendy Schmidt Center and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS). The research appears today in Nature Methods.

Collaborating models

Many existing protein prediction models can only make predictions based on the protein and cell data on which they were trained or are unable to pinpoint a protein’s location within a single cell.

To overcome these limitations, the researchers created a two-part method for prediction of unseen proteins’ subcellular location, called PUPS.

The first part utilizes a protein sequence model to capture the localization-determining properties of a protein and its 3D structure based on the chain of amino acids that forms it.

The second part incorporates an image inpainting model, which is designed to fill in missing parts of an image. This computer vision model looks at three stained images of a cell to gather information about the state of that cell, such as its type, individual features, and whether it is under stress.

PUPS joins the representations created by each model to predict where the protein is located within a single cell, using an image decoder to output a highlighted image that shows the predicted location.

“Different cells within a cell line exhibit different characteristics, and our model is able to understand that nuance,” Tseo says.

A user inputs the sequence of amino acids that form the protein and three cell stain images — one for the nucleus, one for the microtubules, and one for the endoplasmic reticulum. Then PUPS does the rest.
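The published implementation isn’t reproduced here, but the sketch below, with placeholder module sizes and internals, shows how a sequence-derived protein embedding and an encoding of the three stain images might be fused and decoded into a highlighted localization map:

```python
import torch
import torch.nn as nn

# Schematic of the two-branch design described above; module internals and
# dimensions are placeholders, not the authors' implementation.
class PUPSSketch(nn.Module):
    def __init__(self, seq_dim=1280, img_channels=3, hidden=256):
        super().__init__()
        # Branch 1: stand-in for a protein sequence (language) model embedding.
        self.protein_proj = nn.Linear(seq_dim, hidden)
        # Branch 2: stand-in for an image encoder over the three cell stains
        # (nucleus, microtubules, endoplasmic reticulum).
        self.cell_encoder = nn.Sequential(
            nn.Conv2d(img_channels, hidden, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Decoder "inpaints" the missing protein channel as a highlighted map.
        self.decoder = nn.Conv2d(hidden * 2, 1, kernel_size=1)

    def forward(self, protein_embedding, cell_stains):
        # protein_embedding: (B, seq_dim); cell_stains: (B, 3, H, W)
        p = self.protein_proj(protein_embedding)                      # (B, hidden)
        c = self.cell_encoder(cell_stains)                            # (B, hidden, H, W)
        p_map = p[:, :, None, None].expand(-1, -1, *c.shape[-2:])
        fused = torch.cat([c, p_map], dim=1)                          # join the two representations
        return torch.sigmoid(self.decoder(fused))                     # per-pixel predicted localization

model = PUPSSketch()
highlight = model(torch.randn(1, 1280), torch.randn(1, 3, 64, 64))
print(highlight.shape)  # torch.Size([1, 1, 64, 64])
```

The real system uses a pretrained protein language model and an inpainting-style image model rather than the toy layers above.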

A deeper understanding

The researchers employed a few tricks during the training process to teach PUPS how to combine information from each model in such a way that it can make an educated guess on the protein’s location, even if it hasn’t seen that protein before.

For instance, they assign the model a secondary task during training: to explicitly name the compartment of localization, like the cell nucleus. This is done alongside the primary inpainting task to help the model learn more effectively.

A good analogy might be a teacher who asks their students to draw all the parts of a flower in addition to writing their names. This extra step was found to help the model improve its general understanding of the possible cell compartments.
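A minimal sketch of that multi-task setup, with the loss functions and weighting chosen only for illustration, might look like this:

```python
import torch
import torch.nn.functional as F

# Illustrative multi-task objective (loss choices and weight are assumptions,
# not taken from the paper): the model both reconstructs the protein channel
# (the primary inpainting task) and names the compartment it localizes to.
def training_step(pred_map, true_map, compartment_logits, compartment_label, aux_weight=0.1):
    inpaint_loss = F.binary_cross_entropy(pred_map, true_map)            # primary task
    name_loss = F.cross_entropy(compartment_logits, compartment_label)   # auxiliary task
    return inpaint_loss + aux_weight * name_loss

pred = torch.rand(2, 1, 64, 64)
target = torch.randint(0, 2, (2, 1, 64, 64)).float()
logits = torch.randn(2, 10)            # e.g., 10 possible compartments
labels = torch.tensor([3, 7])          # hypothetical compartment indices
print(training_step(pred, target, logits, labels))
```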

In addition, the fact that PUPS is trained on proteins and cell lines at the same time helps it develop a deeper understanding of where in a cell image proteins tend to localize.

PUPS can even understand, on its own, how different parts of a protein’s sequence contribute separately to its overall localization.

“Most other methods usually require you to have a stain of the protein first, so you’ve already seen it in your training data. Our approach is unique in that it can generalize across proteins and cell lines at the same time,” Zhang says.

Because PUPS can generalize to unseen proteins, it can capture changes in localization driven by unique protein mutations that aren’t included in the Human Protein Atlas.

The researchers verified that PUPS could predict the subcellular location of new proteins in unseen cell lines by conducting lab experiments and comparing the results. In addition, when compared to a baseline AI method, PUPS exhibited lower prediction error, on average, across the proteins they tested.

In the future, the researchers want to enhance PUPS so the model can understand protein-protein interactions and make localization predictions for multiple proteins within a cell. In the longer term, they want to enable PUPS to make predictions for living human tissue, rather than cultured cells.

This research is funded by the Eric and Wendy Schmidt Center at the Broad Institute, the National Institutes of Health, the National Science Foundation, the Burroughs Wellcome Fund, the Searle Scholars Foundation, the Harvard Stem Cell Institute, the Merkin Institute, the Office of Naval Research, and the Department of Energy.

Study shows vision-language models can’t handle queries with negation words

Imagine a radiologist examining a chest X-ray from a new patient. She notices the patient has swelling in the tissue but does not have an enlarged heart. Looking to speed up diagnosis, she might use a vision-language machine-learning model to search for reports from similar patients.

But if the model mistakenly identifies reports with both conditions, the most likely diagnosis could be quite different: If a patient has tissue swelling and an enlarged heart, the condition is very likely to be cardiac-related, but with no enlarged heart there could be several underlying causes.

In a new study, MIT researchers have found that vision-language models are extremely likely to make such a mistake in real-world situations because they don’t understand negation — words like “no” and “doesn’t” that specify what is false or absent. 

“Those negation words can have a very significant impact, and if we are just using these models blindly, we may run into catastrophic consequences,” says Kumail Alhamoud, an MIT graduate student and lead author of this study.

The researchers tested the ability of vision-language models to identify negation in image captions. The models often performed as well as a random guess. Building on those findings, the team created a dataset of images with corresponding captions that include negation words describing missing objects.

They show that retraining a vision-language model with this dataset leads to performance improvements when a model is asked to retrieve images that do not contain certain objects. It also boosts accuracy on multiple choice question answering with negated captions.

But the researchers caution that more work is needed to address the root causes of this problem. They hope their research alerts potential users to a previously unnoticed shortcoming that could have serious implications in high-stakes settings where these models are currently being used, from determining which patients receive certain treatments to identifying product defects in manufacturing plants.

“This is a technical paper, but there are bigger issues to consider. If something as fundamental as negation is broken, we shouldn’t be using large vision/language models in many of the ways we are using them now — without intensive evaluation,” says senior author Marzyeh Ghassemi, an associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems.

Ghassemi and Alhamoud are joined on the paper by Shaden Alshammari, an MIT graduate student; Yonglong Tian of OpenAI; Guohao Li, a former postdoc at Oxford University; Philip H.S. Torr, a professor at Oxford; and Yoon Kim, an assistant professor of EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. The research will be presented at the Conference on Computer Vision and Pattern Recognition.

Neglecting negation

Vision-language models (VLMs) are trained using huge collections of images and corresponding captions, which they learn to encode as sets of numbers, called vector representations. The models use these vectors to distinguish between different images.

A VLM utilizes two separate encoders, one for text and one for images, and the encoders learn to output similar vectors for an image and its corresponding text caption.
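A stripped-down sketch of that dual-encoder setup, with toy linear layers standing in for the real encoders of a model like CLIP, shows how matched image-caption pairs end up with similar vectors:

```python
import torch
import torch.nn.functional as F

# Minimal dual-encoder sketch (toy layers standing in for a real VLM's
# vision and text encoders): an image and its caption should map to nearby
# unit vectors, and mismatched pairs to distant ones.
image_encoder = torch.nn.Linear(2048, 512)   # stand-in for a vision encoder
text_encoder = torch.nn.Linear(768, 512)     # stand-in for a text encoder

image_features = torch.randn(4, 2048)        # a batch of 4 image feature vectors
caption_features = torch.randn(4, 768)       # their 4 matching caption features

img_vec = F.normalize(image_encoder(image_features), dim=-1)
txt_vec = F.normalize(text_encoder(caption_features), dim=-1)

# Cosine similarity matrix; training pushes the diagonal (matched pairs) up
# and the off-diagonal (mismatched pairs) down. Nothing in this objective
# rewards the model for attending to a negation word in the caption.
similarity = img_vec @ txt_vec.T
print(similarity.shape)  # torch.Size([4, 4])
```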

“The captions express what is in the images — they are a positive label. And that is actually the whole problem. No one looks at an image of a dog jumping over a fence and captions it by saying ‘a dog jumping over a fence, with no helicopters,’” Ghassemi says.

Because the image-caption datasets don’t contain examples of negation, VLMs never learn to identify it.

To dig deeper into this problem, the researchers designed two benchmark tasks that test the ability of VLMs to understand negation.

For the first, they used a large language model (LLM) to re-caption images in an existing dataset by asking the LLM to think about related objects not in an image and write them into the caption. Then they tested models by prompting them with negation words to retrieve images that contain certain objects, but not others.

For the second task, they designed multiple choice questions that ask a VLM to select the most appropriate caption from a list of closely related options. These captions differ only by adding a reference to an object that doesn’t appear in the image or negating an object that does appear in the image.

The models often failed at both tasks, with image retrieval performance dropping by nearly 25 percent with negated captions. When it came to answering multiple choice questions, the best models only achieved about 39 percent accuracy, with several models performing at or even below random chance.

One reason for this failure is a shortcut the researchers call affirmation bias — VLMs ignore negation words and focus on objects in the images instead.

“This does not just happen for words like ‘no’ and ‘not.’ Regardless of how you express negation or exclusion, the models will simply ignore it,” Alhamoud says.

This was consistent across every VLM they tested.

“A solvable problem”

Since VLMs aren’t typically trained on image captions with negation, the researchers developed datasets with negation words as a first step toward solving the problem.

Using a dataset with 10 million image-text caption pairs, they prompted an LLM to propose related captions that specify what is excluded from the images, yielding new captions with negation words.
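The exact prompt is not given in the article; a hypothetical version of that recaptioning step, with invented wording, might look like this:

```python
# Hypothetical sketch of the recaptioning step (the prompt wording is invented,
# not the authors' pipeline): ask an LLM to name a plausible-but-absent object
# and rewrite the caption so it explicitly states that the object is missing.
def build_negation_prompt(caption: str) -> str:
    return (
        f"Here is an image caption: '{caption}'. "
        "Name one object that would plausibly appear in a similar scene but is "
        "not mentioned, then rewrite the caption to state that this object is "
        "absent (for example, 'with no ...' or 'but there is no ...')."
    )

original = "a dog jumping over a fence"
print(build_negation_prompt(original))
# A response in the intended style: "a dog jumping over a fence, with no people nearby"
```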

They had to be especially careful that these synthetic captions still read naturally, or it could cause a VLM to fail in the real world when faced with more complex captions written by humans.

They found that finetuning VLMs with their dataset led to performance gains across the board. It improved models’ image retrieval abilities by about 10 percent, while also boosting performance in the multiple-choice question answering task by about 30 percent.

“But our solution is not perfect. We are just recaptioning datasets, a form of data augmentation. We haven’t even touched how these models work, but we hope this is a signal that this is a solvable problem and others can take our solution and improve it,” Alhamoud says.

At the same time, he hopes their work encourages more users to think about the problem they want to use a VLM to solve and design some examples to test it before deployment.

In the future, the researchers could expand upon this work by teaching VLMs to process text and images separately, which may improve their ability to understand negation. In addition, they could develop additional datasets that include image-caption pairs for specific applications, such as health care.

MIT engineers advance toward a fault-tolerant quantum computer

In the future, quantum computers could rapidly simulate new materials or help scientists develop faster machine-learning models, opening the door to many new possibilities.

But these applications will only be possible if quantum computers can perform operations extremely quickly, so scientists can make measurements and perform corrections before compounding error rates reduce their accuracy and reliability.

The efficiency of this measurement process, known as readout, relies on the strength of the coupling between photons, which are particles of light that carry quantum information, and artificial atoms, units of matter that are often used to store information in a quantum computer.

Now, MIT researchers have demonstrated what they believe is the strongest nonlinear light-matter coupling ever achieved in a quantum system. Their experiment is a step toward realizing quantum operations and readout that could be performed in a few nanoseconds.

The researchers used a novel superconducting circuit architecture to show nonlinear light-matter coupling that is about an order of magnitude stronger than prior demonstrations, which could enable a quantum processor to run about 10 times faster.

There is still much work to be done before the architecture could be used in a real quantum computer, but demonstrating the fundamental physics behind the process is a major step in the right direction, says Yufeng “Bright” Ye SM ’20, PhD ’24, lead author of a paper on this research.

“This would really eliminate one of the bottlenecks in quantum computing. Usually, you have to measure the results of your computations in between rounds of error correction. This could accelerate how quickly we can reach the fault-tolerant quantum computing stage and be able to get real-world applications and value out of our quantum computers,” says Ye.

He is joined on the paper by senior author Kevin O’Brien, an associate professor and principal investigator in the Research Laboratory of Electronics (RLE) at MIT who leads the Quantum Coherent Electronics Group in the Department of Electrical Engineering and Computer Science (EECS). Additional MIT co-authors, with affiliations in RLE and/or MIT Lincoln Laboratory, include Jeremy B. Kline, Alec Yen, Gregory Cunningham, Max Tan, Alicia Zang, Michael Gingras, Bethany M. Niedzielski, Hannah Stickler, Kyle Serniak, and Mollie E. Schwartz. The research appears today in Nature Communications.

A new coupler

This physical demonstration builds on years of theoretical research in the O’Brien group.

After Ye joined the lab as a PhD student in 2019, he began developing a specialized photon detector to enhance quantum information processing.

Through that work, he invented a new type of quantum coupler, which is a device that facilitates interactions between qubits. Qubits are the building blocks of a quantum computer. This so-called quarton coupler had so many potential applications in quantum operations and readout that it quickly became a focus of the lab.

This quarton coupler is a special type of superconducting circuit that has the potential to generate extremely strong nonlinear coupling, which is essential for running most quantum algorithms. As the researchers feed more current into the coupler, it creates an even stronger nonlinear interaction. In this sense, nonlinearity means the system behaves in a way that is greater than the sum of its parts: its response is not simply proportional to the input, which gives rise to more complex properties.

“Most of the useful interactions in quantum computing come from nonlinear coupling of light and matter. If you can get a more versatile range of different types of coupling, and increase the coupling strength, then you can essentially increase the processing speed of the quantum computer,” Ye explains.

For quantum readout, researchers shine microwave light onto a qubit and then, depending on whether that qubit is in state 0 or 1, there is a frequency shift on its associated readout resonator. They measure this shift to determine the qubit’s state.

Nonlinear light-matter coupling between the qubit and resonator enables this measurement process.
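For readers who want the standard relation behind this measurement, in the textbook dispersive-readout picture (a generic two-level approximation, not the paper's quarton-coupler model) the resonator frequency depends on the qubit state as

\[
\omega_r^{|0\rangle,\,|1\rangle} \;=\; \omega_r \pm \chi, \qquad \chi \approx \frac{g^2}{\Delta},
\]

where \(g\) is the qubit-resonator coupling strength and \(\Delta\) is their detuning. Measuring which shifted frequency the resonator responds at reveals the qubit's state, and a larger \(\chi\), i.e., stronger coupling, makes that discrimination faster.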

The MIT researchers designed an architecture with a quarton coupler connected to two superconducting qubits on a chip. They turn one qubit into a resonator and use the other qubit as an artificial atom which stores quantum information. This information is transferred in the form of microwave light particles called photons.

“The interaction between these superconducting artificial atoms and the microwave light that routes the signal is basically how an entire superconducting quantum computer is built,” Ye explains.

Enabling faster readout

The quarton coupler creates nonlinear light-matter coupling between the qubit and resonator that’s about an order of magnitude stronger than researchers had achieved before. This could enable a quantum system with lightning-fast readout.

“This work is not the end of the story. This is the fundamental physics demonstration, but there is work going on in the group now to realize really fast readout,” O’Brien says.

That would involve adding additional electronic components, such as filters, to produce a readout circuit that could be incorporated into a larger quantum system.

The researchers also demonstrated extremely strong matter-matter coupling, another type of qubit interaction that is important for quantum operations. This is another area they plan to explore with future work.

Fast operations and readout are especially important for quantum computers because qubits have finite lifespans, a concept known as coherence time.

Stronger nonlinear coupling enables a quantum processor to run faster and with lower error, so the qubits can perform more operations in the same amount of time. This means the qubits can run more rounds of error correction during their lifespans.

“The more runs of error correction you can get in, the lower the error will be in the results,” Ye says.
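A back-of-the-envelope calculation makes the point; all the numbers below are hypothetical placeholders, not measurements from the paper:

```python
# Illustrative arithmetic only: the fraction of a qubit's coherence time spent
# on readout limits how many error-correction rounds fit within its lifetime.
coherence_time_us = 100.0      # assumed qubit lifetime
gate_time_ns = 50.0            # assumed time for the gates in one round
slow_readout_ns = 1000.0       # microsecond-scale readout
fast_readout_ns = 100.0        # nanosecond-scale readout enabled by stronger coupling

for label, readout_ns in [("slow readout", slow_readout_ns), ("fast readout", fast_readout_ns)]:
    round_ns = gate_time_ns + readout_ns
    rounds = int(coherence_time_us * 1000 / round_ns)
    print(f"{label}: ~{rounds} error-correction rounds per coherence time")
```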

In the long run, this work could help scientists build a fault-tolerant quantum computer, which is essential for practical, large-scale quantum computation.

This research was supported, in part, by the Army Research Office, the AWS Center for Quantum Computing, and the MIT Center for Quantum Engineering.

Making AI models more trustworthy for high-stakes settings

The ambiguity in medical imaging can present major challenges for clinicians who are trying to identify disease. For instance, in a chest X-ray, pleural effusion, an abnormal buildup of fluid in the space around the lungs, can look very much like pulmonary infiltrates, which are accumulations of pus or blood within the lungs.

An artificial intelligence model could assist the clinician in X-ray analysis by helping to identify subtle details and boosting the efficiency of the diagnosis process. But because so many possible conditions could be present in one image, the clinician would likely want to consider a set of possibilities, rather than only having one AI prediction to evaluate.

One promising way to produce a set of possibilities, called conformal classification, is convenient because it can be readily implemented on top of an existing machine-learning model. However, it can produce sets that are impractically large. 

MIT researchers have now developed a simple and effective improvement that can reduce the size of prediction sets by up to 30 percent while also making predictions more reliable.

Having a smaller prediction set may help a clinician zero in on the right diagnosis more efficiently, which could improve and streamline treatment for patients. This method could be useful across a range of classification tasks — say, for identifying the species of an animal in an image from a wildlife park — as it provides a smaller but more accurate set of options.

“With fewer classes to consider, the sets of predictions are naturally more informative in that you are choosing between fewer options. In a sense, you are not really sacrificing anything in terms of accuracy for something that is more informative,” says Divya Shanmugam PhD ’24, a postdoc at Cornell Tech who conducted this research while she was an MIT graduate student.

Shanmugam is joined on the paper by Helen Lu ’24; Swami Sankaranarayanan, a former MIT postdoc who is now a research scientist at Lilia Biosciences; and senior author John Guttag, the Dugald C. Jackson Professor of Computer Science and Electrical Engineering at MIT and a member of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Computer Vision and Pattern Recognition in June.

Prediction guarantees

AI assistants deployed for high-stakes tasks, like classifying diseases in medical images, are typically designed to produce a probability score along with each prediction so a user can gauge the model’s confidence. For instance, a model might predict that there is a 20 percent chance an image corresponds to a particular diagnosis, like pleurisy.

But it is difficult to trust a model’s predicted confidence because much prior research has shown that these probabilities can be inaccurate. With conformal classification, the model’s prediction is replaced by a set of the most probable diagnoses along with a guarantee that the correct diagnosis is somewhere in the set.

But the inherent uncertainty in AI predictions often causes the model to output sets that are far too large to be useful.

For instance, if a model is classifying an animal in an image as one of 10,000 potential species, it might output a set of 200 predictions so it can offer a strong guarantee.

“That is quite a few classes for someone to sift through to figure out what the right class is,” Shanmugam says.
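To make the idea concrete, here is a minimal numpy sketch of the standard split-conformal "score" recipe the article alludes to. It is illustrative only: the simple 1 − p nonconformity score and the variable names are assumptions for the example, not details taken from the paper.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal calibration on held-out labeled data.

    cal_probs:  (n, num_classes) softmax outputs from any trained classifier
    cal_labels: (n,) integer ground-truth labels
    alpha:      target miscoverage; 0.1 means the true label should fall inside
                the prediction set at least 90 percent of the time.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability the model gave the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(rank, n) - 1]

def prediction_set(probs, qhat):
    """All classes whose score (1 minus predicted probability) clears the threshold."""
    return np.flatnonzero(1.0 - probs <= qhat)
```

Everything in the returned set is a label the model considers plausible enough to keep, and the guarantee is that the true label lands in the set with probability at least 1 − alpha.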

The technique can also be unreliable because tiny changes to inputs, like slightly rotating an image, can yield entirely different sets of predictions.

To make conformal classification more useful, the researchers applied test-time augmentation (TTA), a technique originally developed to improve the accuracy of computer vision models.

TTA creates multiple augmentations of a single image in a dataset, perhaps by cropping the image, flipping it, zooming in, etc. Then it applies a computer vision model to each version of the same image and aggregates its predictions.

“In this way, you get multiple predictions from a single example. Aggregating predictions in this way improves predictions in terms of accuracy and robustness,” Shanmugam explains.
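A sketch of that aggregation step, again with illustrative names: `horizontal_flip` and `center_crop` stand in for whatever augmentations are chosen, and uniform averaging stands in for the learned aggregation described below.

```python
import numpy as np

def tta_probs(model, image, augmentations, weights=None):
    """Average a classifier's softmax outputs over several augmented views of one image.

    model:         callable mapping an image array to a (num_classes,) probability vector
    augmentations: list of callables, e.g. [lambda im: im, horizontal_flip, center_crop]
    weights:       optional per-view weights; uniform averaging is the simplest variant.
    """
    views = np.stack([model(aug(image)) for aug in augmentations])
    if weights is None:
        weights = np.full(len(augmentations), 1.0 / len(augmentations))
    return weights @ views  # weighted average, still a (num_classes,) probability vector
```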

Maximizing accuracy

To apply TTA, the researchers hold out some labeled image data used for the conformal classification process. They learn to aggregate the augmentations on these held-out data, automatically augmenting the images in a way that maximizes the accuracy of the underlying model’s predictions.

Then they run conformal classification on the model’s new, TTA-transformed predictions. The conformal classifier outputs a smaller set of probable predictions for the same confidence guarantee.
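Putting the two sketches above together, a hypothetical end-to-end use (with `model`, `augs`, `cal_images`, `cal_labels`, and `new_image` as placeholder names) would calibrate the conformal threshold on TTA-smoothed probabilities and then build a set for a new image:

```python
# Calibrate on TTA-smoothed probabilities from the held-out labeled data...
cal_probs = np.stack([tta_probs(model, x, augs) for x in cal_images])
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

# ...then produce a (typically smaller) prediction set for a new image.
new_probs = tta_probs(model, new_image, augs)
labels_to_review = prediction_set(new_probs, qhat)
```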

“Combining test-time augmentation with conformal prediction is simple to implement, effective in practice, and requires no model retraining,” Shanmugam says.

Compared with prior conformal-prediction approaches on several standard image classification benchmarks, their TTA-augmented method reduced prediction set sizes by 10 to 30 percent across experiments.

Importantly, the technique achieves this reduction in prediction set size while maintaining the probability guarantee.

The researchers also found that, even though they are sacrificing some labeled data that would normally be used for the conformal classification procedure, TTA boosts accuracy enough to outweigh the cost of losing those data.

“It raises interesting questions about how we used labeled data after model training. The allocation of labeled data between different post-training steps is an important direction for future work,” Shanmugam says.

In the future, the researchers want to validate the effectiveness of such an approach in the context of models that classify text instead of images. To further improve the work, the researchers are also considering ways to reduce the amount of computation required for TTA.

This research is funded, in part, by the Wistrom Corporation.

System lets robots identify an object’s properties through handling

A human clearing junk out of an attic can often guess the contents of a box simply by picking it up and giving it a shake, without the need to see what’s inside. Researchers from MIT, Amazon Robotics, and the University of British Columbia have taught robots to do something similar.

They developed a technique that enables robots to use only internal sensors to learn about an object’s weight, softness, or contents by picking it up and gently shaking it. With their method, which does not require external measurement tools or cameras, the robot can accurately guess parameters like an object’s mass in a matter of seconds.

This low-cost technique could be especially useful in applications where cameras might be less effective, such as sorting objects in a dark basement or clearing rubble inside a building that partially collapsed after an earthquake.

Key to their approach is a simulation process that incorporates models of the robot and the object to rapidly identify characteristics of that object as the robot interacts with it. 

The researchers’ technique is as good at guessing an object’s mass as some more complex and expensive methods that incorporate computer vision. In addition, their data-efficient approach is robust enough to handle many types of unseen scenarios.

“This idea is general, and I believe we are just scratching the surface of what a robot can learn in this way. My dream would be to have robots go out into the world, touch things and move things in their environments, and figure out the properties of everything they interact with on their own,” says Peter Yichen Chen, an MIT postdoc and lead author of a paper on this technique.

His coauthors include fellow MIT postdoc Chao Liu; Pingchuan Ma PhD ’25; Jack Eastman MEng ’24; Dylan Randle and Yuri Ivanov of Amazon Robotics; MIT professors of electrical engineering and computer science Daniela Rus, who leads MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL); and Wojciech Matusik, who leads the Computational Design and Fabrication Group within CSAIL. The research will be presented at the International Conference on Robotics and Automation.

Sensing signals

The researchers’ method leverages proprioception, which is a human or robot’s ability to sense its movement or position in space.

For instance, a human who lifts a dumbbell at the gym can sense the weight of that dumbbell in their wrist and bicep, even though they are holding the dumbbell in their hand. In the same way, a robot can “feel” the heaviness of an object through the multiple joints in its arm.

“A human doesn’t have super-accurate measurements of the joint angles in our fingers or the precise amount of torque we are applying to an object, but a robot does. We take advantage of these abilities,” Liu says.

As the robot lifts an object, the researchers’ system gathers signals from the robot’s joint encoders, which are sensors that detect the rotational position and speed of its joints during movement. 

Most robots have joint encoders within the motors that drive their moveable parts, Liu adds. This makes their technique more cost-effective than some approaches because it doesn’t need extra components like tactile sensors or vision-tracking systems.

To estimate an object’s properties during robot-object interactions, their system relies on two models: one that simulates the robot and its motion and one that simulates the dynamics of the object.

“Having an accurate digital twin of the real world is really important for the success of our method,” Chen adds.

Their algorithm “watches” the robot and object move during a physical interaction and uses joint encoder data to work backward and identify the properties of the object.

For instance, a heavier object will move more slowly than a lighter one if the robot applies the same amount of force.

Differentiable simulations

They utilize a technique called differentiable simulation, which allows the algorithm to predict how small changes in an object’s properties, like mass or softness, impact the robot’s ending joint position. The researchers built their simulations using NVIDIA’s Warp library, an open-source developer tool that supports differentiable simulations.

Once the differentiable simulation matches up with the robot’s real movements, the system has identified the correct property. The algorithm can do this in a matter of seconds and only needs to see one real-world trajectory of the robot in motion to perform the calculations.

“Technically, as long as you know the model of the object and how the robot can apply force to that object, you should be able to figure out the parameter you want to identify,” Liu says.
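The following toy numpy sketch shows the shape of that idea for a single parameter. It is only an illustration under simplified assumptions: a 1-D point mass stands in for the robot-object system, the recorded positions stand in for joint-encoder readings, and a finite-difference gradient stands in for the exact gradients a differentiable simulator such as Warp would provide.

```python
import numpy as np

def simulate(mass, forces, dt=0.01):
    """Forward-simulate a 1-D point mass pushed by a known force profile.
    Returns the position at each time step (a stand-in for encoder readings)."""
    pos, vel = 0.0, 0.0
    trajectory = []
    for f in forces:
        vel += (f / mass) * dt
        pos += vel * dt
        trajectory.append(pos)
    return np.array(trajectory)

def estimate_mass(observed, forces, guess=1.0, lr=0.5, steps=300, eps=1e-4):
    """Adjust the mass until the simulated trajectory matches the observed one."""
    loss = lambda m: np.mean((simulate(m, forces) - observed) ** 2)
    mass = guess
    for _ in range(steps):
        grad = (loss(mass + eps) - loss(mass - eps)) / (2 * eps)  # numerical gradient
        mass = max(mass - lr * grad, 1e-3)  # gradient step, kept physical
    return mass

# Illustrative use: recover a 2.0 kg mass from a synthetic "observed" trajectory.
forces = np.full(100, 5.0)              # a constant 5 N push for one second
observed = simulate(2.0, forces)
print(estimate_mass(observed, forces))  # converges toward 2.0
```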

The researchers used their method to learn the mass and softness of an object, but their technique could also determine properties like moment of inertia or the viscosity of a fluid inside a container.

Plus, because their algorithm does not need an extensive dataset for training like some methods that rely on computer vision or external sensors, it would not be as susceptible to failure when faced with unseen environments or new objects.

In the future, the researchers want to try combining their method with computer vision to create a multimodal sensing technique that is even more powerful.

“This work is not trying to replace computer vision. Both methods have their pros and cons. But here we have shown that without a camera we can already figure out some of these properties,” Chen says.

They also want to explore applications with more complicated robotic systems, like soft robots, and more complex objects, including sloshing liquids or granular media like sand.

In the long run, they hope to apply this technique to improve robot learning, enabling future robots to quickly develop new manipulation skills and adapt to changes in their environments.

“Determining the physical properties of objects from data has long been a challenge in robotics, particularly when only limited or noisy measurements are available. This work is significant because it shows that robots can accurately infer properties like mass and softness using only their internal joint sensors, without relying on external cameras or specialized measurement tools,” says Miles Macklin, senior director of simulation technology at NVIDIA, who was not involved with this research.

This work is funded, in part, by Amazon and the GIST-CSAIL Research Program.

Hybrid AI model crafts smooth, high-quality videos in seconds

What would a behind-the-scenes look at a video generated by an artificial intelligence model be like? You might think the process is similar to stop-motion animation, where many images are created and stitched together, but that’s not quite the case for “diffusion models” like OpenAI’s Sora and Google’s Veo 2.

Instead of producing a video frame-by-frame (or “autoregressively”), these systems process the entire sequence at once. The resulting clip is often photorealistic, but the process is slow and doesn’t allow for on-the-fly changes. 

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Adobe Research have now developed a hybrid approach, called “CausVid,” to create videos in seconds. Much like a quick-witted student learning from a well-versed teacher, a full-sequence diffusion model trains an autoregressive system to swiftly predict the next frame while ensuring high quality and consistency. CausVid’s student model can then generate clips from a simple text prompt, turning a photo into a moving scene, extending a video, or altering its creations with new inputs mid-generation.

This dynamic tool enables fast, interactive content creation, cutting a 50-step process into just a few actions. It can craft many imaginative and artistic scenes, such as a paper airplane morphing into a swan, woolly mammoths venturing through snow, or a child jumping in a puddle. Users can also make an initial prompt, like “generate a man crossing the street,” and then make follow-up inputs to add new elements to the scene, like “he writes in his notebook when he gets to the opposite sidewalk.”

A video produced by CausVid illustrates its ability to create smooth, high-quality content.
AI-generated animation courtesy of the researchers.

The CSAIL researchers say that the model could be used for different video editing tasks, like helping viewers understand a livestream in a different language by generating a video that syncs with an audio translation. It could also help render new content in a video game or quickly produce training simulations to teach robots new tasks.

Tianwei Yin SM ’25, PhD ’25, a recently graduated student in electrical engineering and computer science and CSAIL affiliate, attributes the model’s strength to its mixed approach.

“CausVid combines a pre-trained diffusion-based model with autoregressive architecture that’s typically found in text generation models,” says Yin, co-lead author of a new paper about the tool. “This AI-powered teacher model can envision future steps to train a frame-by-frame system to avoid making rendering errors.”

Yin’s co-lead author, Qiang Zhang, is a research scientist at xAI and a former CSAIL visiting researcher. They worked on the project with Adobe Research scientists Richard Zhang, Eli Shechtman, and Xun Huang, and two CSAIL principal investigators: MIT professors Bill Freeman and Frédo Durand.

Caus(Vid) and effect

Many autoregressive models can create a video that’s initially smooth, but the quality tends to drop off later in the sequence. A clip of a person running might seem lifelike at first, but their legs begin to flail in unnatural directions, indicating frame-to-frame inconsistencies (also called “error accumulation”).

Error-prone video generation was common in prior causal approaches, which learned to predict frames one by one on their own. CausVid instead uses a high-powered diffusion model to teach a simpler system its general video expertise, enabling it to create smooth visuals, but much faster.
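The real systems are large diffusion and autoregressive video models, but the division of labor can be caricatured in a few lines of numpy. In this deliberately tiny analogy, which assumes nothing about the actual architectures, the "teacher" emits whole sequences at once and the "student" is a next-frame predictor fit to the teacher's output and then rolled out frame by frame:

```python
import numpy as np

# "Teacher": produces a complete sequence in one go (here, just a sine wave
# standing in for a full-sequence generator).
def teacher_sequence(length, omega=0.2):
    return np.sin(omega * np.arange(length))

# Distillation data harvested from teacher output: (two previous frames) -> next frame.
seq = teacher_sequence(500)
X = np.stack([seq[1:-1], seq[:-2]], axis=1)   # frames t and t-1
y = seq[2:]                                   # frame t+1
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Student": generates frame by frame from a short prefix, which is fast and
# lets the sequence be extended or steered on the fly.
def student_rollout(prefix, steps, coeffs):
    frames = list(prefix)
    for _ in range(steps):
        frames.append(coeffs @ np.array([frames[-1], frames[-2]]))
    return np.array(frames)

rollout = student_rollout(seq[:2], steps=198, coeffs=coeffs)
print(np.max(np.abs(rollout - seq[:200])))  # tiny value: negligible drift over 200 frames
```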

CausVid displayed its video-making aptitude when researchers tested its ability to make high-resolution, 10-second-long videos. It outperformed baselines like “OpenSORA” and “MovieGen,” working up to 100 times faster than its competition while producing the most stable, high-quality clips.

Then, Yin and his colleagues tested CausVid’s ability to put out stable 30-second videos, where it also topped comparable models on quality and consistency. These results suggest that CausVid may eventually be able to produce stable, hours-long videos, or even ones of indefinite length.

A subsequent study revealed that users preferred the videos generated by CausVid’s student model over its diffusion-based teacher.

“The speed of the autoregressive model really makes a difference,” says Yin. “Its videos look just as good as the teacher’s ones, but with less time to produce, the trade-off is that its visuals are less diverse.”

CausVid also excelled when tested on over 900 prompts using a text-to-video dataset, receiving the top overall score of 84.27. It boasted the best metrics in categories like imaging quality and realistic human actions, eclipsing state-of-the-art video generation models like “Vchitect” and “Gen-3.”

While already an efficient step forward in AI video generation, CausVid may soon be able to generate visuals even faster, perhaps instantly, with a smaller causal architecture. Yin says that if the model is trained on domain-specific datasets, it will likely create higher-quality clips for robotics and gaming.

Experts say that this hybrid system is a promising upgrade from diffusion models, which are currently bogged down by processing speeds. “[Diffusion models] are way slower than LLMs [large language models] or generative image models,” says Carnegie Mellon University Assistant Professor Jun-Yan Zhu, who was not involved in the paper. “This new work changes that, making video generation much more efficient. That means better streaming speed, more interactive applications, and lower carbon footprints.”

The team’s work was supported, in part, by the Amazon Science Hub, the Gwangju Institute of Science and Technology, Adobe, Google, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator. CausVid will be presented at the Conference on Computer Vision and Pattern Recognition in June.