Nonsense can make sense to machine-learning models

For all that neural networks can accomplish, we still don’t really understand how they operate. Sure, we can program them to learn, but making sense of a machine’s decision-making process remains much like a fancy puzzle with a dizzying, complex pattern where plenty of integral pieces have yet to be fitted. 

If a model were trying to classify an image of said puzzle, for example, it could encounter well-known but annoying adversarial attacks, or even more run-of-the-mill data or processing issues.

This could be particularly worrisome for high-stakes environments, like split-second decisions for self-driving cars and medical diagnostics for diseases that need immediate attention. Autonomous vehicles in particular rely heavily on systems that can accurately understand their surroundings and then make quick, safe decisions. In the study, networks used specific backgrounds, edges, or particular patterns of the sky to classify traffic lights and street signs, irrespective of what else was in the image.

The team found that neural networks trained on popular datasets like CIFAR-10 and ImageNet suffered from overinterpretation. Models trained on CIFAR-10, for example, made confident predictions even when 95 percent of an input image was missing and the remainder was meaningless to humans.

“Overinterpretation is a dataset problem that’s caused by these nonsensical signals in datasets. Not only are these high-confidence images unrecognizable, but they contain less than 10 percent of the original image in unimportant areas, such as borders. We found that these images were meaningless to humans, yet models can still classify them with high confidence,” says Brandon Carter, MIT Computer Science and Artificial Intelligence Laboratory PhD student and lead author on a paper about the research. 

Deep-image classifiers are widely used. In addition to medical diagnosis and boosting autonomous vehicle technology, there are use cases in security, gaming, and even an app that tells you if something is or isn’t a hot dog, because sometimes we need reassurance. These classifiers work by processing individual pixels from tons of pre-labeled images so the network can “learn.”

Image classification is hard because machine-learning models can latch onto these nonsensical, subtle signals. Then, when image classifiers are trained on datasets such as ImageNet, they can make seemingly reliable predictions based on those signals.

Although these nonsensical signals can lead to model fragility in the real world, the signals are actually valid in the datasets, meaning overinterpretation can’t be diagnosed using typical evaluation methods based on accuracy alone.

To find the rationale for the model’s prediction on a particular input, the methods in the present study start with the full image and repeatedly ask, what can be removed from this image? Essentially, the method keeps covering up the image until what is left is the smallest piece on which the model still makes a confident decision.
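
To make the procedure concrete, here is a toy sketch of that covering-up loop in Python. The “classifier” is a stand-in weighted sum and the confidence threshold is invented; the paper’s actual models and selection method are more sophisticated.

```python
import numpy as np

# Greedily mask pixels while the stand-in "model" stays confident.
rng = np.random.default_rng(0)
image = rng.random((8, 8))      # stand-in for a small input image
weights = rng.random((8, 8))    # stand-in model parameters

def confidence(mask):
    """Placeholder confidence: fraction of weighted evidence still visible."""
    return float((image * weights * mask).sum() / (image * weights).sum())

mask = np.ones_like(image)      # 1 = pixel kept, 0 = pixel masked out
THRESHOLD = 0.9                 # stay above this to count as "confident"

while mask.sum() > 1:
    kept = np.argwhere(mask == 1)
    # Tentatively remove each remaining pixel; keep the removal that
    # hurts confidence the least.
    best, best_conf = None, -1.0
    for i, j in kept:
        mask[i, j] = 0
        c = confidence(mask)
        mask[i, j] = 1
        if c > best_conf:
            best, best_conf = (i, j), c
    if best_conf < THRESHOLD:
        break                   # any further removal loses confidence
    mask[best] = 0

print(f"{int(mask.sum())}/{mask.size} pixels suffice for a 'confident' prediction")
```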

To that end, it could also be possible to use these methods as a type of validation criterion. For example, if you have an autonomous car that uses a trained machine-learning method for recognizing stop signs, you could test that method by identifying the smallest input subset that constitutes a stop sign. If that subset consists of a tree branch, a particular time of day, or something that’s not a stop sign, you could be concerned that the car might come to a stop at a place it’s not supposed to.

While it may seem that the model is the likely culprit here, the datasets are more likely to blame. “There’s the question of how we can modify the datasets in a way that would enable models to be trained to more closely mimic how a human would think about classifying images and therefore, hopefully, generalize better in these real-world scenarios, like autonomous driving and medical diagnosis, so that the models don’t have this nonsensical behavior,” says Carter. 

This may mean creating datasets in more controlled environments. Currently, training images are simply pulled from public sources and then classified. But if you want to do object identification, for example, it might be necessary to train models with objects photographed against an uninformative background.

This work was supported by Schmidt Futures and the National Institutes of Health. Carter wrote the paper alongside Siddhartha Jain and Jonas Mueller, scientists at Amazon, and MIT Professor David Gifford. They are presenting the work at the 2021 Conference on Neural Information Processing Systems.

Giving bug-like bots a boost

When it comes to robots, bigger isn’t always better. Someday, a swarm of insect-sized robots might pollinate a field of crops or search for survivors amid the rubble of a collapsed building.

MIT researchers have demonstrated diminutive drones that can zip around with bug-like agility and resilience, and that could eventually perform these tasks. The soft actuators that propel these microrobots are very durable, but they require much higher voltages than similarly sized rigid actuators. As a result, the featherweight robots can’t carry the necessary power electronics that would allow them to fly on their own.

Now, these researchers have pioneered a fabrication technique that enables them to build soft actuators that operate with 75 percent lower voltage than current versions while carrying 80 percent more payload. These soft actuators are like artificial muscles that rapidly flap the robot’s wings.

This new fabrication technique produces artificial muscles with fewer defects, which dramatically extends the lifespan of the components and increases the robot’s performance and payload.   

“This opens up a lot of opportunity in the future for us to transition to putting power electronics on the microrobot. People tend to think that soft robots are not as capable as rigid robots. We demonstrate that this robot, weighing less than a gram, flies for the longest time with the smallest error during a hovering flight. The take-home message is that soft robots can exceed the performance of rigid robots,” says Kevin Chen, who is the D. Reid Weedon, Jr. ’41 assistant professor in the Department of Electrical Engineering and Computer Science, the head of the Soft and Micro Robotics Laboratory in the Research Laboratory of Electronics (RLE), and the senior author of the paper.

Chen’s coauthors include Zhijian Ren and Suhan Kim, co-lead authors and EECS graduate students; Xiang Ji, a research scientist in EECS; Weikun Zhu, a chemical engineering graduate student; Farnaz Niroui, an assistant professor in EECS; and Jing Kong, a professor in EECS and principal investigator in RLE. The research has been accepted for publication in Advanced Materials and is included in the journal’s Rising Stars series, which recognizes outstanding works from early-career researchers.

Making muscles

The rectangular microrobot, which weighs less than one-fourth of a penny, has four sets of wings that are each driven by a soft actuator. These muscle-like actuators are made from layers of elastomer that are sandwiched between two very thin electrodes and then rolled into a squishy cylinder. When voltage is applied to the actuator, the electrodes squeeze the elastomer, and that mechanical strain is used to flap the wing.

The more surface area the actuator has, the less voltage is required. So, Chen and his team build these artificial muscles by stacking as many ultrathin alternating layers of elastomer and electrode as they can. But as the elastomer layers get thinner, they become more unstable.
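
The voltage scaling follows from textbook electrostatics for dielectric elastomer actuators (a standard first-order relation, not taken from the team’s paper). The effective squeezing pressure on an elastomer layer of thickness t driven at voltage V is

```latex
p = \varepsilon_0 \, \varepsilon_r \, E^{2}
  = \varepsilon_0 \, \varepsilon_r \left( \frac{V}{t} \right)^{2}
```

where ε0 is the vacuum permittivity and εr is the elastomer’s relative permittivity. Halving the layer thickness reaches the same pressure at half the voltage, which is why stacking many ultrathin layers lowers the drive voltage, at the cost of more layers that must be fabricated defect-free.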

For the first time, the researchers were able to create an actuator with 20 layers, each of which is 10 micrometers in thickness (about the diameter of a red blood cell). But they had to reinvent parts of the fabrication process to get there.

One major roadblock came from the spin coating process. During spin coating, an elastomer is poured onto a flat surface and rapidly rotated, and the centrifugal force pulls the film outward to make it thinner.

“In this process, air comes back into the elastomer and creates a lot of microscopic air bubbles. The diameter of these air bubbles is barely 1 micrometer, so previously we just sort of ignored them. But when you get thinner and thinner layers, the effect of the air bubbles becomes stronger and stronger. That is traditionally why people haven’t been able to make these very thin layers,” Chen explains.

He and his collaborators found that performing a vacuuming process immediately after spin coating, while the elastomer is still wet, removes the air bubbles. Then, they bake the elastomer to dry it.

Removing these defects increases the power output of the actuator by more than 300 percent and significantly improves its lifespan, Chen says.

The researchers also optimized the thin electrodes, which are composed of carbon nanotubes, super-strong rolls of carbon that are about 1/50,000 the diameter of a human hair. Higher concentrations of carbon nanotubes increase the actuator’s power output and reduce voltage, but dense layers also contain more defects.

For instance, the carbon nanotubes have sharp ends and can pierce the elastomer, which causes the device to short out, Chen explains. After much trial and error, the researchers found the optimal concentration.

Another problem comes from the curing stage — as more layers are added, the actuator takes longer and longer to dry.

“The first time I asked my student to make a multilayer actuator, once he got to 12 layers, he had to wait two days for it to cure. That is totally not sustainable, especially if you want to scale up to more layers,” Chen says.

They found that baking each layer for a few minutes immediately after the carbon nanotubes are transferred to the elastomer cuts down the curing time as more layers are added.

Best-in-class performance

After using this technique to create a 20-layer artificial muscle, they tested it against their previous six-layer version and state-of-the-art, rigid actuators.

During liftoff experiments, the 20-layer actuator, which requires less than 500 volts to operate, exerted enough power to give the robot a lift-to-weight ratio of 3.7 to 1, so it could carry items that are nearly three times its weight.

They also demonstrated a 20-second hovering flight, which Chen says is the longest ever recorded by a sub-gram robot. Their hovering robot held its position more stably than any of the others. The 20-layer actuator was still working smoothly after being driven for more than 2 million cycles, far outpacing the lifespan of other actuators.

“Two years ago, we created the most power-dense actuator and it could barely fly. We started to wonder, can soft robots ever compete with rigid robots? We observed one defect after another, so we kept working and we solved one fabrication problem after another, and now the soft actuator’s performance is catching up. They are even a little bit better than the state-of-the-art rigid ones. And there are still a number of fabrication processes in material science that we don’t understand. So, I am very excited to continue to reduce actuation voltage,” he says.

Chen looks forward to collaborating with Niroui to build actuators in a clean room at MIT.nano and to leverage nanofabrication techniques. For now, his team is limited in how thin it can make the layers by dust in the air and by the maximum spin-coating speed. Working in a clean room would eliminate this problem and allow them to use methods, such as doctor blading, that are more precise than spin coating.

While Chen is thrilled about producing 10-micrometer actuator layers, his hope is to reduce the thickness to only 1 micrometer, which would open the door to many applications for these insect-sized robots.

This work is supported, in part, by the MIT Research Laboratory of Electronics and a Mathworks Graduate Fellowship.

Meet the 2021-22 Accenture Fellows

Launched in October of 2020, the MIT and Accenture Convergence Initiative for Industry and Technology underscores the ways in which industry and technology come together to spur innovation. The five-year initiative aims to achieve its mission through research, education, and fellowships. To that end, Accenture has once again awarded five annual fellowships to MIT graduate students working on research in industry and technology convergence who are underrepresented, including by race, ethnicity, and gender.

This year’s Accenture Fellows work across disciplines including robotics, manufacturing, artificial intelligence, and biomedicine. Their research covers a wide array of subjects, including advancing manufacturing through computational design, with the potential to benefit global vaccine production; designing low-energy robotics for both consumer electronics and the aerospace industry; developing robotics and machine learning systems that may aid the elderly in their homes; and creating ingestible biomedical devices that can help gather medical data from inside a patient’s body.

Student nominations from each unit within the School of Engineering, as well as from the four other MIT schools and the MIT Schwarzman College of Computing, were invited as part of the application process. Five exceptional students were selected as fellows in the initiative’s second year.

Xinming (Lily) Liu is a PhD student in operations research at MIT Sloan School of Management. Her work is focused on behavioral and data-driven operations for social good, incorporating human behaviors into traditional optimization models, designing incentives, and analyzing real-world data. Her current research looks at the convergence of social media, digital platforms, and agriculture, with particular attention to expanding technological equity and economic opportunity in developing countries. Liu earned her BS from Cornell University, with a double major in operations research and computer science.

Caris Moses is a PhD student in electrical engineering and computer science specializing in artificial intelligence. Moses’ research focuses on using machine learning, optimization, and electromechanical engineering to build robotics systems that are robust, flexible, intelligent, and can learn on the job. The technology she is developing holds promise for industries including flexible, small-batch manufacturing; robots to assist the elderly in their households; and warehouse management and fulfillment. Moses earned her BS in mechanical engineering from Cornell University and her MS in computer science from Northeastern University.

Sergio Rodriguez Aponte is a PhD student in biological engineering. He is working on the convergence of computational design and manufacturing practices, which have the potential to impact industries such as biopharmaceuticals, food, and wellness/nutrition. His current research aims to develop strategies for applying computational tools, such as multiscale modeling and machine learning, to the design and production of manufacturable and accessible vaccine candidates that could eventually be available globally. Rodriguez Aponte earned his BS in industrial biotechnology from the University of Puerto Rico at Mayaguez.

Soumya Sudhakar SM ’20 is a PhD student in aeronautics and astronautics. Her work is focused on the co-design of new algorithms and integrated circuits for autonomous low-energy robotics that could have novel applications in aerospace and consumer electronics. Her contributions bring together the emerging robotics industry, integrated circuits industry, aerospace industry, and consumer electronics industry. Sudhakar earned her BSE in mechanical and aerospace engineering from Princeton University and her MS in aeronautics and astronautics from MIT.

So-Yoon Yang is a PhD student in electrical engineering and computer science. Her work on the development of low-power, wireless, ingestible biomedical devices for health care is at the intersection of the medical device, integrated circuit, artificial intelligence, and pharmaceutical fields. Currently, the majority of wireless biomedical devices can only provide a limited range of medical data measured from outside the body. Ingestible devices hold promise for the next generation of personal health care because they do not require surgical implantation, can be useful for detecting physiological and pathophysiological signals, and can also function as therapeutic alternatives when treatment cannot be done externally. Yang earned her BS in electrical and computer engineering from Seoul National University in South Korea and her MS in electrical engineering from Caltech.

Goldwasser recognized with FOCS Test of Time award

MIT EECS professor Shafi Goldwasser has been recognized with the FOCS Test of Time Award for her paper “Approximating Clique is Almost NP-Complete.” The award recognizes papers presented at FOCS 1991 that have stood the test of time.

Through her many years of research, Goldwasser has laid much of the groundwork for the field of cryptography, and made fundamental contributions to computational complexity, computational number theory, and probabilistic algorithms.

Goldwasser is the RSA Professor (post-tenure) of Computer Science and Engineering at MIT, a co-leader of the cryptography and information security group and a member of the complexity theory group within the Theory of Computation Group and the Computer Science and Artificial Intelligence Laboratory (CSAIL). 

Goldwasser was awarded the prestigious A.M. Turing Award in 2013, the Simons Foundation Investigator Award in 2012, the IEEE Emanuel R. Piore Award in 2011, and the Franklin Institute Benjamin Franklin Medal in Computer and Cognitive Science in 2010, among many other recognitions. She is also director of the Simons Institute for the Theory of Computing; professor in electrical engineering and computer sciences at the University of California at Berkeley; and professor of computer science and applied mathematics at the Weizmann Institute of Science in Israel.

The award will be presented at FOCS 2021, which will take place in February 2022.

Five in EECS Appointed to Career Development Professorships

The Department of Electrical Engineering and Computer Science has announced the appointment of five of its faculty members to Career Development Professorships, retroactively effective July 1, 2021. Those appointments are as follows:

Dylan Hadfield-Menell has been appointed Bonnie and Marty (1964) Tenenbaum Career Development Assistant Professor. Hadfield-Menell’s research focuses on algorithms that facilitate human-compatible artificial intelligence. He aims to develop frameworks that account for uncertainty about the objective being optimized.  Hadfield-Menell is affiliated with CSAIL; he earned his PhD from the University of California Berkeley and his undergraduate degree from MIT.

Yoon Kim has been appointed NBX Career Development Assistant Professor. Kim’s work straddles the intersection between natural language processing and machine learning, and touches upon efficient training and deployment of large-scale models, learning from small data, neuro-symbolic approaches, grounded language learning, and connections between computational and human language processing. Affiliated with CSAIL, Kim earned his PhD in computer science at Harvard University; his MS in Data Science from New York University; his MA in Statistics from Columbia University; and his BA in both Math and Economics from Cornell.

Anand Venkat Natarajan has been appointed ITT Career Development Associate Professor in Computer Technology. Natarajan’s research is in theoretical quantum information, particularly nonlocality, quantum complexity theory, and semidefinite programming hierarchies. Natarajan earned his PhD in Physics from MIT, and an MS in Computer Science and BS in Physics from Stanford University. Prior to joining MIT, he spent time as a postdoc at the Institute for Quantum Information and Matter at Caltech; he is affiliated with CSAIL.

Jelena Notaros has been appointed Robert J. Shillman (1974) Career Development Assistant Professor in EECS. Notaros’s research interests are in integrated silicon photonics devices, systems, and applications, with an emphasis on augmented-reality displays, LiDAR sensing for autonomous vehicles, free-space optical communications, quantum engineering, and biophotonics. Affiliated with RLE and MTL, Notaros earned her PhD and MS degrees from MIT, and her undergraduate degree from the University of Colorado Boulder.

Tess Smidt has been appointed X-Window Consortium Career Development Assistant Professor. Affiliated with RLE, Smidt earned her SB in physics from MIT and her PhD in physics from the University of California, Berkeley. Her research focuses on machine learning that incorporates physical and geometric constraints, with applications to materials design. Prior to joining the MIT EECS faculty, she was the 2018 Alvarez Postdoctoral Fellow in Computing Sciences at Lawrence Berkeley National Laboratory and a software engineering intern on the Google Accelerated Sciences team.

One Step Forward: an interview with Mitchell Scholar Adedolapo Adedokun

Adedolapo Adedokun has a lot to look forward to in 2023. After completing his degree in electrical engineering and computer science next spring, he will travel to Ireland to undertake an MSc in intelligent systems at Trinity College Dublin as MIT’s fourth student to receive the prestigious George J. Mitchell Scholarship. But there’s more to Adedokun, who goes by Dolapo, than academic achievement. Besides being a talented computer scientist, the senior is an accomplished musician, an influential member of student government, and an anime fan. We sat down with him to learn more.

First off, congratulations on this fantastic honor! What excites you the most about going to Ireland to study for a year? Have you already started making plans for the things you want to see and experience? What’s on the top of your Ireland list?

One of the reasons I was interested in Ireland was learning about Music Generation, a national music education initiative with the goal of giving every child in Ireland access to the arts through music tuition, performance opportunities, and music education in and outside of the classroom. It made me think, “Wow, this is a country that recognizes the importance of arts and music education and has invested to make it accessible for people of all backgrounds.” I am inspired by this initiative and wish it was something I could have had growing up.

I am also really inspired by the work of Louis Stewart, an amazing jazz guitarist who was born and raised in Dublin. I am excited to explore his musical influences and to dive into the rich musical community of Dublin. I hope to join a jazz band, maybe a trio or a quartet, and perform all around the city, immersing myself in the rich Irish musical scene but also sharing my own styles and musical influences with the community there.

Of course, while you’re there, you’ll be working on your MSc in Intelligent Systems. I am probably not the only one who read the announcement of your win and was particularly intrigued by your invention of a “smart-home system that allowed users to anonymously layer different melodies as they entered and left a building, which created a unique and rich soundtrack for each day”. Tell us a little more about that system: how it works, how you envision users interacting with it and experiencing it, and what you learned from developing it.

Funny enough, it actually started as a system I worked on in my freshman year in 6.08 – Introduction to Embedded Systems with a few classmates. We called it Smart HOMiE, an IoT Arduino smart-home device that gathered basic information like location and weather, and interfaced with Amazon Alexa. I had forgotten about having worked on it until I took 21M.080 and 6.033 – Computer System Engineering in my junior year, and began to learn about the creative applications of machine learning and computer science in areas like audio synthesis and digital instrument design. I learned about amazing projects like Google Magenta’s Tone Transfer, which uses machine learning models to transform everyday sounds into legitimate musical instruments. Learning about this unique intersection of music and technology, I began to think about bigger questions, like, “What kind of creative future can technology create? How can technology enable anyone to be expressive?”

Adedokun, who plays jazz guitar, has translated his talent for improvisation into his creative work with audio synthesis technology. Photo courtesy of the subject.

When I had some downtime while being at home for a year, I wanted to play around with some of the audio synthesis tools I had learned about. I took Smart HOMiE and upgraded it a bit––made it a bit more musical. It worked in three main steps. First, multiple people could sing and record melodies that the device would save and store. Then, using a few pitch correction and audio synthesis Python libraries, Smart HOMiE corrected the recorded melodies until they fit together, or generally fit inside the same key, in music terms. Lastly, it would combine the melodies, add some harmony or layer the track over a backing track, and by the end, you’ve made something really unique and expressive. It was definitely a bit scrappy, but it was one of my first times messing around and exploring all the work that has already been done by amazing people in this space. Technology has this incredible potential to make anyone a creator––I’d like to build the tools to make it happen.
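
As a rough illustration of that three-step pipeline, here is a minimal sketch in Python, assuming the librosa and soundfile libraries and invented file names (Adedokun doesn’t name his exact tools): estimate each recording’s pitch center, shift the others into the first one’s key, and layer them.

```python
import numpy as np
import librosa
import soundfile as sf

def median_midi(y, sr):
    """Median fundamental pitch of a recording, in MIDI note numbers."""
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C6"), sr=sr)
    return np.nanmedian(librosa.hz_to_midi(f0))

sr = 22050
# Step 1: load melodies that different people recorded (file names invented).
melodies = [librosa.load(p, sr=sr)[0] for p in ("voice1.wav", "voice2.wav")]

# Step 2: pitch-shift every melody toward the first one's pitch center,
# in whole semitones, so they roughly share a key.
anchor = median_midi(melodies[0], sr)
layers = [melodies[0]]
for y in melodies[1:]:
    shift = int(round(anchor - median_midi(y, sr)))
    layers.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=shift))

# Step 3: trim to a common length, layer the melodies, and write the mix.
n = min(len(y) for y in layers)
mix = sum(y[:n] for y in layers) / len(layers)
sf.write("mix.wav", mix, sr)
```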

You’re a jazz instrumentalist yourself—tell us a little about your chosen instrument, how you developed that skill, and what practicing and performing have taught you.

I’ve always had an affinity for music, but haven’t always felt like I could become a musician. I had played saxophone in middle school but it never really stuck. When I got to MIT, I was fortunate enough to take 21M.051 – Fundamentals of Music and dive into proper music theory for the first time. It was in that class that I was exposed to jazz and completely fell in love. I’ll never forget walking back to New House from Barker Library in my freshman year and stumbling upon Undercurrent by Bill Evans and Jim Hall––I think that was when I decided I wanted to learn jazz guitar.

Jazz, and in particular improvisation, has taught me so much about what it means to be creative: to be willing to experiment, take risks, build upon the work of others, and accept failure––all skills that I wholeheartedly believe have made me a better technologist and leader. Most importantly, though, I think music and jazz have taught me patience and discipline, and that mastery of a skill takes a lifetime. I’d be lying if I said I was satisfied with where I am currently at, but each day, I’m eager to take one step forward toward my goals.

You’ve focused in on music and arts education, and the potential of technology to bolster both. Do you have a particularly influential class, technology, or teacher in your past that you can point to as a change-maker in your life?

Wow, tough question! I think there are a few inflection points that have really been change-makers for me. The first was in high school when I first learned about Guitar Hero, the music rhythm video game that started as a project in the MIT Media Lab attempting to bring the joy of music-making to people of all backgrounds. It was then that I was able to see the multi-disciplinary outreach of technology in service of others.

The next, I would say, was taking 6.033 – Computer System Engineering at MIT. From the first day of class, Professor LaCurts emphasized understanding the people we design for: we ought to see system design as inherently people-oriented. Before we think of designing a system, we must first consider the people who will be using it. We must consider their goals, their personas, their backgrounds, the barriers that they face, and most importantly, the consequences of our design and implementation choices. I envision a future where music, arts, and the creative process are accessible to everyone, and I believe 6.033 has given me the foundation to build the technology to reach that goal.

You’ve also developed a passion for broadband infrastructure, which at first glance, people might not connect with music and education, your other two focuses. Tell us a little about why broadband is such an important factor.

Before we can think about the potential of technology to democratize accessibility to music and the arts, we first have to take a step back and think about accessibility itself. Which communities have more and less access to the technology that we often take for granted? I think broadband is just one factor in the bigger problem, which is accessibility, particularly in minority and low-income communities. I see technology as the key to democratizing access to music and the arts for people of all backgrounds, but technology can only be that key if the foundational infrastructure is in place for all people to take advantage of it. Just like I learned in 6.033, that means understanding the barriers facing the people and communities with the least access, and investing in crucial, basic technological resources like equitable broadband internet access.

Between your work on the Undergraduate Student Advisory Group in EECS, the Harvard/MIT Cooperative Society, the MIT Chapter of the National Society of Black Engineers, and of course all your research and many academic interests, many readers must wonder if you ever eat or sleep! Tell me a little about how you’ve balanced out your busy MIT life and maintained a sense of self while accomplishing so much in undergrad.

Great question! I’ll start by saying it took me a while to figure out. There were semesters where I had to drop classes or extracurricular commitments to find some sense of balance. It’s always difficult, being surrounded by the world’s brightest students who are all doing incredible and amazing things, to not feel like you should add one more class or an extra UROP.

I think the most important thing, though, is to stay true to yourself––figuring out the things that bring you joy, that excite you, and how many of those commitments are reasonable to take on each semester. I’m not a student who can take a million and one classes, research, internships, and clubs all at the same time––but that’s totally okay. It took me a while to find the things I enjoyed and to understand the academic load that’s appropriate for me each semester, but once I did, I was happier than ever before. I realized things like playing tennis and basketball, jamming with friends, and even sneaking in a few episodes of anime here and there are really important to me. As long as I can look back each week, month, semester, and year and say I’ve taken a step forward toward my academic, social, and music goals, even just the tiniest amount, then I think I am taking steps in the right direction.

Tiny machine learning design alleviates a bottleneck in memory usage on internet-of-things devices

Machine learning provides researchers with powerful tools to identify and predict patterns and behaviors, as well as to learn, optimize, and perform tasks. Applications range from vision systems on autonomous vehicles or social robots, to smart thermostats, to wearable and mobile devices like smartwatches and apps that can monitor health changes. While these algorithms and their architectures are becoming more powerful and efficient, they typically require tremendous amounts of memory, computation, and data to train and make inferences.

At the same time, researchers are working to reduce the size and complexity of the devices that these algorithms can run on, all the way down to a microcontroller unit (MCU) that’s found in billions of internet-of-things (IoT) devices. An MCU is a memory-limited minicomputer housed in a compact integrated circuit that lacks an operating system and runs simple commands. These relatively cheap edge devices require low power, computing, and bandwidth, and offer many opportunities to inject AI technology to expand their utility, increase privacy, and democratize their use — a field called TinyML.

Now, an MIT team working in TinyML in the MIT-IBM Watson AI Lab and the research group of Song Han, assistant professor in the Department of Electrical Engineering and Computer Science (EECS), has designed a technique to shrink the amount of memory needed even further, while improving performance on image recognition in live videos.

“Our new technique can do a lot more and paves the way for tiny machine learning on edge devices,” says Han, who designs TinyML software and hardware.

To increase TinyML efficiency, Han and his colleagues from EECS and the MIT-IBM Watson AI Lab analyzed how memory is used on microcontrollers running various convolutional neural networks (CNNs). CNNs are biologically inspired models patterned after neurons in the brain, and are often applied to evaluate and identify visual features within imagery, like a person walking through a video frame. In their study, they discovered an imbalance in memory utilization that front-loads demand on the computer chip and creates a bottleneck. By developing a new inference technique and neural architecture, the team alleviated the problem and reduced peak memory usage by four to eight times. Further, the team deployed the technique on their own tinyML vision system, equipped with a camera and capable of human and object detection, creating its next generation, dubbed MCUNetV2. When compared to other machine learning methods running on microcontrollers, MCUNetV2 outperformed them with high accuracy on detection, opening the door to additional vision applications not possible before.

The results will be presented in a paper at the conference on Neural Information Processing Systems (NeurIPS) this week. The team includes Han, lead author and graduate student Ji Lin, postdoc Wei-Ming Chen, graduate student Han Cai, and MIT-IBM Watson AI Lab Research Scientist Chuang Gan.

A design for memory efficiency and redistribution

TinyML offers numerous advantages over deep machine learning that happens on larger devices, like remote servers and smartphones. These, Han notes, include privacy, since the data are not transmitted to the cloud for computing but processed on the local device; robustness, as the computing is quick and the latency is low; and low cost, because IoT devices cost roughly $1 to $2. Further, some larger, more traditional AI models can emit as much carbon as five cars in their lifetimes, require many GPUs, and cost billions of dollars to train. “So, we believe such TinyML techniques can enable us to go off-grid to save the carbon emissions and make the AI greener, smarter, faster, and also more accessible to everyone — to democratize AI,” says Han.

However, small MCU memory and digital storage limit AI applications, so efficiency is a central challenge. MCUs contain only 256 kilobytes of memory and 1 megabyte of storage. In comparison, mobile AI on smartphones and cloud computing may have 256 gigabytes and terabytes of storage, respectively, as well as 16,000 and 100,000 times more memory. Because memory is such a precious resource, the team wanted to optimize its use, so they profiled the MCU memory usage of CNN designs — a task that had been overlooked until now, Lin and Chen say.

Their findings revealed that memory usage peaked in the first five convolutional blocks, out of about 17. Each block contains many connected convolutional layers, which help to filter for the presence of specific features within an input image or video, creating a feature map as the output. During the initial memory-intensive stage, most of the blocks operated beyond the 256 KB memory constraint, offering plenty of room for improvement. To reduce the peak memory, the researchers developed a patch-based inference schedule, which operates on only a small fraction, roughly 25 percent, of the layer’s feature map at one time, before moving on to the next quarter, until the whole layer is done. This method saved four to eight times the memory of the previous layer-by-layer computational method, without adding latency.
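
The memory saving can be illustrated with a toy version of patch-based inference for a single 3×3 convolution, written in Python with NumPy (a simplification for illustration; the real schedule spans several blocks of a CNN):

```python
import numpy as np

def conv3x3(x, k):
    """Naive 3x3 'valid' convolution (cross-correlation) on a 2D array."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))   # input feature map
k = rng.standard_normal((3, 3))     # convolution kernel

# Layer-by-layer baseline: the whole output feature map is resident at once.
full = conv3x3(x, k)

# Patch-based schedule: compute each quarter of the output from the matching
# input patch plus a small halo, so only ~25 percent of the activations are
# held in memory at any moment. The overlapping halos are the redundant
# computation discussed below.
H, W = full.shape
out = np.empty_like(full)
for i0, i1 in [(0, H // 2), (H // 2, H)]:
    for j0, j1 in [(0, W // 2), (W // 2, W)]:
        patch = x[i0:i1 + 2, j0:j1 + 2]   # +2 rows/cols of halo for a 3x3 kernel
        out[i0:i1, j0:j1] = conv3x3(patch, k)

assert np.allclose(full, out)  # identical result, ~4x lower peak activation memory
```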

“As an illustration, say we have a pizza. We can divide it into four chunks and only eat one chunk at a time, so you save about three-quarters. This is the patch-based inference method,” says Han. “However, this was not a free lunch.” Like photoreceptors in the human eye, the patches can only take in and examine part of an image at a time; this receptive field is a patch of the total image or field of view. As the size of these receptive fields (or pizza slices, in this analogy) grows, the overlap between them increases, which amounts to redundant computation that the researchers found to be about 10 percent. The researchers proposed to also redistribute the neural network across the blocks, in parallel with the patch-based inference method, without losing any of the accuracy in the vision system. However, the question remained of which blocks needed the patch-based inference method and which could use the original layer-by-layer one, together with the redistribution decisions; hand-tuning all of these knobs was labor-intensive, and better left to AI.

“We want to automate this process by doing a joint automated search for optimization, including both the neural network architecture, like the number of layers, number of channels, the kernel size, and also the inference schedule including number of patches, number of layers for patch-based inference, and other optimization knobs,” says Lin, “so that non-machine learning experts can have a push-button solution to improve the computation efficiency but also improve the engineering productivity, to be able to deploy this neural network on microcontrollers.”

A new horizon for tiny vision systems

The co-design of the network architecture with the neural network search optimization and inference scheduling provided significant gains and was adopted into MCUNetV2; it outperformed other vision systems in peak memory usage, and in image and object detection and classification. The MCUNetV2 device includes a small screen and a camera, and is about the size of an earbud case. Compared to the first version, the new version needed four times less memory for the same amount of accuracy, says Chen. When placed head-to-head against other tinyML solutions, MCUNetV2 was able to detect the presence of objects in image frames, like human faces, with an improvement of nearly 17 percent. Further, it set a record for accuracy, at nearly 72 percent, for thousand-class image classification on the ImageNet dataset, using 465 KB of memory. The researchers also tested what’s known as visual wake words, or how well their MCU vision model could identify the presence of a person within an image; even with a limited memory budget of only 30 KB, it achieved greater than 90 percent accuracy, beating the previous state-of-the-art method. This means the method is accurate enough that it could be deployed in, say, smart-home applications.

With its high accuracy and low energy use and cost, MCUNetV2’s performance unlocks new IoT applications. Due to their limited memory, Han says, vision systems on IoT devices were previously thought to be good only for basic image classification tasks, but this work has helped to expand the opportunities for TinyML use. Further, the research team envisions it in numerous fields: monitoring sleep and joint movement in the health-care industry; sports coaching and movement analysis, like a golf swing; plant identification in agriculture; and smarter manufacturing, from identifying nuts and bolts to detecting malfunctioning machines.

“We really push forward for these larger-scale, real-world applications,” says Han. “Without GPUs or any specialized hardware, our technique is so tiny it can run on these small cheap IoT devices and perform real-world applications like these visual wake words, face mask detection, and person detection. This opens the door for a brand-new way of doing tiny AI and mobile vision.”

This research was sponsored by the MIT-IBM Watson AI Lab, Samsung, Woodside Energy, and the National Science Foundation.

Machines that see the world more like humans do

Nine photos show different series of household items, such as a wrench and a box of sugar, clustered in different orientations.

Computer vision systems sometimes make inferences about a scene that fly in the face of common sense. For example, if a robot were processing a scene of a dinner table, it might completely ignore a bowl that is visible to any human observer, estimate that a plate is floating above the table, or misperceive a fork to be penetrating a bowl rather than leaning against it.

Move that computer vision system to a self-driving car and the stakes become much higher — for example, such systems have failed to detect emergency vehicles and pedestrians crossing the street.

To overcome these errors, MIT researchers have developed a framework that helps machines see the world more like humans do. Their new artificial intelligence system for analyzing scenes learns to perceive real-world objects from just a few images, and perceives scenes in terms of these learned objects.

The researchers built the framework using probabilistic programming, an AI approach that enables the system to cross-check detected objects against input data, to see if the images recorded from a camera are a likely match to any candidate scene. Probabilistic inference allows the system to infer whether mismatches are likely due to noise or to errors in the scene interpretation that need to be corrected by further processing.

This common-sense safeguard allows the system to detect and correct many errors that plague the “deep-learning” approaches that have also been used for computer vision. Probabilistic programming also makes it possible to infer probable contact relationships between objects in the scene, and use common-sense reasoning about these contacts to infer more accurate positions for objects.

“If you don’t know about the contact relationships, then you could say that an object is floating above the table — that would be a valid explanation. As humans, it is obvious to us that this is physically unrealistic and the object resting on top of the table is a more likely pose of the object. Because our reasoning system is aware of this sort of knowledge, it can infer more accurate poses. That is a key insight of this work,” says lead author Nishad Gothoskar, an electrical engineering and computer science (EECS) PhD student with the Probabilistic Computing Project.

In addition to improving the safety of self-driving cars, this work could enhance the performance of computer perception systems that must interpret complicated arrangements of objects, like a robot tasked with cleaning a cluttered kitchen.

Gothoskar’s co-authors include recent EECS PhD graduate Marco Cusumano-Towner; research engineer Ben Zinberg; visiting student Matin Ghavamizadeh; Falk Pollok, a software engineer in the MIT-IBM Watson AI Lab; recent EECS master’s graduate Austin Garrett; Dan Gutfreund, a principal investigator in the MIT-IBM Watson AI Lab; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences (BCS) and a member of the Computer Science and Artificial Intelligence Laboratory; and senior author Vikash K. Mansinghka, principal research scientist and leader of the Probabilistic Computing Project in BCS. The research is being presented at the Conference on Neural Information Processing Systems in December.

A blast from the past

To develop the system, called “3D Scene Perception via Probabilistic Programming (3DP3),” the researchers drew on a concept from the early days of AI research, which is that computer vision can be thought of as the “inverse” of computer graphics.

Computer graphics focuses on generating images based on the representation of a scene; computer vision can be seen as the inverse of this process. Gothoskar and his collaborators made this technique more learnable and scalable by incorporating it into a framework built using probabilistic programming.

“Probabilistic programming allows us to write down our knowledge about some aspects of the world in a way a computer can interpret, but at the same time, it allows us to express what we don’t know, the uncertainty. So, the system is able to automatically learn from data and also automatically detect when the rules don’t hold,” Cusumano-Towner explains.

In this case, the model is encoded with prior knowledge about 3D scenes. For instance, 3DP3 “knows” that scenes are composed of distinct objects, and that these objects often lie flat on top of one another — but they may not always be in such simple relationships. This enables the model to reason about a scene with more common sense.
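
The effect of such a prior can be seen in a toy one-dimensional example, sketched below in Python: a noisy depth estimate says a bowl floats 3 centimeters above a table, but a prior that favors contact pulls the inferred pose back to the tabletop. All numbers and the noise model are invented for illustration; 3DP3’s actual inference over full 3D scenes is far richer.

```python
import numpy as np

table_z = 0.0                                  # table surface height (meters)
candidates = np.linspace(-0.05, 0.30, 701)     # hypothesized bowl heights

# Prior: most probability mass sits in a narrow "contact" band at the table
# surface; the rest is spread over free space (floating is possible, just
# unlikely).
contact = np.exp(-0.5 * ((candidates - table_z) / 0.005) ** 2)
prior = 0.8 * contact / contact.sum() + 0.2 / len(candidates)

# Likelihood: the detector reports the bowl 3 cm above the table with 2 cm
# of noise -- the kind of error a pure deep-learning estimate might make.
observed_z, sigma = 0.03, 0.02
likelihood = np.exp(-0.5 * ((candidates - observed_z) / sigma) ** 2)

posterior = prior * likelihood
posterior /= posterior.sum()
print(f"most probable height: {candidates[np.argmax(posterior)]:.3f} m")
# -> ~0.002 m: the contact prior snaps the bowl back onto the tabletop.
```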

Learning shapes and scenes

To analyze an image of a scene, 3DP3 first learns about the objects in that scene. After being shown only five images of an object, each taken from a different angle, 3DP3 learns the object’s shape and estimates the volume it would occupy in space.

“If I show you an object from five different perspectives, you can build a pretty good representation of that object. You’d understand its color, its shape, and you’d be able to recognize that object in many different scenes,” Gothoskar says.

Mansinghka adds, “This is way less data than deep-learning approaches. For example, the Dense Fusion neural object detection system requires thousands of training examples for each object type. In contrast, 3DP3 only requires a few images per object, and reports uncertainty about the parts of each object’s shape that it doesn’t know.”

The 3DP3 system generates a graph to represent the scene, where each object is a node and the lines that connect the nodes indicate which objects are in contact with one another. This enables 3DP3 to produce a more accurate estimation of how the objects are arranged. (Deep-learning approaches rely on depth images to estimate object poses, but these methods don’t produce a graph structure of contact relationships, so their estimations are less accurate.)
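
Such a contact graph can be represented very simply; the sketch below uses a plain Python dictionary with invented object names, just to make the structure concrete (3DP3 infers this graph probabilistically rather than taking it as given):

```python
# Each node is an object; each entry names the object it rests on or touches.
scene_graph = {
    "table": None,       # the table supports everything else
    "plate": "table",    # plate rests on the table
    "bowl": "plate",     # bowl rests on the plate
    "fork": "bowl",      # fork leans against the bowl
}

def support_chain(obj, graph):
    """Follow contact edges down to the ultimate supporting surface."""
    chain = [obj]
    while graph[chain[-1]] is not None:
        chain.append(graph[chain[-1]])
    return chain

print(support_chain("fork", scene_graph))  # ['fork', 'bowl', 'plate', 'table']
```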

Outperforming baseline models

The researchers compared 3DP3 with several deep-learning systems, all tasked with estimating the poses of 3D objects in a scene.

In nearly all instances, 3DP3 generated more accurate poses than other models and performed far better when some objects were partially obstructing others. And 3DP3 only needed to see five images of each object, while each of the baseline models it outperformed needed thousands of images for training.

When used in conjunction with another model, 3DP3 was able to improve that model’s accuracy. For instance, a deep-learning model might predict that a bowl is floating slightly above a table, but because 3DP3 has knowledge of the contact relationships and can see that this is an unlikely configuration, it is able to make a correction by aligning the bowl with the table.

“I found it surprising to see how large the errors from deep learning could sometimes be — producing scene representations where objects really didn’t match with what people would perceive. I also found it surprising that only a little bit of model-based inference in our causal probabilistic program was enough to detect and fix these errors. Of course, there is still a long way to go to make it fast and robust enough for challenging real-time vision systems — but for the first time, we’re seeing probabilistic programming and structured causal models improving robustness over deep learning on hard 3D vision benchmarks,” Mansinghka says.

In the future, the researchers would like to push the system further so it can learn about an object from a single image, or a single frame in a movie, and then be able to detect that object robustly in different scenes. They would also like to explore the use of 3DP3 to gather training data for a neural network. It is often difficult for humans to manually label images with 3D geometry, so 3DP3 could be used to generate more complex image labels.

The 3DP3 system “combines low-fidelity graphics modeling with common-sense reasoning to correct large scene interpretation errors made by deep learning neural nets. This type of approach could have broad applicability as it addresses important failure modes of deep learning. The MIT researchers’ accomplishment also shows how probabilistic programming technology previously developed under DARPA’s Probabilistic Programming for Advancing Machine Learning (PPAML) program can be applied to solve central problems of common-sense AI under DARPA’s current Machine Common Sense (MCS) program,” says Matt Turek, DARPA Program Manager for the Machine Common Sense Program, who was not involved in this research, though the program partially funded the study.

Additional funders include the Singapore Defense Science and Technology Agency collaboration with the MIT Schwarzman College of Computing, Intel’s Probabilistic Computing Center, the MIT-IBM Watson AI Lab, the Aphorism Foundation, and the Siegel Family Foundation.

Machine-learning system flags remedies that might do more harm than good

Sepsis claims the lives of nearly 270,000 people in the U.S. each year. The unpredictable medical condition can progress rapidly, leading to a swift drop in blood pressure, tissue damage, multiple organ failure, and death.

Prompt interventions by medical professionals save lives, but some sepsis treatments can also contribute to a patient’s deterioration, so choosing the optimal therapy can be a difficult task. For instance, in the early hours of severe sepsis, administering too much fluid intravenously can increase a patient’s risk of death.

To help clinicians avoid remedies that may potentially contribute to a patient’s death, researchers at MIT and elsewhere have developed a machine-learning model that could be used to identify treatments that pose a higher risk than other options. Their model can also warn doctors when a septic patient is approaching a medical dead end — the point when the patient will most likely die no matter what treatment is used — so that they can intervene before it is too late.

When applied to a dataset of sepsis patients in a hospital intensive care unit, the researchers’ model indicated that about 12 percent of treatments given to patients who died were detrimental. The study also reveals that about 3 percent of patients who did not survive entered a medical dead end up to 48 hours before they died.

“We see that our model is almost eight hours ahead of a doctor’s recognition of a patient’s deterioration. This is powerful because in these really sensitive situations, every minute counts, and being aware of how the patient is evolving, and the risk of administering certain treatment at any given time, is really important,” says Taylor Killian, a graduate student in the Healthy ML group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Joining Killian on the paper are his advisor, Assistant Professor Marzyeh Ghassemi, head of the Healthy ML group and senior author; lead author Mehdi Fatemi, a senior researcher at Microsoft Research; and Jayakumar Subramanian, a senior research scientist at Adobe India. The research is being presented at this week’s Conference on Neural Information Processing Systems.  

A dearth of data

This research project was spurred by a 2019 paper Fatemi wrote that explored the use of reinforcement learning in situations where it is too dangerous to explore arbitrary actions, which makes it difficult to generate enough data to effectively train algorithms. These situations, where more data cannot be proactively collected, are known as “offline” settings.

In reinforcement learning, the algorithm is trained through trial and error and learns to take actions that maximize its accumulation of reward. But in a health care setting, it is nearly impossible to generate enough data for these models to learn the optimal treatment, since it isn’t ethical to experiment with possible treatment strategies.

So, the researchers flipped reinforcement learning on its head. They used the limited data from a hospital ICU to train a reinforcement learning model to identify treatments to avoid, with the goal of keeping a patient from entering a medical dead end.

Learning what to avoid is a more statistically efficient approach that requires fewer data, Killian explains.

“When we think of dead ends in driving a car, we might think that is the end of the road, but you could probably classify every foot along that road toward the dead end as a dead end. As soon as you turn away from another route, you are in a dead end. So, that is the way we define a medical dead end: Once you’ve gone on a path where whatever decision you make, the patient will progress toward death,” Killian says.

“One core idea here is to decrease the probability of selecting each treatment in proportion to its chance of forcing the patient to enter a medical dead-end — a property that is called treatment security. This is a hard problem to solve as the data do not directly give us such an insight. Our theoretical results allowed us to recast this core idea as a reinforcement learning problem,” Fatemi says.

To develop their approach, called Dead-end Discovery (DeD), they created two copies of a neural network. The first neural network focuses only on negative outcomes — when a patient died — and the second network only focuses on positive outcomes — when a patient survived. Using two neural networks separately enabled the researchers to detect a risky treatment in one and then confirm it using the other.

They fed each neural network patient health statistics and a proposed treatment. The networks output an estimated value of that treatment and also evaluate the probability that the patient will enter a medical dead end. The researchers compared those estimates against established thresholds to see whether the situation raises any flags.

A yellow flag means that a patient is entering an area of concern while a red flag identifies a situation where it is very likely the patient will not recover.
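
A minimal sketch of that two-network, two-threshold scheme is below, written in PyTorch with untrained toy networks and invented thresholds (the authors’ actual DeD architecture, training procedure, and thresholds differ):

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Tiny value network: patient state in, one value per treatment out."""
    def __init__(self, n_features, n_treatments):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_treatments),
        )

    def forward(self, state):
        return self.net(state)

n_features, n_treatments = 40, 25          # e.g., vitals/labs; discretized doses
death_net = ValueNet(n_features, n_treatments)     # would be trained on negative outcomes
recovery_net = ValueNet(n_features, n_treatments)  # would be trained on positive outcomes

state = torch.randn(1, n_features)         # placeholder patient health statistics
with torch.no_grad():
    q_death = death_net(state)             # low value: treatment points toward a dead end
    q_recovery = recovery_net(state)       # low value: little chance of recovery

YELLOW, RED = -0.15, -0.5                  # invented flag thresholds
for t in range(n_treatments):
    # One network detects a risky treatment; the other confirms it.
    if q_death[0, t] < RED and q_recovery[0, t] < RED:
        print(f"treatment {t}: red flag, avoid")
    elif q_death[0, t] < YELLOW and q_recovery[0, t] < YELLOW:
        print(f"treatment {t}: yellow flag, area of concern")
```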

Treatment matters

The researchers tested their model using a dataset of patients presumed to be septic from the Beth Israel Deaconess Medical Center intensive care unit. This dataset contains about 19,300 admissions with observations drawn from a 72-hour period centered around when the patients first manifest symptoms of sepsis. Their results confirmed that some patients in the dataset encountered medical dead ends.

The researchers also found that 20 to 40 percent of patients who did not survive raised at least one yellow flag prior to their death, and many raised that flag at least 48 hours before they died. The results also showed that, when comparing the trends of patients who survived versus patients who died, once a patient raises their first flag, there is a very sharp deviation in the value of administered treatments. The window of time around the first flag is a critical point when making treatment decisions.

“This helped us confirm that treatment matters and the treatment deviates in terms of how patients survive and how patients do not. We found that upward of 11 percent of suboptimal treatments could have potentially been avoided because there were better alternatives available to doctors at those times. This is a pretty substantial number, when you consider the worldwide volume of patients who have been septic in the hospital at any given time,” Killian says.

Ghassemi is also quick to point out that the model is intended to assist doctors, not replace them.

“Human clinicians are who we want making decisions about care, and advice about what treatment to avoid isn’t going to change that,” she says. “We can recognize risks and add relevant guardrails based on the outcomes of 19,000 patient treatments — that’s equivalent to a single caregiver seeing more than 50 septic patient outcomes every day for an entire year.”

Moving forward, the researchers also want to estimate causal relationships between treatment decisions and the evolution of patient health. They plan to continue enhancing the model so it can produce uncertainty estimates around treatment values, which would help doctors make more informed decisions. They also hope to validate the model further by applying it to data from other hospitals.

This research was supported in part by Microsoft Research, a Canadian Institute for Advanced Research Azrieli Global Scholar Chair, a Canada Research Council Chair, and a Natural Sciences and Engineering Research Council of Canada Discovery Grant.

Popular new major blends technical skills and human-centered applications

Annie Snyder wasn’t sure what she wanted to major in when she arrived on campus. She drifted toward MIT’s most popular major, electrical engineering and computer science (EECS), also known as Course 6, but it didn’t feel like quite the right fit. She was interested in computer science but more passionate about understanding how technology affects people’s everyday lives.

Snyder, now a junior, found a compelling mix of technical skills and human-centered applications in the major 6-14: Computer Science, Economics, and Data Science, which was jointly launched by the computer science and economics departments in 2017.

Course 6-14 is a unique blend of computer science, data science, and economics. Students learn computing fundamentals, like programming and algorithms, and receive a multifaceted view of data science, from machine learning to econometrics. The major also covers economics concepts like game theory, incentives, and multiagent systems.

“The economics side of things fascinated me. It seemed like this interesting way to take these technical concepts that are really abstract, which I was familiar with through my math background, and apply them to people, society, and modeling human behavior,” Snyder says. “At the same time, computing is a tool that is going to permeate every field, so having that computing experience is a way to up your game, in a sense.”

Since its inception, Course 6-14 has attracted students with a diverse set of interests. About 40 students chose the major in 2017 and it has since grown to include 135 students, more than half of whom are women. The first cohort of computer science and economics “bilinguals” graduated last year. Students have followed a wide range of paths, including joining tech giants like Google and Microsoft, starting careers at finance and management consulting companies, working in logistics or data analytics, pursuing academic research, and more.

Economics and computing join forces

Computer science and economics have always had some overlap, but as more market exchanges take place in online systems, the fields have become inseparable. The decision to create the blended Course 6-14 grew out of deepening collaborations between faculty in the two departments, as well as strong student interest in the increasingly intertwined disciplines, says Asu Ozdaglar, head of the Department of Electrical Engineering and Computer Science and deputy dean of academics for the Schwarzman College of Computing, who helped oversee the launch of the new major.

Faculty members wanted to blend the fields in a way that would inspire and empower students, she says. Course 6-14 majors learn a variety of mathematical skills, but they also acquire hands-on experience in empirical analysis of data to uncover and solve real-world problems.

“The combination of topics and skills that 6-14 offers is not just useful for scholars intending to specialize at this exciting interface. The job market for our undergraduates has long valued exactly this combination of skills, as jobs in computer science and data science increasingly value knowledge of economic analysis, while job opportunities in economics, management consulting, and finance now often demand not just mathematical maturity but strong computational, algorithmic, and statistical expertise,” says Ozdaglar.

From the classroom to the real world

Computer science and data science provide tools for problem solving, and economics applies those tools to domains where there is rapidly growing intellectual, scholarly, and commercial interest, says David Autor, the Ford Professor of Economics, who helped launch Course 6-14.

He expects demand for graduates with skills in both disciplines will continue increasing, especially as more economic activity moves online. Companies large and small will need employees who can design platforms, think about incentives, and interpret large amounts of behavioral data.

Autor also hopes that Course 6-14 will raise awareness of economics at MIT and show that the field is a formal science with a broadly useful toolkit.

“Economics teaches people to think about social science problems analytically, in a very compelling and constructive way. Some of those problems are in ecommerce and data analysis, but some of those problems are in economic development, or social insurance programs, or climate science. The value of economics is that it provides a toolkit for applying the same kind of analytical thinking someone would to an engineering or computer science problem to these problems that greatly shape our world,” he says. 

Preparing computing ‘bilinguals’

Senior Ali Sinan Kaya leveraged the skills he’s developed in Course 6-14 to land internships and research opportunities that will give him a leg up in his future career. He recently completed a UROP (Undergraduate Research Opportunities Program) at the MIT Sloan School of Management that involved testing an optimization algorithm for an online retailer.

The company offers services like assembly and insurance to customers who purchase furniture online. Kaya and his collaborators found that the way those services are displayed on the website has a huge impact on purchasing behavior.

“[Course] 6-14 gave me a good foundation that I was able to use when interviewing for these positions, to secure these internships,” he says. “Economics, computer science, data science, and mathematics — at the intersection of these fields, you have a successful data scientist. I don’t consider myself a successful data scientist yet, but I think 6-14 has really given me a foundation to become a successful data scientist.”

Kaya plans to embark on a corporate career to better understand how the economy works. In the long term, he hopes to apply those lessons as a politician or policy expert in his native Turkey.

“I want to use all this knowledge and these experiences to hopefully bring about change within my community,” he says.

For Ozdaglar, it has been especially rewarding to see students like Kaya master both skill sets in an effort to do important work in the world.

“It has been amazing to help develop a program that educates students at this exciting intersection. We never viewed this as just putting curricula together across two departments. Rather, Course 6-14 combines the strengths of the two disciplines to offer unique classes and opportunities for our students. It provides such a strong foundation that students are able to address deep problems that require a mastery of both of these disciplines. It has been wonderful to see this new generation of computing ‘bilinguals’ who will be able to make great contributions,” she says.