Stacey Resnikoff | MIT EECS
Reimagined course gives students a deep dive into the rapidly evolving field of natural language processing.
A newly revamped course, 6.806/6.864 (Advanced Natural Language Processing), is giving EECS students a deep dive into the rapidly evolving field of natural language processing (NLP), which focuses on enabling computers to learn and understand human languages. The course, co-taught by Regina Barzilay and Tommi Jaakkola, both professors of electrical engineering and computer science, was updated last semester to include deep learning methods and a greater emphasis on student research.
“There is a paradigm shift in the field,” says Barzilay, who has spearheaded 6.806/6.864 since 2004. “For the first time, we are incorporating deep learning methods in this course. Deep neural networks can actually learn representations and make semantic and linguistic connections in a very rich way.”
“These are flexible machine learning tools with nearly limitless architectural variations,” says Jaakkola, who lectures on deep learning. “NLP offers an exciting mixture of algorithmic questions, complex combinatorial structures, representational questions, and high-dimensional statistical estimation.”
The course received a makeover in the Fall 2015 semester. Its new format puts increasing emphasis on student-driven research, with two-thirds of class time and 50 percent of the grade focused on student projects, supported by close faculty interaction. Students have flocked to the revamped course; last semester enrollment rose above 100 students, requiring Kirsch Auditorium instead of the 32-144 classroom.
Memory and Understanding
Surya Bhupatiraju, a junior in 6-3 and 18, recently decided artificial intelligence was his “calling” when he took 6.806/6.864. “It was a rather large milestone in my undergraduate career. Until this course, the extent of my NLP knowledge was understanding bag-of-words models and recurrent neural networks.”
Bhupatiraju and fellow 6-3 junior Simanta Gautam developed their project “Non-Markovian Control Policies for Text-based Games using External Memory and Deep Reinforcement Learning,” building on the 2015 work of TA Karthik Narasimhan. Like Narasimhan, they were interested in applying a deep reinforcement learning framework to text-based computer games. Their goal: to teach a machine to play games not only in the present tense, but informed by past experience.
“We wanted to hook up memory networks to the model (and have it) extract certain memories that would seem most relevant or helpful (to) act,” says Bhupatiraju.
For instance, the “player” might be told to go to a virtual room and repeat a previous task. Using a deep neural network, the machine could look back. “Though game performance in explicitly non-Markovian (memory-based) policies was only barely better than the baseline model, it showed that there’s a lot of potential to implement the full memory network module and use it to pick good memories,” says Bhupatiraju.
Bhupatiraju says he and Gautam want to improve their model and “hopefully publish,” as well as work on original NLP problems.
One ambitious project came at the suggestion of MGH physician Charlotta Lindvall, MD, PhD, who hoped NLP could predict the viability of invasive Cardiac Resynchronization Therapy (CRT) for heart failure patients.
Three undergraduates interested in medical data (6-3 seniors Austin Freel and Josh Haimson and 6-3 junior Michael Traub) worked with Lindvall on their project “Predicting the Effectiveness of Cardiac Resynchronization Therapy Using Natural Language Processing” to harness the insights buried in vast quantities of data.
“I was very impressed by how quickly they grasped the extent of (the clinical) complexity and was moved by their motivation,” says Lindvall.
“Most information about a patient is stored in narrative text clinical notes, which are not conducive to regression or proportional hazards models,” explains Haimson. “Our n-gram bag-of-words models generated feature vectors (that) we could feed into machine learning classification models like Random Forests or AdaBoost. One of the more sophisticated approaches we used, ‘Paragraph Vectors,’ is trained on a large unlabeled corpus of clinical text. Arbitrary-length sequences of words (are mapped) to a fixed-dimensional space, representing the semantic meaning of that text.”
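For readers unfamiliar with the idea, the n-gram bag-of-words step Haimson describes can be illustrated with a minimal Python sketch. It is not the team's code: the function names, the two sample notes, and the choice of unigrams plus bigrams are all assumptions made for illustration. It shows only how free-text notes of any length become count vectors of equal length, which a classifier could then consume.

```python
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return every contiguous word n-gram of length 1..n_max."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def build_vocabulary(notes, n_max=2):
    """Assign a column index to each n-gram seen in the corpus."""
    vocab = {}
    for note in notes:
        for gram in ngrams(note.lower().split(), n_max):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(note, vocab, n_max=2):
    """Turn one note into a fixed-length vector of n-gram counts."""
    counts = Counter(ngrams(note.lower().split(), n_max))
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:  # n-grams unseen in the corpus are ignored
            vec[vocab[gram]] = count
    return vec


# Invented example notes, not real clinical text.
notes = ["patient reports back pain", "no back pain reported"]
vocab = build_vocabulary(notes)
vectors = [vectorize(note, vocab) for note in notes]
```

Each row of `vectors` has one entry per n-gram in the vocabulary, so notes of different lengths end up as vectors of the same size. Word order beyond the n-gram window is discarded, which is the limitation that approaches such as Paragraph Vectors aim to address.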
Their approach found latent clinical variables for the potential success of CRT, including symptoms (such as back pain, sleep apnea), primary diagnosis (ischemic/non-ischemic and ejection fraction readings prior to procedure), medication (beta blockers or nitroglycerin), and various social/family history features (marriage status or the father’s morbidity status). The work is being prepared for publication in a clinical journal.
“I think we all hoped our work would have clinical significance, but once we started getting positive results we got very excited,” says Haimson. He hopes the team’s efforts will inform future CRT clinical support tools.
Thanks to the course, Nicholas Locascio, a 6-3 senior, is now “extremely comfortable reading and evaluating the current NLP literature,” as well as building on it.
He and his partner, senior Eduardo DeLeon, read the 2013 paper “Using Semantic Unification to Generate Regular Expressions from Natural Language” by Research Assistant Nate Kushman with Barzilay. Locascio and DeLeon applied deep learning to machine-generate regular expressions without complex feature engineering.
“Our model is a deep recurrent neural network that generates regular expressions character-by-character. It’s quite ‘data-hungry,’” says Locascio. “We were able to get in touch with Nate directly to ask about his code. He and Prof. Barzilay were cheering us on to try and surpass their original system’s accuracy and performance.”
While their model doesn’t beat the Kushman/Barzilay state-of-the-art, Locascio explains that “deep learning systems typically use hundreds of thousands (more) training examples” than they had, so the potential is there.
“I knew I wanted to do my MEng in Machine Learning and Artificial Intelligence,” Locascio says, “but I hadn’t decided on working specifically in NLP until I took this class.”
“Many students did projects using deep neural learning networks, and the topics were not trivial,” says Barzilay, who, along with Jaakkola, applauds the students’ enthusiasm and ambition.
In fact, many students still come to group meetings and will continue their NLP research through UROP and UAP. Others have already submitted their papers for publication.
“For undergraduate students who took a three-month class to submit to our main conferences: this is remarkable,” says Barzilay.
Students even volunteered for collaborative note-taking, earning a grade bonus as “scribes” who captured the immediacy of the outside-textbook content. “The lectures and notes were incredibly well-organized considering the newness of concepts and techniques,” says Haimson.
“Take the class,” Barzilay tells all interested students, especially those who, like her, enjoy the thrill of the chase. “You don’t know if you’ll uncover the hidden puzzles in the text or not. But a model can unlock it. This is the exciting thing.”