Doctoral Thesis: Clustering and Visualizing Solution Variation in Programming Classes


Event Speaker: Elena Glassman

Event Location: 32-G449 (Kiva)

Event Date/Time: Monday, July 25, 2016 - 10:00am

In a massive open online course (MOOC), a single programming exercise may yield thousands of student solutions that vary in many ways, some superficial and some fundamental. For teachers, this variation can be a source of pedagogically valuable examples and can expose corner cases not yet covered by autograding. For students, the variation in a large class means that other students may have struggled along a similar solution path and hit the same bugs, and can therefore offer hints based on that earned expertise.
This thesis describes three systems that explore the value of solution variation in large-scale programming classes and simulated digital circuit classes. All three systems have been evaluated using data or live deployments in on-campus or edX courses with thousands of students. (1) OverCode visualizes thousands of programming solutions using static and dynamic analysis to cluster similar solutions. It lets teachers quickly develop a high-level view of student understanding and misconceptions and provide feedback that is relevant to many student solutions. (2) Foobaz clusters variables in student programs by their names and behavior so that teachers can give feedback on variable naming. Rather than requiring the teacher to comment on thousands of students individually, Foobaz generates personalized quizzes that help students evaluate their own names by comparing them with good and bad names from other students. (3) ClassOverflow collects and organizes solution hints indexed by the autograder test that failed or by a performance characteristic such as size or speed. It helps students reflect on their debugging or optimization process, generates hints that can help other students with the same problem, and could potentially bootstrap an intelligent tutor tailored to the problem.
These systems demonstrate how clustering and visualizing student solutions can help teachers and students provide, at scale, types of one-on-one design feedback that were previously possible only in a small classroom or one-on-one tutoring. Teachers can directly respond to trends and outliers within the students' own solutions. The feedback generated by both teachers and students can be re-used by future students who attempt the same programming or hardware design problem.
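As a rough illustration of the general idea of grouping solutions by behavior, the sketch below clusters a few toy solutions by the outputs they produce on shared test inputs. It is not the OverCode implementation, which the abstract says combines static and dynamic analysis; the function cluster_by_behavior, the three toy solutions, and the test inputs are all hypothetical names chosen for this example.

```python
from collections import defaultdict


def cluster_by_behavior(solutions, test_inputs):
    """Group solutions whose observable results agree on every test input.

    `solutions` maps a (hypothetical) solution name to a callable.
    Returns a list of clusters, each a list of solution names.
    """
    clusters = defaultdict(list)
    for name, func in solutions.items():
        signature = []
        for x in test_inputs:
            try:
                signature.append(("ok", repr(func(x))))
            except Exception as exc:
                # A crash is also a behavior; record the exception type.
                signature.append(("error", type(exc).__name__))
        clusters[tuple(signature)].append(name)
    return list(clusters.values())


# Three toy "student solutions" for: sum the integers from 0 to n inclusive.
def sum_loop(n):
    total = 0
    for i in range(n + 1):
        total += i
    return total


def sum_formula(n):
    return n * (n + 1) // 2


def sum_off_by_one(n):
    # Buggy: range(n) stops at n - 1, so n itself is never added.
    return sum(range(n))


solutions = {
    "sum_loop": sum_loop,
    "sum_formula": sum_formula,
    "sum_off_by_one": sum_off_by_one,
}
groups = cluster_by_behavior(solutions, test_inputs=[0, 1, 5])
print(groups)
```

Here the two correct solutions land in one cluster despite being written differently, while the off-by-one solution forms its own cluster, which is the kind of outlier a teacher might want to see.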
Elena L. Glassman is an EECS Ph.D. candidate at MIT’s Computer Science and Artificial Intelligence Lab, where she specializes in human-computer interaction. Elena uses program analysis, machine learning, and crowdsourcing techniques to create systems that help teach programming and hardware design to thousands of students at once. She has also taught and served in leadership positions for MIT MEET, which teaches gifted Israelis and Palestinians computer science and teamwork in Jerusalem. Elena earned her MIT EECS B.S. and M.Eng. degrees in ’08 and ’10, respectively, and expects to receive her Ph.D. in 2016. She was awarded both NSF and NDSEG fellowships and MIT’s Amar Bose Teaching Fellowship. In addition to doing research at MIT, she has been a visiting researcher at Stanford and a summer research intern at both Google and Microsoft Research.
Thesis Supervisor: Prof. Rob Miller