6.246 Reinforcement Learning: Foundations and Methods


Graduate Level
Units: 4-0-8
Instructors: Prof. Cathy Wu, cathywu@mit.edu; Prof. Leslie Kaelbling, lpk@csail.mit.edu
Schedule: Lecture: TR1-2:30, Recitation: TBD, virtual instruction
Enrollment Limited to 60
This subject counts as a Control concentration subject. Reinforcement learning (RL) as a methodology for approximately solving sequential decision-making problems under uncertainty, with foundations in optimal control and machine learning. Finite horizon and infinite horizon dynamic programming, focusing on discounted Markov decision processes. Value and policy iteration. Monte Carlo, temporal differences, Q-learning, and stochastic approximation. Approximate dynamic programming, including value-based methods and policy space methods. Special topics at the boundary of theory and practice in RL. Applications and examples drawn from diverse domains. While an analysis prerequisite is not required, mathematical maturity is necessary. Enrollment limited.
Expectations and prerequisites: There is a large class participation component. Students should be comfortable at the level of earning an A grade in probability (6.041 or equivalent), machine learning (6.867 or equivalent), convex optimization (from 6.255 / 6.036 / 6.867 or equivalent), linear algebra (18.06 or equivalent), and programming (Python). Mathematical maturity is required. This is not a Deep RL course. This class is most suitable for PhD students who have already been exposed to the basics of reinforcement learning and deep learning (as in 6.036 / 6.867 / 1.041 / 1.200), and who are conducting or have conducted research in these topics.
Course format and scope:
This course will be half theoretical foundations of RL and half exploration of the boundary between theory and practice.

  • For the first half, we will have lectures on what has been established in RL, largely following the texts "Dynamic Programming and Optimal Control" (by Dimitri Bertsekas) and "Neuro-Dynamic Programming" (by Dimitri Bertsekas and John Tsitsiklis). Compared with 6.231, this course will place more emphasis on approximate dynamic programming and less on classical dynamic programming. The level of rigor expected in HW will be comparable, but the pace in lecture will be faster and the topics will be different. HW and exam will be similar in style to those of 6.231 (see: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-231-dynamic-programming-and-stochastic-control-fall-2015/assignments/).


  • For the second half, students should be prepared to synthesize theoretical and/or empirical papers and materials into informative lectures (and recitations, as needed) that explore the boundary of theory and practice in reinforcement learning and other special topics. Specific topics may include exploration, off-policy / transfer learning, combinatorial optimization, abstraction / hierarchy, control theory, and game theory / multi-agent RL. The level of effort expected is comparable to (or greater than) that of a traditional final research project for a research-oriented class. These lectures will stand in place of a traditional class project for students selected for this role. Through this exploration, we seek to characterize together the gap between theory and practice in RL.

This experimental course is intended as an advanced graduate course that explores alternative approaches to, and perspectives on, studying reinforcement learning.
More information on how this subject will be taught can be found at: https://eecs.scripts.mit.edu/eduportal/__How_Courses_Will_Be_Taught_Online_or_Oncampus__/S/2021/#6.246