6.246 Reinforcement Learning: Foundations and Methods


Graduate Level
Units: 4-0-8
Instructor:  Prof. Cathy Wu, cathywu@mit.edu
Schedule: Lecture: MW9.30-11 (32-124); Recitation: R4-5 (32-124) or F4-5 (4-231)
Enrollment Limited to 57

This subject counts as a Control concentration subject. Reinforcement learning (RL) as a methodology for approximately solving sequential decision-making problems under uncertainty, with foundations in optimal control and machine learning. Finite horizon and infinite horizon dynamic programming, focusing on discounted Markov decision processes. Value and policy iteration. Monte Carlo, temporal differences, Q-learning, and stochastic approximation. Approximate dynamic programming, including value-based methods and policy space methods. Special topics at the boundary of theory and practice in RL. Applications and examples drawn from diverse domains. While an analysis prerequisite is not required, mathematical maturity is necessary. Enrollment limited.
Expectations and prerequisites: Class participation is a large component of this course. Students should be comfortable at the level of an A grade in probability (6.041 or equivalent), machine learning (6.867 or equivalent), convex optimization (from 6.255 / 6.036 / 6.867 or equivalent), linear algebra (18.06 or equivalent), and programming (Python). Mathematical maturity is required. This is NOT a deep RL course. This class is most suitable for PhD students who have already been exposed to the basics of reinforcement learning and deep learning (as in 6.036 / 6.867 / 1.041), and who are conducting or have conducted research in these topics.
Course format and scope:
Half of this course covers the theoretical foundations of RL, and the other half explores the boundary between theory and practice.

  • For half of the course, we will have traditional lectures on what has been established in RL, largely following the texts "Dynamic Programming and Optimal Control" (by Dimitri Bertsekas) and "Neuro-Dynamic Programming" (by Dimitri Bertsekas and John Tsitsiklis). Compared with 6.231, this course places more emphasis on approximate dynamic programming and significantly less on dynamic programming. The level of rigor expected in HW will be comparable, but the pace in lecture will be faster and the topics will be different. HW and exam will be similar in style to 6.231 (see: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-231-dynamic-programming-and-stochastic-control-fall-2015/assignments/).


  • For the other half of the course, students should be prepared to read theoretical and/or empirical research papers and materials and synthesize them into informative lectures (and recitations, as needed) that explore the boundary of theory and practice in reinforcement learning and other special topics. Specific topics will be selected based on a combination of staff and student interest, and may include exploration, off-policy / transfer learning, factored MDPs / end-to-end representation learning, combinatorial optimization, abstraction / hierarchy, and/or game theory / multi-agent RL. The level of effort expected is comparable to (or greater than) that of a traditional final research project for a research-oriented class. These lectures will stand in place of a traditional class project for students selected for this role. Through this exploration, we seek to characterize together the gap between theory and practice for RL.

This new course is meant to be an advanced and experimental graduate course that explores alternative approaches to, and perspectives on, studying reinforcement learning.