While the past decade has witnessed tremendous progress in applications of reinforcement learning (RL), theoretical understanding has lagged behind. For instance, many RL algorithms used in practice behave far better than extant theory would suggest; at the same time, we rarely know whether the procedures used in practice are the best possible, or if instead there exist significantly better ones.
This talk gives an overview of a few lines of work sharing the common motivation of closing such gaps. We describe a notion of instance-optimality for stochastic algorithms, along with modified forms of Q-learning that achieve these fundamental limits on an instance-wise basis. We also address function approximation in RL, and derive oracle inequalities that guide the practitioner in making optimal trade-offs between approximation and estimation errors. We conclude by discussing some ongoing work and interesting challenges ahead.
Based on collaborations with Emma Brunskill, Yaqi Duan, Michael Jordan, Koulik Khamaru, Wenlong Mou, Ashwin Pananjady, Feng Ruan, Eric Xia, Mengdi Wang, and Andrea Zanette.
Martin Wainwright is currently Chancellor's Professor at the University of California, Berkeley, with a joint appointment between the Department of Statistics and the Department of EECS. He received a Bachelor's degree in Mathematics from the University of Waterloo, Canada, and a Ph.D. in EECS from the Massachusetts Institute of Technology (MIT). His research interests include high-dimensional statistics, statistical machine learning, information theory, and optimization theory. Among other awards, he has received the Presidents' Award (2014) from the Committee of Presidents of Statistical Societies (COPSS); the David Blackwell Lectureship (2017) and Medallion Lectureship (2013) from the Institute of Mathematical Statistics; and Best Paper awards from the IEEE Signal Processing Society and the IEEE Information Theory Society. He was a Section Lecturer at the International Congress of Mathematicians in 2014.