Doctoral Thesis: Near-optimal Learning in Sequential Games
Abstract: In the presence of multiple learning agents, sequential decision-making problems become sequential games. Instead of finding an optimal policy, as in the single-agent setting, the learning objective is now to find a Nash equilibrium. As in other machine learning paradigms, reinforcement learning agents are hungry for data, so sample efficiency is always a key concern. In this thesis, we study two of the most fundamental questions in learning in sequential games:
1.(Lower bound) How many samples are necessary to find a Nash equilibrium in a sequential game, no matter what learning algorithm is used?
2.(Upper bound) How can we design (computationally) efficient learning algorithms with sharp sample complexity guarantees?
When the upper and lower bounds match, (minimax) optimal learning is achieved. In this thesis, we present near-optimal learning results for two kinds of sequential games:
1.Markov games (stochastic games), where all agents can observe the underlying state.
2.Imperfect-information extensive-form games, where different agents can have different observations of the same state.
To achieve near-optimal learning, we introduce a series of novel algorithmic ideas and analysis tools, such as adaptive uncertainty quantification for estimation (bonus design), certified policies, balanced exploration, and a log-partition function (dual) reformulation, which may be of independent interest.
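As a toy illustration of the learning objective above (not the thesis's own algorithms, which target sequential games), the sketch below runs Hedge (multiplicative weights) in self-play on rock-paper-scissors, a two-player zero-sum matrix game: each player's time-averaged strategy forms an approximate Nash equilibrium, with the equilibrium gap controlled by the players' regret. All function and variable names here are illustrative choices.

```python
import math

# Row player's payoffs for rock-paper-scissors, a two-player zero-sum game.
# The unique Nash equilibrium is the uniform strategy (1/3, 1/3, 1/3).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def hedge_self_play(A, T=10000):
    """Both players run Hedge against each other; in zero-sum games the
    time-averaged strategies converge to a Nash equilibrium at a rate
    governed by the players' regret (roughly O(sqrt(log n / T)))."""
    n, m = len(A), len(A[0])
    eta = math.sqrt(2 * math.log(max(n, m)) / T)  # standard Hedge step size
    x_logits = [1.0] + [0.0] * (n - 1)            # start away from equilibrium
    y_logits = [0.0] * m
    x_avg = [0.0] * n
    y_avg = [0.0] * m
    for _ in range(T):
        x = softmax(x_logits)
        y = softmax(y_logits)
        for i in range(n):
            x_avg[i] += x[i] / T
        for j in range(m):
            y_avg[j] += y[j] / T
        # Exponential-weights updates: the row player ascends x^T A y,
        # the column player descends it.
        for i in range(n):
            x_logits[i] += eta * sum(A[i][j] * y[j] for j in range(m))
        for j in range(m):
            y_logits[j] -= eta * sum(A[i][j] * x[i] for i in range(n))
    return x_avg, y_avg

x_bar, y_bar = hedge_self_play(A)
# x_bar and y_bar approach the uniform equilibrium as T grows.
```

The last iterates of such dynamics can cycle; it is the averaged strategies that carry the equilibrium guarantee, which is one reason averaging-style arguments (e.g., certified policies) appear in sample-efficient learning for games.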
- Date: Tuesday, February 28
- Time: 2:00 pm - 3:30 pm
- Category: Thesis Defense
- Location: 32-D463 (Star)
Thesis Committee: Prof. Suvrit Sra (Advisor), Prof. Constantinos Daskalakis, and Prof. Chi Jin (Princeton)