Bandits with Switching Costs: T^{2/3} Regret


Event Speaker: 

Yuval Peres (Microsoft Research)

Event Location: 


Event Date/Time: 

Tuesday, November 5, 2013 - 4:00pm

Reception to follow.
Consider the adversarial two-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions. We prove that the player's  T-round minimax regret in this setting (i.e., his excess loss compared to the better of the two actions)  is T^{2/3} (up to a log term).   In the corresponding full-information problem, the minimax regret is known to grow at a slower rate of  T^{1/2} . The difference between these two rates  indicates  that learning with bandit feedback (i.e. just knowing the loss from the player's action, not the alternative)  can be significantly harder than learning with full-information feedback. It also shows that without switching costs, any regret-minimizing algorithm for the bandit problem must sometimes switch actions very frequently. The proof is based on an information-theoretic analysis of a loss process arising from a branching Gaussian random walk.  (Joint work with Ofer Dekel, Jian Ding and  Tomer Koren, available at )
Yuval Peres is a Principal Researcher in the Theory group at Microsoft Research, Redmond (MSR). Before joining MSR in 2006, he was a Professor in the Statistics and Mathematics Departments at UC Berkeley. He has also taught at Yale and at the Hebrew University. Yuval has published more than 200 papers with 100 co-authors and has mentored 20 PhD theses. His research encompasses many areas of probability theory, including random walks, Brownian motion, percolation, point processes and random graphs, as well as connections with Ergodic Theory, PDE, Combinatorics, Fractals and Theoretical Computer Science. He has recently co-authored books on Markov chains and mixing times, on zeros of Gaussian analytic functions, and on Brownian motion. Yuval is a fellow of the American Math Society and of the  Institute of Mathematical Statistics. He received the Rollo Davidson Prize in 1995, the Loeve prize In 2001 and was a co-recipient of the David Robbins prize in 2011. Yuval was an invited speaker at the International Congress of Mathematics (2002) and in the European Congress of Mathematics (2008).