Part of the Special ECE Seminar Series
Modern Artificial Intelligence
Towards a Theory of Generalization in Reinforcement Learning
Sham Kakade, Department of Computer Science & Department of Statistics at University of Washington
A fundamental question in the theory of reinforcement learning is what properties govern our ability to generalize and avoid the curse of dimensionality. With regards to supervised learning, these questions are well understood theoretically, and, practically speaking, we have overwhelming evidence on the value of representational learning (say through modern deep networks) as a means for sample efficient learning. Providing an analogous theory for reinforcement learning is far more challenging, where even characterizing the representational conditions which support sample efficient generalization is far less well understood.
This work will survey a number of recent advances towards characterizing when generalization is possible in reinforcement learning. We will start by reviewing this question in a simpler context, namely contextual bandits. Then we will move to lower bounds and consider one of the most fundamental questions in the theory of reinforcement learning, namely that of linear function approximation: suppose the optimal Q-function lies in the linear span of a given d dimensional feature mapping, is sample-efficient reinforcement learning (RL) possible? Finally, we will cover a new set of structural and representational conditions which permit generalization in reinforcement learning in a wide variety of settings through the use of function approximation.
Sham Kakade is a professor in the Department of Computer Science and the Department of Statistics at the University of Washington and is also a senior principal researcher at Microsoft Research. He works on the mathematical foundations of machine learning and AI. Sham's thesis helped lay the statistical foundations of reinforcement learning. With his collaborators, his additional contributions include: one of the first provably efficient policy search methods in reinforcement learning; developing the mathematical foundations for the widely used linear bandit models and the Gaussian process bandit models; the tensor and spectral methodologies for provable estimation of latent variable models; the first sharp analysis of the perturbed gradient descent algorithm, along with the design and analysis of numerous other convex and non-convex algorithms. He is the recipient of the ICML Test of Time Award, the IBM Pat Goldberg best paper award, and INFORMS Revenue Management and Pricing Prize. He has been program chair for COLT 2011.
Sham was an undergraduate at Caltech, where he studied physics and worked under the guidance of John Preskill in quantum computing. He completed his Ph.D. with Peter Dayan in computational neuroscience at the Gatsby Computational Neuroscience Unit. He was a postdoc with Michael Kearns at the University of Pennsylvania.