On the Theory of Gradient-Based Optimization and Sampling: A View from Continuous Time
Michael I. Jordan
University of California, Berkeley
Abstract: Gradient-based optimization has provided the theoretical and practical foundations on which recent developments in statistical machine learning have reposed. A complementary set of foundations is provided by Monte Carlo sampling, where gradient-based methods have also been leading the way in recent years. We explore links between gradient-based optimization algorithms and gradient-based sampling algorithms. Although these algorithms are generally studied in discrete time, we find that fundamental insights can be obtained more readily if we work in continuous time. Results that I will cover include: (1) there is a counterpart of Nesterov acceleration in the world of Langevin diffusion; (2) Langevin algorithms can converge quickly enough to give logarithmic regret in Bayesian multi-armed bandits; and (3) symplectic integration preserves continuous-time rates of convergence when passing to discrete time.
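To make the optimization–sampling link concrete, the following is a minimal sketch (not from the talk itself) of the unadjusted Langevin algorithm: a Euler–Maruyama discretization of the Langevin diffusion dX_t = ∇log p(X_t) dt + √2 dW_t, which turns gradient descent into a sampler by injecting Gaussian noise at each step. The function names, step size, and Gaussian target are illustrative assumptions.

```python
import numpy as np

def ula_sample(grad_log_p, x0, step=0.05, n_steps=20000, seed=0):
    """Unadjusted Langevin algorithm (ULA).

    Euler-Maruyama discretization of the Langevin diffusion
        dX_t = grad log p(X_t) dt + sqrt(2) dW_t,
    whose stationary distribution is p. Each iterate takes a gradient
    step on log p plus Gaussian noise scaled by sqrt(2 * step).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x + step * grad_log_p(x) + np.sqrt(2.0 * step) * noise
        samples[t] = x
    return samples

# Illustrative target: standard Gaussian, so grad log p(x) = -x.
# After a burn-in period, the iterates are approximate draws from N(0, 1)
# (up to a small discretization bias of order `step`).
samples = ula_sample(lambda x: -x, x0=[3.0])
post_burn_in = samples[5000:]
```

With noise removed, the same update is plain gradient ascent on log p; the added √(2·step) noise is exactly what converts the optimizer into a sampler, which is the bridge the abstract's continuous-time view makes precise.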