In this session, we will have an invited talk from Jascha Sohl-Dickstein (Google Brain), as well as two contributed talks from Matthew Hoffman (Roundoff Error in Metropolis-Hastings Accept-Reject Steps) and Alex Alemi (VIB is Half Bayes). See http://approximateinference.org/schedule/ for details.
Jascha Sohl-Dickstein: Infinite Width Bayesian Neural Networks
Abstract: As neural networks become wider their accuracy improves, and their behavior becomes easier to analyze theoretically. I will give an introduction to a rapidly growing body of work which examines the distribution over functions induced by infinitely wide, randomly initialized neural networks. Core results that I will discuss include: that the distribution over functions computed by a wide neural network often corresponds to a Gaussian process with a particular compositional kernel; that the predictions of wide neural networks are linear in their parameters throughout training; that this perspective enables analytic predictions for how trainability of finite width networks depends on hyperparameters and architecture; and finally that results on infinite width networks can enable efficient posterior sampling from finite width Bayesian networks. These results enable surprising capabilities -- for instance, the evaluation of test set predictions which would come from an infinitely wide Bayesian or gradient-descent-trained neural network without ever instantiating a neural network, or the rapid training of 10,000+ layer convolutional networks. I will argue that this growing understanding of neural networks in the limit of infinite width is foundational for future theoretical and practical understanding of deep learning.
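As an illustrative sketch of the first result mentioned in the abstract (wide random networks inducing a Gaussian process), the snippet below is not from the talk itself: it empirically checks, for an assumed one-hidden-layer ReLU network with 1/fan-in weight variance, that the covariance of the output over random parameter draws matches the analytic arc-cosine kernel of Cho & Saul, which is the corresponding NNGP kernel under these assumptions.

```python
import numpy as np

# Hypothetical setup, not from the talk: a one-hidden-layer ReLU network
# f(x) = v . relu(W x), with W_ij ~ N(0, 1/d) and v_j ~ N(0, 1/width).
# Over random parameter draws, Cov[f(x1), f(x2)] should approach the
# analytic arc-cosine kernel (Cho & Saul) that defines the NNGP.
rng = np.random.default_rng(0)
d, width, n_nets = 3, 512, 10_000

x1 = rng.standard_normal(d)
x2 = rng.standard_normal(d)

outs = np.empty((n_nets, 2))
for i in range(n_nets):
    W = rng.standard_normal((width, d)) / np.sqrt(d)      # variance 1/fan_in
    v = rng.standard_normal(width) / np.sqrt(width)       # variance 1/fan_in
    outs[i, 0] = v @ np.maximum(W @ x1, 0.0)
    outs[i, 1] = v @ np.maximum(W @ x2, 0.0)

# Empirical E[f(x1) f(x2)] over random networks (outputs have zero mean).
emp_cov = outs[:, 0] @ outs[:, 1] / n_nets

# Analytic NNGP covariance for ReLU: the order-1 arc-cosine kernel,
# with the 1/d factor coming from the input-layer weight variance.
cos_t = x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2))
theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
analytic = (np.linalg.norm(x1) * np.linalg.norm(x2) / (2 * np.pi * d)
            * (np.sin(theta) + (np.pi - theta) * np.cos(theta)))

print(emp_cov, analytic)  # the two values agree up to sampling noise
```

The empirical covariance is unbiased at any width; increasing the width and the number of sampled networks shrinks the fluctuations around the analytic kernel, which is the Gaussian process correspondence the abstract describes.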