Pavel Izmailov
Researcher @AnthropicAI 🤖 Incoming Assistant Professor @nyuniversity 🏙️ Previously @OpenAI #StopWar 🇺🇦

Apr 30, 2021, 8 tweets

We run HMC on hundreds of TPU devices for millions of training epochs to provide our best approximation of the true Bayesian neural network posterior! (1) BNNs do better than deep ensembles, (2) there is no cold posterior effect, but (3) BNNs are terrible under data corruption, and much more! 🧵
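To make the inference target concrete, here is a minimal single-chain HMC step in JAX over a toy log posterior. This is only a sketch: the tiny linear model, Gaussian prior, step size, and leapfrog count are illustrative assumptions, not the paper's actual setup or code.

```python
# Minimal sketch of one HMC transition in JAX (illustrative; not the paper's implementation).
import jax
import jax.numpy as jnp

def log_posterior(theta, x, y, prior_var=1.0):
    # Gaussian prior (up to a constant) + Gaussian likelihood of a toy linear model.
    log_prior = -0.5 * jnp.sum(theta ** 2) / prior_var
    log_lik = -0.5 * jnp.sum((y - x @ theta) ** 2)
    return log_prior + log_lik

def hmc_step(key, theta, x, y, step_size=1e-2, n_leapfrog=20):
    key_mom, key_acc = jax.random.split(key)
    grad_fn = jax.grad(lambda t: log_posterior(t, x, y))

    momentum = jax.random.normal(key_mom, theta.shape)
    theta_new, m = theta, momentum

    # Leapfrog integration of the Hamiltonian dynamics.
    m = m + 0.5 * step_size * grad_fn(theta_new)
    for _ in range(n_leapfrog - 1):
        theta_new = theta_new + step_size * m
        m = m + step_size * grad_fn(theta_new)
    theta_new = theta_new + step_size * m
    m = m + 0.5 * step_size * grad_fn(theta_new)

    # Metropolis-Hastings correction keeps the chain exact for the target posterior.
    def hamiltonian(t, mom):
        return -log_posterior(t, x, y) + 0.5 * jnp.sum(mom ** 2)
    log_accept = hamiltonian(theta, momentum) - hamiltonian(theta_new, m)
    accept = jnp.log(jax.random.uniform(key_acc)) < log_accept
    return jnp.where(accept, theta_new, theta)

# Toy usage: 3-parameter linear model, a handful of data points.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 3))
y = x @ jnp.array([1.0, -2.0, 0.5])
theta = jnp.zeros(3)
for _ in range(100):
    key, subkey = jax.random.split(key)
    theta = hmc_step(subkey, theta, x, y)
```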

First, we find that BNNs at temperature 1 with regular Gaussian priors are actually quite good, outperforming deep ensembles on both accuracy and likelihood!

In fact, tempering even hurts performance in some cases, with the best performance achieved at temperature 1. What is the main difference from arxiv.org/abs/2002.02405? (1) We turn data augmentation off and (2) we use a very high-fidelity inference procedure.
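For reference, the tempered ("cold") posterior discussed in this line of work scales the posterior energy by a temperature T; a sketch of the definition (notation mine):

```latex
% Tempered posterior at temperature T (notation is illustrative):
% T = 1 recovers the true Bayes posterior; T < 1 is a "cold" posterior,
% which sharpens the distribution around high-density parameter settings.
p_T(\theta \mid \mathcal{D}) \;\propto\; \exp\!\left( -\frac{U(\theta)}{T} \right),
\qquad U(\theta) = -\log p(\mathcal{D} \mid \theta) - \log p(\theta).
```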

What about the priors? We compare several prior families and study the dependence on prior variance with Gaussian priors. Generally, the effect on performance is fairly minor.
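As a rough picture of the knob being varied here, a Gaussian prior over the weights just adds a scaled L2 term to the log posterior; the variance values below are arbitrary placeholders, not the grid used in the paper.

```python
# Sketch: how the prior variance enters the HMC target (illustrative values only).
import jax.numpy as jnp

def gaussian_log_prior(theta, prior_var):
    # Larger prior_var -> weaker pull toward zero; smaller -> stronger regularization.
    return (-0.5 * jnp.sum(theta ** 2) / prior_var
            - 0.5 * theta.size * jnp.log(2 * jnp.pi * prior_var))

# Sweep over a few prior variances (placeholder values, not the paper's settings).
for prior_var in [0.01, 0.1, 1.0, 10.0]:
    print(prior_var, gaussian_log_prior(jnp.zeros(5), prior_var))
```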

We also compare the predictions of popular approximate inference methods to HMC. Advanced SGMCMC methods provide the most accurate approximation; deep ensembles are quite good even though they are often considered non-Bayesian; mean-field VI is the least accurate.
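One simple way to quantify how close an approximate method's predictions are to HMC is to compare predictive distributions on the test set; the metrics below are my own illustration of such a comparison, not necessarily the exact ones used in the paper.

```python
# Sketch of predictive-comparison metrics between two inference methods (illustrative).
import jax.numpy as jnp

def top1_agreement(probs_a, probs_b):
    # probs_a, probs_b: [num_test, num_classes] predictive distributions,
    # e.g. HMC vs. a cheaper approximation; fraction of matching top-1 predictions.
    return jnp.mean(jnp.argmax(probs_a, axis=-1) == jnp.argmax(probs_b, axis=-1))

def total_variation(probs_a, probs_b):
    # Average total variation distance between the two predictive distributions.
    return jnp.mean(0.5 * jnp.sum(jnp.abs(probs_a - probs_b), axis=-1))
```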

There is also a negative result: Bayesian neural nets seem to generalize very poorly to corrupted data! An ensemble of 720 HMC samples is worse than a single SGD solution when the inputs are noisy or corrupted.

Another cool result: a single long HMC chain appears to be quite good at exploring the posterior, at least in the function space. The results hint that MCMC methods are able to leverage mode connectivity to move between functionally diverse solutions.

We are going to release our JAX code and the HMC samples very soon. Stay tuned!
