Staff Research Scientist at @GoogleAI Brain. Previously @DeepMind.
Dec 7, 2019 • 6 tweets • 5 min read
Why do deep ensembles trained with just random initialization work surprisingly well in practice?
In our recent paper arxiv.org/abs/1912.02757 with @stanislavfort & Huiyi Hu, we investigate this using insights from recent work on the loss landscape of neural nets.
More below:
2) One hypothesis is that ensembles trained from different random initializations reach different modes of the loss landscape, while scalable Bayesian methods may sample from only a single mode.
We measure the similarity of solutions, both in weight space and in function space, to test this hypothesis.
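A minimal sketch of what such a comparison could look like (not the paper's actual code; the metrics here are illustrative assumptions): weight-space similarity as cosine similarity between flattened weight vectors, and function-space similarity as the fraction of test points where two models' predicted labels agree.

```python
import numpy as np

def weight_space_similarity(w1, w2):
    # Cosine similarity between two flattened weight vectors.
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

def function_space_disagreement(preds1, preds2):
    # Fraction of test points where the two models' predicted labels differ.
    return float(np.mean(preds1 != preds2))

# Toy illustration with made-up weights and predictions (10-class problem).
rng = np.random.default_rng(0)
w_a = rng.normal(size=100)
w_b = rng.normal(size=100)          # independent init -> similarity near 0
p_a = rng.integers(0, 10, size=1000)
p_b = np.where(rng.random(1000) < 0.2,
               rng.integers(0, 10, size=1000),  # resample ~20% of labels
               p_a)

print(weight_space_similarity(w_a, w_b))
print(function_space_disagreement(p_a, p_b))
```

Two independently trained networks typically end up nearly orthogonal in weight space while also disagreeing on a noticeable fraction of predictions, which is the kind of diversity that makes ensembling them pay off.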