Latest Twitter Threads by @dcpage3 on Thread Reader App

Sep 11, 2019 • 25 tweets • 8 min read

The paper that introduced Batch Norm (arxiv.org/abs/1502.03167) combines clear intuition with compelling experiments (a 14x speedup on ImageNet!!)

So why has 'internal covariate shift' remained controversial to this day?

Thread 👇

A recent question on Twitter from @yoavgo shows that there's still confusion about how Batch Norm works in practice:

https://twitter.com/yoavgo/status/1169495585084321792
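For reference, here is what the Batch Norm layer computes in training mode. Each feature is standardised using the mean and variance of the current mini-batch, then a learned scale (`gamma`) and shift (`beta`) are applied. This is a minimal NumPy sketch for a (batch, features) array. The function name is hypothetical, and the running-average statistics used at inference time are omitted.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Hypothetical helper: training-mode batch norm for a (batch, features) array.
    Each feature (column) is standardised with statistics taken over the batch axis."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, (almost) unit variance
    return gamma * x_hat + beta              # learned per-feature scale and shift

rng = np.random.default_rng(0)
x = 3.0 + 5.0 * rng.standard_normal((128, 4))  # features with mean 3 and std 5
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.var(axis=0))
```

With `gamma = 1` and `beta = 0`, every column of `y` has mean zero and variance very close to one, whatever the scale and offset of the input. At inference, implementations typically substitute running averages of the batch statistics collected during training.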

Jun 20, 2019 • 4 tweets • 3 min read

New blog post: How does batch norm _really_ help optimisation?

We go on a tour of bad inits, degenerate networks and spiky Hessians - all in a Colab notebook:
colab.research.google.com/github/davidcp…

Summary 👇

1/ Early signs of trouble.

We learn that deep ReLU nets with He-init, but no batch norm, basically ignore their inputs! (Check out arxiv.org/abs/1902.04942 by Luther, @SebastianSeung for background.)

Easy to miss if you pool across channels.
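A small simulation gives a feel for both points. This sketch rests on assumptions that are not from the thread: a fully connected ReLU network stands in for the convolutional one (units play the role of channels), weights are He-initialised Gaussians with no biases and no batch norm, and the inputs are random Gaussian vectors. After each layer it reports what fraction of the mean squared activation actually varies with the input, measured two ways: per unit over the batch, and pooled over batch and units together. The helper name and all sizes are hypothetical.

```python
import numpy as np

def channel_collapse(depth=50, width=512, batch=256, in_dim=256, seed=0):
    """Hypothetical helper: push a batch of random inputs through a deep ReLU
    MLP with He init (no biases, no batch norm) and, after each layer, return
    the fraction of the mean squared activation that varies with the input."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, in_dim))  # stand-in inputs, one row per example
    per_unit, pooled = [], []
    fan_in = in_dim
    for _ in range(depth):
        # He init: Gaussian weights with variance 2 / fan_in
        w = rng.standard_normal((fan_in, width)) * np.sqrt(2.0 / fan_in)
        h = np.maximum(h @ w, 0.0)  # ReLU
        fan_in = width
        energy = np.mean(h ** 2)  # mean square over batch and units
        # Variance over the batch, computed separately for each unit, then averaged
        per_unit.append(h.var(axis=0).mean() / energy)
        # Variance over batch and units together ("pooling across channels")
        pooled.append(h.var() / energy)
    return np.array(per_unit), np.array(pooled)

per_unit, pooled = channel_collapse()
for layer in (1, 10, 25, 50):
    print(f"layer {layer:2d}: per-unit {per_unit[layer - 1]:.3f}, pooled {pooled[layer - 1]:.3f}")
```

In this setup the per-unit fraction should shrink steadily with depth, meaning each unit's output becomes nearly the same for every input, while the pooled fraction stays large. The reason is that pooled variance equals the average within-unit variance plus the variance of the unit means: each unit settles on a roughly constant value, but those values differ from unit to unit, so pooling across channels hides the collapse.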
