David Page
Machine learning researcher @nanopore
Sep 11, 2019 25 tweets 8 min read
The paper that introduced Batch Norm arxiv.org/abs/1502.03167 combines clear intuition with compelling experiments (14x speedup on ImageNet!!)

So why has 'internal covariate shift' remained controversial to this day?

Thread 👇

A recent question on Twitter from @yoavgo shows that there's still confusion about how Batch Norm works in practice:

Jun 20, 2019 4 tweets 3 min read
New blog post: How does batch norm _really_ help optimisation?

We go on a tour of bad inits, degenerate networks and spiky Hessians - all in a Colab notebook:
colab.research.google.com/github/davidcp…

Summary 👇

1/ Early signs of trouble.

We learn that deep ReLU nets with He-init, but no batch norm, basically ignore their inputs! (Check out arxiv.org/abs/1902.04942 by Luther, @SebastianSeung for background.)

Easy to miss if you pool across channels:
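
A rough way to see this effect outside the notebook, as a minimal sketch rather than the code from the blog post: push a batch of random inputs through a deep fully connected ReLU net with He-init and no batch norm, then ask what fraction of each layer's mean squared pre-activation is actual variance across inputs. The function name, the widths and depths, and the use of Gaussian inputs and dense layers (instead of a conv net on images) are all assumptions made for illustration.

```python
import numpy as np

def variance_fraction_by_depth(depth=50, width=256, batch=512, seed=0):
    """Hypothetical helper: for each layer of a He-initialised ReLU MLP,
    return (per-channel variance across inputs) / (mean squared pre-activation)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, width))  # assumed stand-in for real data
    fractions = []
    for _ in range(depth):
        # He-init: weight variance 2 / fan_in, no biases, no batch norm
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        z = x @ w  # pre-activations, shape (batch, channels)
        # Variance over the batch, computed separately for each channel
        per_channel_var = z.var(axis=0).mean()
        # Second moment pooled over batch and channels
        pooled_second_moment = (z ** 2).mean()
        fractions.append(per_channel_var / pooled_second_moment)
        x = np.maximum(z, 0.0)  # ReLU
    return fractions

fractions = variance_fraction_by_depth()
print(f"layer 1:  {fractions[0]:.3f}")
print(f"layer 50: {fractions[-1]:.3f}")
```

The pooled second moment stays roughly constant with depth, which is what He-init is designed for, so a histogram pooled over channels looks healthy. The per-channel fraction should fall from close to 1 at the first layer to a small value in deep layers, meaning each channel outputs nearly the same value whatever the input.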