We go on a tour of bad inits, degenerate networks and spiky Hessians - all in a Colab notebook:
colab.research.google.com/github/davidcp…
Summary 👇

We learn that deep ReLU nets with He init, but no batch norm, basically ignore their inputs: with depth, different inputs get mapped to nearly identical activations! (Check out arxiv.org/abs/1902.04942 by Luther, @SebastianSeung for background.)
Easy to miss if you pool statistics across channels: the spread of the per-channel means masquerades as healthy variance, while the variance across samples within each channel collapses.
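Here's a minimal PyTorch sketch of the effect (not the notebook's code; depth, width, and batch size are arbitrary picks): track the pooled variance vs. the per-channel variance across samples, plus the cosine similarity between two unrelated inputs, through a deep He-initialized ReLU MLP.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
depth, width, batch = 50, 256, 64

def layer(width):
    linear = nn.Linear(width, width, bias=False)
    nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")  # He init
    return nn.Sequential(linear, nn.ReLU())

net = nn.Sequential(*[layer(width) for _ in range(depth)])

h = torch.randn(batch, width)
for i, block in enumerate(net):
    h = block(h)
    if (i + 1) % 10 == 0:
        pooled = h.var().item()              # variance pooled over samples AND channels
        across = h.var(dim=0).mean().item()  # variance across samples, per channel
        cos = nn.functional.cosine_similarity(h[0], h[1], dim=0).item()
        print(f"layer {i + 1:3d}: pooled var {pooled:7.3f} | "
              f"across-sample var {across:7.3f} | cos(x0, x1) {cos:.4f}")
```

The pooled variance looks fine at every depth, but the across-sample variance shrinks and the two activations converge onto the same direction, i.e. the net stops depending on its input.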

Finally, we connect this to the instability of SGD and to the outlying eigenvalues of the Hessian (found by @leventsagun, Bottou, @ylecun).
The mystery of the spiky Hessian is resolved along with the secrets of batch norm!
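To see those outliers yourself, here's a hedged sketch using standard power iteration on Hessian-vector products (a common trick, not necessarily how the notebook does it; the model, data, and loss below are stand-ins):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and data; the trick applies to any twice-differentiable loss.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(128, 10), torch.randint(0, 2, (128,))
loss = nn.functional.cross_entropy(model(x), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)

def normalize(vs):
    norm = torch.sqrt(sum((v ** 2).sum() for v in vs))
    return [v / norm for v in vs]

v = normalize([torch.randn_like(p) for p in params])
for _ in range(100):
    # Hessian-vector product: differentiate <grad, v> w.r.t. the parameters.
    gv = sum((g * u).sum() for g, u in zip(grads, v))
    Hv = torch.autograd.grad(gv, params, retain_graph=True)
    eigenvalue = sum((h * u).sum() for h, u in zip(Hv, v)).item()  # Rayleigh quotient
    v = normalize(Hv)

print(f"largest-magnitude Hessian eigenvalue ≈ {eigenvalue:.3f}")
```

Why this matters for SGD: in the quadratic approximation, gradient descent diverges along a curvature direction once the learning rate exceeds 2/λ, so a single outlying eigenvalue caps the stable step size for the whole net.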
