Interesting paper looking at the linearized dynamics of SGD and the infinite-depth limit of the NTK:

arxiv.org/abs/1905.13654

Hat tip to @aib1515 for sharing. 1/7
Previous work on information propagation through the hidden layers showed that the conjugate kernel converges to a constant kernel exponentially fast with depth, unless the parameters are initialized at the edge of chaos (EOC). The weight/bias variances that put you on the EOC depend on the architecture. 2/7
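
(Quick reminder of the EOC condition in the usual mean-field notation — my summary, not the thread's: write q* for the fixed point of the pre-activation variance and Z ~ N(0,1); then

q^* = \sigma_b^2 + \sigma_w^2 \, E[\phi(\sqrt{q^*} Z)^2], \qquad \chi_1 = \sigma_w^2 \, E[\phi'(\sqrt{q^*} Z)^2] = 1 \quad \text{(EOC)}.

For ReLU this reduces to \sigma_w^2 = 2, \sigma_b^2 = 0, i.e. He initialization; for tanh the EOC is a curve of (\sigma_b^2, \sigma_w^2) pairs.)
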
It has been observed that initializing on the EOC leads to better training (not getting stuck). Thoroughly explored here: arxiv.org/abs/1902.06853. 3/7
The new paper shows that outside the edge of chaos, the NTK also converges exponentially to a degenerate, constant kernel as depth grows. A constant NTK (all entries equal) means kernel gradient descent only changes the vector of outputs by something proportional to (1,...,1). You can't fit data like this. 4/7
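
(A one-line version of that argument, in my notation, assuming continuous-time kernel gradient descent on the squared loss: with training outputs f_t \in R^n and targets y,

\dot f_t = -\Theta (f_t - y), \qquad \Theta = c \, \mathbf{1}\mathbf{1}^\top \;\Rightarrow\; \dot f_t = -c \, \big(\mathbf{1}^\top (f_t - y)\big) \, \mathbf{1},

so every output coordinate gets the same update and only the mean of the residual can be fit.)
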
On the EOC, the kernel converges sub-exponentially. For some activations, including tanh, the convergence is slower than for ReLU-like ones. The limit kernel is actually invertible, but they claim it "carries almost no info on the inputs" (I don't get why...), so converging to it more slowly is better. 5/7
They indeed find that tanh works better than ReLU on MNIST/CIFAR10. Am I the only one to find this surprising? 6/7
They further show that SGD leads to kernel gradient descent on the function, but with an extra noise term, which could perhaps be interpreted as a regularizing effect of SGD?
By assuming the NTK converges, we get linearized SGD dynamics, which can be analyzed. 7/7
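
To make the linearized picture concrete, here is a minimal sketch (mine, not the paper's; names like init_mlp and empirical_ntk are made up) that computes the empirical NTK of a small tanh MLP with JAX and takes one full-batch linearized step in function space. Replacing the full batch with a minibatch gives the same update plus the noise term the thread mentions.

import jax
import jax.numpy as jnp

def init_mlp(key, sizes, sigma_w=2.0, sigma_b=0.05):
    """Gaussian init: W_ij ~ N(0, sigma_w / fan_in), b_i ~ N(0, sigma_b)."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        key, kw, kb = jax.random.split(key, 3)
        W = jax.random.normal(kw, (fan_in, fan_out)) * jnp.sqrt(sigma_w / fan_in)
        b = jax.random.normal(kb, (fan_out,)) * jnp.sqrt(sigma_b)
        params.append((W, b))
    return params

def mlp(params, x):
    """Scalar-output tanh MLP; x has shape (n, d), output has shape (n,)."""
    for W, b in params[:-1]:
        x = jnp.tanh(x @ W + b)
    W, b = params[-1]
    return (x @ W + b).squeeze(-1)

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = J(x1) J(x2)^T, with J the Jacobian of outputs w.r.t. params."""
    def flat_jac(x):
        jac = jax.jacobian(mlp)(params, x)  # pytree, each leaf has shape (n, *param_shape)
        leaves = jax.tree_util.tree_leaves(jac)
        return jnp.concatenate([l.reshape(l.shape[0], -1) for l in leaves], axis=1)
    return flat_jac(x1) @ flat_jac(x2).T    # shape (n1, n2)

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (16, 4))
y = jnp.sin(X[:, 0])                        # toy targets
params = init_mlp(key, [4, 128, 128, 1])

theta = empirical_ntk(params, X, X)
f = mlp(params, X)
eta = 0.1
# One linearized full-batch step in function space: f <- f - eta * Theta (f - y).
f_next = f - eta * theta @ (f - y)
# With a minibatch B the update uses Theta(X, X_B)(f(X_B) - y_B) instead, i.e. roughly
# the full-batch kernel GD step plus a zero-mean (over random batches) noise term.
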