"Finite Versus Infinite Neural Networks: an Empirical Study." arxiv.org/abs/2007.15801 This paper contains everything you ever wanted to know about infinite width networks, but didn't have the computational capacity to ask! Like really a lot of content. Let's dive in.
Infinite width Neural Network Gaussian Process (NNGP) and Neural Tangent Kernel (NTK) predictions can outperform finite networks, depending on architecture and training practices. For fully connected networks, the infinite width limit reliably outperforms the finite network.
The NNGP (corresponding to infinite width Bayesian networks) typically outperforms the NTK (corresponding to infinite width networks trained by gradient descent).
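For concreteness, here is a minimal sketch of getting both predictions from the Neural Tangents library. The architecture, widths, diag_reg value, and random placeholder data are all illustrative, not the paper's exact setup:

```python
import jax.random as random
import neural_tangents as nt
from neural_tangents import stax

# Placeholder data standing in for e.g. flattened CIFAR-10.
key = random.PRNGKey(0)
k1, k2, k3 = random.split(key, 3)
x_train = random.normal(k1, (64, 3072))
y_train = random.normal(k2, (64, 10))
x_test = random.normal(k3, (16, 3072))

# Fully connected architecture; kernel_fn gives the corresponding
# infinite-width NNGP and NTK kernels in closed form.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(10),
)

# NNGP = exact Bayesian posterior mean; NTK = infinite-time gradient descent.
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-4)
nngp_mean, ntk_mean = predict_fn(x_test=x_test, get=('nngp', 'ntk'))
```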
Centering and ensembling of finite width networks both lead to more kernel-like performance. Centering leads to faster training.
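A minimal sketch of what centering and ensembling mean here, assuming a generic apply_fn(params, x) and already-trained parameter sets (the names are illustrative, not the paper's code):

```python
import jax.numpy as jnp

def centered_apply(apply_fn, params, init_params, x):
    # "Centering": subtract the network's output at initialization, so the
    # model starts from the zero function, mimicking kernel predictions.
    return apply_fn(params, x) - apply_fn(init_params, x)

def ensemble_predict(apply_fn, params_list, x):
    # "Ensembling": average the predictions of independently initialized
    # and independently trained finite width networks.
    preds = jnp.stack([apply_fn(p, x) for p in params_list])
    return preds.mean(axis=0)
```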
Large learning rates and L2 regularization both drive differences between finite networks and kernels, and lead finite width networks to perform better. The combined effect of large learning rates and L2 regularization is superlinear.
L2 regularization unexpectedly works better for NTK-parameterized networks than for standard-parameterized networks. We emulate this in standard-parameterized networks (i.e., typical networks) with matched layerwise L2 regularization coefficients, leading to better generalization.
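A hedged sketch of one way to implement layerwise L2 coefficients in a standard-parameterized network. The fan-in scaling below is an illustrative reading of how the standard and NTK parameterizations differ, not necessarily the paper's exact prescription:

```python
import jax.numpy as jnp

def layerwise_l2(params, base_coeff=1e-4):
    # Scale each weight matrix's L2 penalty by its fan-in, which (up to
    # constants) matches what a single coefficient would impose under NTK
    # parameterization. `params` is a list of (W, b) pairs with W of shape
    # (fan_in, fan_out); biases are left unpenalized here.
    penalty = 0.0
    for W, b in params:
        fan_in = W.shape[0]
        penalty += base_coeff * fan_in * jnp.sum(W ** 2)
    return penalty
```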
The generalization performance of certain finite width networks (especially CNNs without pooling) is non-monotonic in width, in a way not explained by double descent phenomena. (!?!)
Diagonal regularization of kernel methods behaves like early stopping of training. There is a simple mapping between regularization strength and early stopping time (previously proposed in arxiv.org/abs/1810.10082).
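Continuing the Neural Tangents sketch above, both knobs are exposed directly: diagonal regularization via diag_reg, and early stopping via a finite training time t (the values here are arbitrary):

```python
import neural_tangents as nt

# (1) Diagonal (ridge-like) regularization of the train-train kernel.
ridge_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-3)
y_ridge = ridge_fn(x_test=x_test, get='ntk')

# (2) Early stopping: evaluate the gradient-flow solution at finite time t
#     rather than t = infinity. Roughly, smaller diag_reg corresponds to
#     larger t (see the paper and arxiv.org/abs/1810.10082 for the mapping).
flow_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=0.0)
y_early = flow_fn(t=100.0, x_test=x_test, get='ntk')
```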
Floating point precision determines the critical dataset size at which kernel methods will fail for numerical reasons. For CNNs with pooling, it's around 10^4 samples for float32.
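If you hit this limit, the standard JAX workaround is to enable float64 (a sketch; this raises the critical dataset size at the cost of extra memory and compute):

```python
from jax import config

# Run kernel computations in float64 rather than JAX's default float32.
config.update("jax_enable_x64", True)
```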
Regularized ZCA whitening of input images improves model accuracy by a surprising amount, especially for infinite width NNGP and NTK predictions.
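A hedged sketch of regularized ZCA whitening on flattened images; the exact normalization of the regularizer eps may differ from the paper's:

```python
import jax.numpy as jnp

def zca_whiten(x, eps=1e-1):
    # x: (num_examples, num_features), e.g. flattened images.
    x = x - x.mean(axis=0, keepdims=True)      # center each feature
    cov = x.T @ x / x.shape[0]                 # feature covariance
    evals, evecs = jnp.linalg.eigh(cov)
    # Regularize the eigenvalues before inverting their square roots;
    # eps controls how strongly whitening is applied.
    inv_sqrt = evecs @ jnp.diag(1.0 / jnp.sqrt(evals + eps)) @ evecs.T
    return x @ inv_sqrt
```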
Equivariance is commonly used to motivate the strong performance of CNNs. We show that equivariance is only beneficial (and can only be beneficial) for narrow networks far from the kernel regime.
Finally, we present a simple method for ensembling the predictions of NNGP and NTK models, making it practical to use data augmentation with infinite width networks. (data augmentation is otherwise impractical, due to the cubic dependence of kernel methods on dataset size)
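One plausible reading of such an ensembling scheme, reusing kernel_fn from the sketch above; augment_fn is a placeholder augmentation routine, and the paper's exact procedure may differ:

```python
import jax.numpy as jnp
import neural_tangents as nt

def augmentation_ensemble(kernel_fn, x_train, y_train, x_test, augment_fn,
                          num_draws=8, diag_reg=1e-4):
    # Instead of one kernel over the full augmented dataset (cubic in its
    # size), solve several smaller kernel regressions, each on an
    # independently augmented copy of the training set, and average the
    # resulting test predictions.
    preds = []
    for seed in range(num_draws):
        x_aug = augment_fn(x_train, seed)
        predict_fn = nt.predict.gradient_descent_mse_ensemble(
            kernel_fn, x_aug, y_train, diag_reg=diag_reg)
        preds.append(predict_fn(x_test=x_test, get='nngp'))
    return jnp.mean(jnp.stack(preds), axis=0)
```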
All of these experiments were made possible by the Neural Tangents software library. You should use it for all your infinite width network needs! github.com/google/neural-…
This paper took an amazing amount of work by amazing collaborators. @hoonkp did the most, and @sschoenholz, Jeffrey Pennington, Ben Adlam, @Locchiu, and @ARomanNovak also all put in more than a usual paper's worth of effort.