Latest Twitter Threads by @BorisAKnyazev on Thread Reader App

Oct 26, 2021 • 12 tweets • 5 min read

Do we still need SGD/Adam to train neural networks? Based on our #NeurIPS2021 paper, we are one step closer to replacing hand-designed optimizers with a single meta-model. Our meta-model can predict parameters for almost any neural network in just one forward pass. (1/n)

For example, our meta-model can predict all ~25M parameters of a ResNet-50, and this ResNet-50 will achieve ~60% accuracy on CIFAR-10 without any training. During its own training, the meta-model never observed any network close to ResNet-50. (2/n)
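
To make "predict parameters in one forward pass" concrete, here is a toy sketch. It is not the method from the paper. It assumes a hypothetical meta-model that is just four numbers mapping each weight's position (layer, row, column) to a value, and it assumes the target network is a plain stack of fully connected layers. The names `make_meta_model`, `predict_parameters`, and `forward` are invented for illustration, and the meta-model here is random rather than trained, so the predicted network is not expected to be accurate.

```python
import math
import random


def make_meta_model(seed=0, n_features=4):
    # Hypothetical stand-in for a trained meta-model: a small vector of
    # meta-parameters. A real meta-model would be trained; this one is random.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_features)]


def predict_parameters(meta_params, layer_shapes):
    # Produce a weight matrix for every layer of the target network in a
    # single pass over the architecture description, with no gradient steps.
    n_layers = len(layer_shapes)
    predicted = []
    for l, (fan_in, fan_out) in enumerate(layer_shapes):
        scale = 1.0 / math.sqrt(fan_in)  # keep activations in a sane range
        layer = []
        for i in range(fan_out):
            row = []
            for j in range(fan_in):
                # Features describing where this weight sits in the network.
                feats = [(l + 1) / n_layers, (i + 1) / fan_out, (j + 1) / fan_in, 1.0]
                z = sum(p * f for p, f in zip(meta_params, feats))
                row.append(scale * math.tanh(z))
            layer.append(row)
        predicted.append(layer)
    return predicted


def forward(weights, x):
    # Run the target network using the predicted weights (ReLU between layers).
    for k, layer in enumerate(weights):
        x = [sum(w * v for w, v in zip(row, x)) for row in layer]
        if k < len(weights) - 1:
            x = [max(0.0, v) for v in x]
    return x


meta = make_meta_model(seed=0)
shapes = [(8, 16), (16, 16), (16, 10)]  # (inputs, outputs) per layer
weights = predict_parameters(meta, shapes)
logits = forward(weights, [0.5] * 8)
n_params = sum(len(layer) * len(layer[0]) for layer in weights)
```

The point of the sketch is the interface: the same meta-model can be handed a different `shapes` list and will emit weights for that architecture too, without running SGD or Adam on the target network.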
