Luke Metz
Senior Research Scientist at Google Brain. Formerly @indicoData, @onshape, @OlinCollege. My opinions do not represent those of my employer.

Sep 24, 2020, 8 tweets

We have a new paper on learned optimizers! We used thousands of tasks (and a lot of compute 😬) to train general purpose learned optimizers that perform well on never-before-seen tasks, and can even train new versions of themselves.
arxiv.org/abs/2009.11243
1/8

In the same way learned features took over computer vision, we believe ML algorithms will be replaced with learned components.

We shift away from hand-designed optimizers (SGD, Adam) to learned optimizers parameterized by neural nets and trained to optimize neural nets.
2/8
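To make the idea in 2/8 concrete, here is a minimal sketch (not the paper's architecture): the optimizer itself is a small neural net with meta-parameters `theta` that maps per-parameter features (here just the gradient and a momentum accumulator, both illustrative choices) to a parameter update.

```python
import jax
import jax.numpy as jnp

def learned_update(theta, grad, momentum):
    # Per-parameter features fed to a tiny MLP: shapes [..., 2] -> [...].
    feats = jnp.stack([grad, momentum], axis=-1)
    h = jnp.tanh(feats @ theta["w1"] + theta["b1"])
    step = (h @ theta["w2"] + theta["b2"])[..., 0]
    return 0.001 * step  # small output scale keeps early inner-training stable

def apply_optimizer(theta, params, grads, momenta, decay=0.9):
    # Update momentum accumulators, then let the learned net produce the step.
    new_momenta = jax.tree_util.tree_map(
        lambda m, g: decay * m + (1.0 - decay) * g, momenta, grads)
    new_params = jax.tree_util.tree_map(
        lambda p, g, m: p - learned_update(theta, g, m),
        params, grads, new_momenta)
    return new_params, new_momenta
```

Training `theta` so that these updates make target networks learn quickly is what "learning an optimizer" means here.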

We explore a new learned optimizer architecture: a hierarchical LSTM. It has access to both training loss and validation loss of the target task, which allows for dynamic regularization.
3/8
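A hedged sketch of the input side of 3/8 (the feature names are illustrative, not the paper's API): because the optimizer's controller sees both the training loss and a held-out validation loss at every step, it can trade them off, which is what enables dynamic regularization.

```python
import jax.numpy as jnp

def controller_features(train_loss, valid_loss, step, total_steps):
    # Illustrative per-step features a learned optimizer's controller might see.
    return jnp.array([
        jnp.log1p(train_loss),    # current training loss (log scale)
        jnp.log1p(valid_loss),    # current validation loss (log scale)
        valid_loss - train_loss,  # generalization-gap signal
        step / total_steps,       # training progress
    ])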

We find the number of tasks we train the learned optimizer on to be critical. Training on more tasks leads to better optimizers, and we ultimately train on a dataset of ~6k tasks.
4/8
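A toy sketch of outer-training over many tasks, assuming a hypothetical task object with `init_params` and `loss` methods and the `apply_optimizer` sketch above. This version backprops through a short unroll; the actual outer-training setup over thousands of tasks is considerably more involved.

```python
import jax
import jax.numpy as jnp

def meta_loss(theta, task, key, inner_steps=10):
    # Unroll the learned optimizer on one sampled task and score it by the
    # average loss it achieves along the way.
    params = task.init_params(key)
    momenta = jax.tree_util.tree_map(jnp.zeros_like, params)
    total = 0.0
    for _ in range(inner_steps):
        loss, grads = jax.value_and_grad(task.loss)(params)
        params, momenta = apply_optimizer(theta, params, grads, momenta)
        total = total + task.loss(params)
    return total / inner_steps

def outer_step(theta, tasks, key, outer_lr=1e-3):
    # One outer-training step: average the meta-gradient over a batch of
    # sampled tasks and update the meta-parameters theta.
    keys = jax.random.split(key, len(tasks))
    grads = [jax.grad(meta_loss)(theta, t, k) for t, k in zip(tasks, keys)]
    mean_grad = jax.tree_util.tree_map(lambda *g: sum(g) / len(g), *grads)
    return jax.tree_util.tree_map(lambda p, g: p - outer_lr * g,
                                  theta, mean_grad)
```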

The resulting learned optimizer, which requires no hyperparameter tuning, outperforms modestly tuned hand-designed methods on the majority of our tasks.
5/8

On larger-scale tasks, these optimizers perform comparably to learning-rate-tuned Adam/momentum despite never seeing similar tasks at outer-training time. For example, we show this on a small ResNet trained on CIFAR-10.
6/8

In my favorite experiment, we show how general these methods are by using them to train new versions of themselves!

(This is similar to self-hosting compilers -- compilers written in the language that they compile.)
7/8

Thanks to my wonderful collaborators: @niru_m, @bucketofkets, @poolio, @jaschasd 🙏
8/8
