Luke Metz
Senior Research Scientist at Google Brain. Formerly @indicoData, @onshape, @OlinCollege. My opinions do not represent those of my employer.
Nov 18, 2022 14 tweets 6 min read
Tired of having to manually tune optimizers? We’re excited to release VeLO, the first hparam-free, super versatile learned optimizer that outperforms hand-designed optimizers on real-world problems. It was trained on thousands of TPU-months of compute. 1/N arxiv.org/abs/2211.09760

VeLO is a learned optimizer. Instead of designing an update rule by hand as is commonly done (e.g. Adam, SGD), VeLO is a tiny neural network that takes in gradient values and outputs weight updates.
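To make the idea concrete, here is a minimal sketch of what "a tiny neural network that maps gradients to weight updates" can look like. This is only an illustration of the general pattern, assuming a toy two-layer MLP over per-parameter features; the function names and architecture here are ours, not VeLO's.

```python
import jax
import jax.numpy as jnp

# Toy "learned optimizer": a tiny MLP that maps per-parameter features
# (gradient, momentum) to a weight update. Illustrative sketch only --
# VeLO's actual architecture and inputs are described in the paper.

def init_lopt_params(key, hidden=4):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
    }

def lopt_update(lopt_params, grad, momentum):
    # Stack per-parameter features along the last axis: (..., 2).
    feats = jnp.stack([grad, momentum], axis=-1)
    h = jnp.tanh(feats @ lopt_params["w1"])      # (..., hidden)
    return (h @ lopt_params["w2"])[..., 0]       # (...,) weight update

# Usage: apply the learned update in place of a hand-designed rule like SGD.
key = jax.random.PRNGKey(0)
lopt = init_lopt_params(key)
w = jnp.ones((3,))
g = jnp.array([0.5, -0.2, 0.1])
m = jnp.zeros_like(g)
w = w - lopt_update(lopt, g, m)
```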
Nov 11, 2021 7 tweets 4 min read
New paper: when to use gradients

arxiv.org/abs/2111.05803

DL researchers often compute derivatives through just about everything (physics simulators, optimization procedures, renderers). Sometimes these gradients are useful, other times they are not.

We explore why.

1/7
We show that when computing a gradient through an iterative system, we need to compute terms consisting of a product of the state-transition Jacobians. This product is what causes issues.

If the Jacobian's eigenvalues are > 1, gradients explode; if < 1, gradients vanish 😱

2/7
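A small illustrative sketch of this effect (not code from the paper): for the linear iteration x_{t+1} = a * x_t, the per-step Jacobian is just a, so the gradient of the final state w.r.t. the initial state is a^T and either explodes or vanishes with the number of steps.

```python
import jax

# Gradients through an unrolled iterative system are products of per-step
# Jacobians. For x_{t+1} = a * x_t, d x_T / d x_0 = a ** steps.

def unroll(a, x0, steps=50):
    x = x0
    for _ in range(steps):
        x = a * x          # per-step Jacobian w.r.t. x is just `a`
    return x

grad_fn = jax.grad(unroll, argnums=1)  # derivative of x_T w.r.t. x_0

print(grad_fn(1.1, 1.0))   # ~117    -> gradients explode when |a| > 1
print(grad_fn(0.9, 1.0))   # ~0.005  -> gradients vanish when |a| < 1
```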
Sep 24, 2020 8 tweets 3 min read
We have a new paper on learned optimizers! We used thousands of tasks (and a lot of compute 😬) to train general purpose learned optimizers that perform well on never-before-seen tasks, and can even train new versions of themselves.
arxiv.org/abs/2009.11243
1/8
In the same way learned features took over computer vision, we believe ML algorithms will be replaced with learned components.

We shift away from hand-designed optimizers (SGD, Adam) to learned optimizers parameterized by neural nets and trained to optimize neural nets.
2/8
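A hedged sketch of the meta-training idea behind this, assuming a deliberately tiny "learned optimizer" (a learned scale and bias on the gradient) and a toy inner task; the names and setup are ours, not the paper's code. The point is only the structure: unroll inner training with the learned optimizer, take the loss after unrolling as the meta-objective, and differentiate it w.r.t. the learned optimizer's parameters.

```python
import jax
import jax.numpy as jnp

def inner_loss(w):
    return jnp.sum((w - 3.0) ** 2)          # a toy inner task

def learned_update(theta, grad):
    # Minimal "learned optimizer": a learned per-parameter scale and bias.
    return theta["scale"] * grad + theta["bias"]

def meta_loss(theta, w0, inner_steps=10):
    w = w0
    for _ in range(inner_steps):
        g = jax.grad(inner_loss)(w)
        w = w - learned_update(theta, g)
    return inner_loss(w)                    # meta-objective: loss after unrolling

theta = {"scale": jnp.array(0.01), "bias": jnp.array(0.0)}
w0 = jnp.zeros((2,))
meta_grad_fn = jax.jit(jax.grad(meta_loss))

# Meta-gradient descent on the learned optimizer's parameters.
for _ in range(100):
    g = meta_grad_fn(theta, w0)
    theta = jax.tree_util.tree_map(lambda t, dg: t - 1e-3 * dg, theta, g)
```

In the paper the inner tasks are thousands of neural-network training problems rather than a single toy loss, but the outer loop has the same shape: train the optimizer by how well the models it trains end up doing.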
Mar 13, 2020 4 tweets 3 min read
Excited to share our new work! We introduce a dataset of tasks for learned-optimizer research. As an example application of this dataset, we meta-train lists of optimizer hyperparameters that work well on a diverse set of tasks. arxiv.org/abs/2002.11887 1/4

We are releasing these lists of optimizer hyperparameters in TF, PyTorch, and Jax as a drop-in replacement for existing optimizers. Give it a try and let us know how it goes! github.com/google-researc… 2/4
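A rough sketch of how such a list can be used, written here with optax for the JAX case. The configurations below are placeholders made up for illustration, not the released meta-trained values; see the linked repo for the actual drop-in optimizers.

```python
import optax

# Illustrative only: a short, ordered list of optimizer hyperparameter
# configurations. The real meta-trained lists live in the linked repo;
# these numbers are placeholders.
OPT_LIST = [
    {"learning_rate": 1e-3, "b1": 0.9,  "b2": 0.999, "eps": 1e-8},
    {"learning_rate": 3e-4, "b1": 0.95, "b2": 0.999, "eps": 1e-8},
    {"learning_rate": 1e-4, "b1": 0.9,  "b2": 0.99,  "eps": 1e-6},
]

def get_optimizer(idx):
    # Try configurations in order (idx = 0, 1, 2, ...) instead of running
    # a full hyperparameter search.
    return optax.adam(**OPT_LIST[idx])

opt = get_optimizer(0)
```

The design idea is that trying entries of a well-chosen ordered list, from the top down, replaces a per-task hyperparameter search.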