Luke Metz
Senior Research Scientist at Google Brain. Formerly @indicoData, @onshape, @OlinCollege. My opinions do not represent those of my employer.
Nov 18, 2022 14 tweets 6 min read
Tired of having to manually tune optimizers? We’re excited to release VeLO, the first hparam-free, super versatile learned optimizer that outperforms hand-designed optimizers on real-world problems. It was trained on thousands of TPU-months of compute. 1/N arxiv.org/abs/2211.09760

VeLO is a learned optimizer. Instead of designing an update rule by hand as is commonly done (e.g. Adam, SGD), VeLO is a tiny neural network that takes in gradient values and outputs weight updates.
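To make the idea concrete, here is a minimal sketch of what "a tiny neural network that maps gradients to weight updates" can look like. This is only an illustration of the general pattern, assuming a toy two-layer MLP over per-parameter features; the function names and architecture here are ours, not VeLO's.

```python
import jax
import jax.numpy as jnp

# Toy "learned optimizer": a tiny MLP that maps per-parameter features
# (gradient, momentum) to a weight update. Illustrative sketch only --
# VeLO's actual architecture and inputs are described in the paper.

def init_lopt_params(key, hidden=4):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (2, hidden)) * 0.1,
        "w2": jax.random.normal(k2, (hidden, 1)) * 0.1,
    }

def lopt_update(lopt_params, grad, momentum):
    # Stack per-parameter features along the last axis: (..., 2).
    feats = jnp.stack([grad, momentum], axis=-1)
    h = jnp.tanh(feats @ lopt_params["w1"])      # (..., hidden)
    return (h @ lopt_params["w2"])[..., 0]       # (...,) weight update

# Usage: apply the learned update in place of a hand-designed rule like SGD.
key = jax.random.PRNGKey(0)
lopt = init_lopt_params(key)
w = jnp.ones((3,))
g = jnp.array([0.5, -0.2, 0.1])
m = jnp.zeros_like(g)
w = w - lopt_update(lopt, g, m)
```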
Nov 11, 2021 7 tweets 4 min read
New paper: when to use gradients

arxiv.org/abs/2111.05803

DL researchers often compute derivatives through just about everything (physics simulators, optimization procedures, renderers). Sometimes these gradients are useful, other times they are not.

We explore why.

1/7
We show that when computing a gradient through an iterative system, we need to compute terms consisting of a product of the state-transition Jacobians. This product is what causes issues.

If the Jacobian's eigenvalues are > 1, gradients explode; if < 1, gradients vanish 😱

2/7
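A small illustrative sketch of this effect (not code from the paper): for the linear iteration x_{t+1} = a * x_t, the per-step Jacobian is just a, so the gradient of the final state w.r.t. the initial state is a^T and either explodes or vanishes with the number of steps.

```python
import jax

# Gradients through an unrolled iterative system are products of per-step
# Jacobians. For x_{t+1} = a * x_t, d x_T / d x_0 = a ** steps.

def unroll(a, x0, steps=50):
    x = x0
    for _ in range(steps):
        x = a * x          # per-step Jacobian w.r.t. x is just `a`
    return x

grad_fn = jax.grad(unroll, argnums=1)  # derivative of x_T w.r.t. x_0

print(grad_fn(1.1, 1.0))   # ~117    -> gradients explode when |a| > 1
print(grad_fn(0.9, 1.0))   # ~0.005  -> gradients vanish when |a| < 1
```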
Sep 24, 2020 8 tweets 3 min read
We have a new paper on learned optimizers! We used thousands of tasks (and a lot of compute 😬) to train general purpose learned optimizers that perform well on never-before-seen tasks, and can even train new versions of themselves.
arxiv.org/abs/2009.11243
1/8
In the same way learned features took over computer vision, we believe ML algorithms will be replaced with learned components.

We shift away from hand-designed optimizers (SGD, Adam) to learned optimizers parameterized by neural nets and trained to optimize neural nets.
2/8
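A hedged sketch of the meta-training idea behind this, assuming a deliberately tiny "learned optimizer" (a learned scale and bias on the gradient) and a toy inner task; the names and setup are ours, not the paper's code. The point is only the structure: unroll inner training with the learned optimizer, take the loss after unrolling as the meta-objective, and differentiate it w.r.t. the learned optimizer's parameters.

```python
import jax
import jax.numpy as jnp

def inner_loss(w):
    return jnp.sum((w - 3.0) ** 2)          # a toy inner task

def learned_update(theta, grad):
    # Minimal "learned optimizer": a learned per-parameter scale and bias.
    return theta["scale"] * grad + theta["bias"]

def meta_loss(theta, w0, inner_steps=10):
    w = w0
    for _ in range(inner_steps):
        g = jax.grad(inner_loss)(w)
        w = w - learned_update(theta, g)
    return inner_loss(w)                    # meta-objective: loss after unrolling

theta = {"scale": jnp.array(0.01), "bias": jnp.array(0.0)}
w0 = jnp.zeros((2,))
meta_grad_fn = jax.jit(jax.grad(meta_loss))

# Meta-gradient descent on the learned optimizer's parameters.
for _ in range(100):
    g = meta_grad_fn(theta, w0)
    theta = jax.tree_util.tree_map(lambda t, dg: t - 1e-3 * dg, theta, g)
```

In the paper the inner tasks are thousands of neural-network training problems rather than a single toy loss, but the outer loop has the same shape: train the optimizer by how well the models it trains end up doing.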
Mar 13, 2020 4 tweets 3 min read
Excited to share our new work! We introduce a dataset of tasks for learned-optimizer research. As an example application of this dataset, we meta-train lists of optimizer hyperparameters that work well on a diverse set of tasks. arxiv.org/abs/2002.11887 1/4

We are releasing these lists of optimizer hyperparameters in TF, PyTorch, and Jax as a drop-in replacement for existing optimizers. Give it a try and let us know how it goes! github.com/google-researc… 2/4
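A rough sketch of how such a list can be used, written here with optax for the JAX case. The configurations below are placeholders made up for illustration, not the released meta-trained values; see the linked repo for the actual drop-in optimizers.

```python
import optax

# Illustrative only: a short, ordered list of optimizer hyperparameter
# configurations. The real meta-trained lists live in the linked repo;
# these numbers are placeholders.
OPT_LIST = [
    {"learning_rate": 1e-3, "b1": 0.9,  "b2": 0.999, "eps": 1e-8},
    {"learning_rate": 3e-4, "b1": 0.95, "b2": 0.999, "eps": 1e-8},
    {"learning_rate": 1e-4, "b1": 0.9,  "b2": 0.99,  "eps": 1e-6},
]

def get_optimizer(idx):
    # Try configurations in order (idx = 0, 1, 2, ...) instead of running
    # a full hyperparameter search.
    return optax.adam(**OPT_LIST[idx])

opt = get_optimizer(0)
```

The design idea is that trying entries of a well-chosen ordered list, from the top down, replaces a per-task hyperparameter search.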