Sebastian Flennerhag
Research scientist at @deepmindai. All opinions my own.

Sep 13, 2021, 6 tweets

What should a meta-learner optimize? What if we make it chase its own future outputs?

Turns out, it can improve meta-optimization, set new SOTAs, and lead to new types of meta-learning. arxiv.org/pdf/2109.04504…

w. Y. Schroecker, @tomzhavy, @hado, D. Silver, S. Singh.
🧵👇

The meta-learner's goal is to match a target. That gives us a lot of flexibility. The matching function can ease optimization, and generating targets from the meta-learner itself gives us open-ended meta-learning: the better the meta-learner gets, the better the targets 🚀
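To make the objective concrete, here is a rough schematic of target matching in LaTeX. The symbols are my shorthand for this sketch, not necessarily the paper's exact notation:

```latex
% Target matching, schematically:
%   x^{(K)}(w) : learner parameters after K updates driven by meta-parameters w
%   \tilde{x}  : a target, e.g. generated from the meta-learner itself
%   \mu        : a matching function (e.g. a squared distance, or a KL between policies)
\min_{w} \; \mu\!\left(\tilde{x},\, x^{(K)}(w)\right)
```

The meta-gradient only has to flow through the matching term, so the choice of target and of matching function is where all the flexibility comes from.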

Of course, not all targets are good. But if the target lies in a direction of performance improvement, matching it improves performance. In fact, by picking good targets, we can get larger improvements than by directly meta-optimizing the objective! 💥

By bootstrapping, we can use targets that are further along the learning curve without having to backpropagate through the extra updates. On Atari, we extend the effective meta-learning horizon by 5x, pay only a 25% increase in wall-clock time, and get 2x the performance.
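Here is a minimal JAX sketch of that trick on a toy quadratic. Everything in it (`inner_step`, the loss, the horizons K and L, the meta-parameter being a log learning rate) is an illustrative assumption, not the paper's Atari setup:

```python
import jax
import jax.numpy as jnp

def loss(x):
    # Toy learner objective: a simple quadratic.
    return jnp.sum(x ** 2)

def inner_step(x, log_lr):
    # One learner update; the step size is the meta-parameter.
    return x - jnp.exp(log_lr) * jax.grad(loss)(x)

def meta_loss(log_lr, x, K=1, L=5):
    # Run K meta-learned steps (differentiable w.r.t. log_lr).
    for _ in range(K):
        x = inner_step(x, log_lr)
    # Bootstrap: run L more steps to build a target further along the
    # learning curve, then stop the gradient, so we never backpropagate
    # through these extra updates.
    target = x
    for _ in range(L):
        target = inner_step(target, log_lr)
    target = jax.lax.stop_gradient(target)
    # Match the K-step iterate to the (K+L)-step target.
    return jnp.sum((x - target) ** 2)

x = jnp.ones(3)
log_lr = jnp.array(-2.3)  # learning rate of roughly 0.1
meta_grad = jax.grad(meta_loss)(log_lr, x)
log_lr = log_lr - 0.1 * meta_grad  # one meta-update
```

The backward pass only covers the K meta-learned steps plus the cheap matching term, which is why the effective horizon can grow by the bootstrap length at a small wall-clock cost.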

The flexibility of target matching opens up new forms of meta-learning, such as meta-learning parameters that aren’t part of the learner’s update. We tried it on for size by meta-learning epsilon in a Q-learning agent, to great effect.
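A hedged sketch of what that can look like: epsilon never appears in the learner's gradient update, but we can still tune it by matching the epsilon-greedy policy to a gradient-stopped target policy. The Q-values, the softmax target, and the step size below are stand-ins chosen for illustration:

```python
import jax
import jax.numpy as jnp

def epsilon_greedy(q_values, eps_logit):
    # Differentiable epsilon-greedy distribution over actions.
    eps = jax.nn.sigmoid(eps_logit)
    n = q_values.shape[0]
    greedy = jax.nn.one_hot(jnp.argmax(q_values), n)
    return (1.0 - eps) * greedy + eps / n

def meta_loss(eps_logit, q_values, target_policy):
    # KL(target || current) as the matching function (one plausible choice).
    pi = epsilon_greedy(q_values, eps_logit)
    return jnp.sum(target_policy * (jnp.log(target_policy) - jnp.log(pi)))

q_values = jnp.array([1.0, 0.5, -0.2])
# Illustrative target: a sharp softmax over the same Q-values, treated as
# fixed, standing in for a bootstrapped target policy.
target_policy = jax.lax.stop_gradient(jax.nn.softmax(q_values / 0.1))

eps_logit = jnp.array(0.0)
for _ in range(100):
    eps_logit = eps_logit - 0.5 * jax.grad(meta_loss)(eps_logit, q_values, target_policy)
print(jax.nn.sigmoid(eps_logit))  # epsilon adapts toward the target's greediness
```

Note that nothing here requires epsilon to be differentiable through the Q-learning update itself; the matching function supplies the gradient.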

Yet we have only scratched the surface! Many questions remain: how to pick targets, how to choose matching functions, and what new forms of meta-learning this affords. 🤔
