Steven Hansen

@Zergylord

Excited to share some new work on arXiv today: "Fast Task Inference with Variational Intrinsic Successor Features". Done with @wwdabney, Andre Barreto, Tom Van de Wiele, @dwf, and @VladMnih

TL;DR: Unsupervised pre-training for efficient RL
arxiv.org/abs/1906.05030

Imagine that unsupervised interaction is free/cheap, but evaluating the reward function isn't. We formalize this as a two-phase training regime:

1) unlimited unsupervised interaction

2) few-shot rewarded interactions (standard RL setup)

We apply this regime to all 57 Atari games

The successor features (SF) framework (arxiv.org/abs/1606.05312) decouples state and reward dynamics. This allows you to infer the solution to a new task by solving a linear regression problem mapping features to rewards.
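For readers who want the decoupling spelled out, here is a rough sketch in the notation of the SF paper linked above (the symbols φ, ψ, w and γ are not in this thread and are assumed from that paper):

```latex
% Rewards are assumed linear in some features \phi with task vector w
r(s, a, s') = \phi(s, a, s')^\top w

% Successor features: expected discounted sum of future features under policy \pi
\psi^\pi(s, a) = \mathbb{E}^\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t \, \phi_{t+1} \;\middle|\; S_0 = s,\ A_0 = a \right]

% The action-value function then factorizes into dynamics (\psi) and task (w)
Q^\pi(s, a) = \psi^\pi(s, a)^\top w
```

Since ψ does not depend on the reward, a new task only requires a new w, which is where the linear regression comes in.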

But where do the features come from?

Learning options with predictable behavior (à la VIC arxiv.org/abs/1611.07507 and DIAYN arxiv.org/abs/1802.06070) is an unsupervised objective that can be seen as implicitly learning controllable features.

Plug these features into SF and you're good to go!
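To make the "predictable behavior" objective above a bit more concrete, here is a minimal sketch of a DIAYN-style intrinsic reward, log q(z|s) - log p(z). It assumes discrete skills with a uniform prior and a discriminator that outputs logits over skills; the function names are made up for illustration, and the paper's own parameterization may well differ.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def intrinsic_reward(discriminator_logits, skill):
    """DIAYN-style reward: log q(z|s) - log p(z), assuming a uniform p(z).

    discriminator_logits: array of shape (..., num_skills), the (hypothetical)
        discriminator's unnormalized scores for each skill given the state.
    skill: integer index of the skill the agent is currently executing.
    """
    num_skills = discriminator_logits.shape[-1]
    log_q = log_softmax(discriminator_logits)[..., skill]
    # With a uniform prior, -log p(z) = log(num_skills).
    return log_q + np.log(num_skills)
```

The reward is positive when the discriminator can tell which skill produced the state better than chance, so the agent is pushed toward behavior that is easy to identify from the states it visits.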

After unsupervised learning of controllable successor features, we can now learn about a task very efficiently by solving a 5-parameter(!) linear regression problem instead of a non-linear RL problem.

Human-level performance on 14 Atari games.
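The fast task inference step described above can be sketched roughly as follows. This is not the paper's code: the features and rewards are synthetic stand-ins, `infer_task_vector` and `greedy_action` are hypothetical helper names, and plain least squares is assumed as the regression method.

```python
import numpy as np


def infer_task_vector(features, rewards):
    """Least-squares fit of w in rewards ≈ features @ w.

    features: (num_samples, d) array of feature vectors from rewarded steps.
    rewards:  (num_samples,) array of the observed rewards.
    """
    w, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    return w


def greedy_action(successor_features, w):
    """Pick the action maximizing psi(s, a)^T w for a single state.

    successor_features: (num_actions, d) array, one row per action.
    """
    return int(np.argmax(successor_features @ w))


# Synthetic stand-in for a small batch of rewarded interactions (assumption).
rng = np.random.default_rng(0)
d = 5  # five features, so five regression parameters
phi = rng.normal(size=(200, d))
w_true = rng.normal(size=d)
rewards = phi @ w_true + 0.01 * rng.normal(size=200)

w_hat = infer_task_vector(phi, rewards)
```

Once `w_hat` is available, acting on the new task only needs the pre-trained successor features, e.g. `greedy_action(psi_for_state, w_hat)` where `psi_for_state` would come from the pre-trained network.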
