Steven Hansen
Senior Research Scientist at DeepMind. Intrinsically motivated to research intrinsic motivation. All opinions my own.
Jun 13, 2019
Excited to share some new work on arXiv today: "Fast Task Inference with Variational Intrinsic Successor Features". Done with @wwdabney, Andre Barreto, Tom Van de Wiele, @dwf, and @VladMnih.

TL;DR: Unsupervised pre-training for efficient RL
arxiv.org/abs/1906.05030

Imagine that unsupervised interaction is free or cheap, but evaluating the reward function isn't. We formalize this as a two-phase training regime:

1) unlimited unsupervised interaction

2) few-shot rewarded interactions (standard RL setup)

We apply this regime to all 57 Atari games
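
As a rough illustration of the two-phase regime above, here is a toy Python sketch. Everything in it is a hypothetical stand-in and not the method from the paper: the `RewardBudgetEnv` chain environment, the "walk to state k" skills, and the visitation-frequency table are assumptions chosen only to show reward-free interaction followed by a small, metered number of rewarded steps.

```python
class RewardBudgetEnv:
    """Toy 1-D chain (hypothetical). Reward is only visible when explicitly
    requested, and the number of such requests is capped."""

    def __init__(self, n_states=5, goal=3, reward_budget=4):
        self.n_states = n_states
        self.goal = goal
        self.reward_budget = reward_budget
        self.reward_queries = 0
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action, with_reward=False):
        # action is -1 (left), 0 (stay), or +1 (right); clip to the chain.
        self.state = min(max(self.state + action, 0), self.n_states - 1)
        reward = None
        if with_reward:
            if self.reward_queries >= self.reward_budget:
                raise RuntimeError("reward budget exhausted")
            self.reward_queries += 1
            reward = 1.0 if self.state == self.goal else 0.0
        return self.state, reward


def skill_action(state, target):
    """Hand-coded skill (assumption): move one step toward `target`."""
    return (target > state) - (target < state)


def unsupervised_phase(env, horizon=8):
    """Phase 1: reward-free interaction. For each skill, record how often
    it visits each state (a crude summary of the skill's behaviour)."""
    visitation = {}
    for target in range(env.n_states):
        counts = [0.0] * env.n_states
        state = env.reset()
        for _ in range(horizon):
            state, _ = env.step(skill_action(state, target))
            counts[state] += 1.0 / horizon
        visitation[target] = counts
    return visitation


def rewarded_phase(env, visitation):
    """Phase 2: spend a few rewarded steps to estimate per-state reward,
    then pick the skill whose visitation best matches that estimate."""
    reward_estimate = [0.0] * env.n_states
    state = env.reset()
    for _ in range(env.n_states - 1):  # one sweep to the right
        state, reward = env.step(+1, with_reward=True)
        reward_estimate[state] = reward

    def value(target):
        return sum(v * r for v, r in zip(visitation[target], reward_estimate))

    return max(visitation, key=value)


env = RewardBudgetEnv(n_states=5, goal=3, reward_budget=4)
visitation = unsupervised_phase(env)          # no reward queries used here
best_skill = rewarded_phase(env, visitation)  # at most `reward_budget` queries
print(best_skill, env.reward_queries)
```

The point of the design is that the expensive part (reward evaluation) is metered by `reward_budget`, while the cheap part (phase 1) can run as long as it likes and never touches the reward.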