TL;DR: Unsupervised pre-training for efficient RL
arxiv.org/abs/1906.05030
1) Unlimited unsupervised (reward-free) interaction
2) Few-shot rewarded interaction (the standard RL setup)
We apply this two-phase regime to all 57 Atari games.
But where do the features come from?
Plug these features into the successor features (SF) framework and you're good to go!
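The payoff of SF is easiest to see in code. SF assumes reward is linear in the features, r ≈ φᵀw, so Q-values factor as Q(s, a) = ψ(s, a)ᵀw. Once the features are fixed, a new task only requires estimating the vector w, and a least-squares fit on a handful of rewarded transitions is enough. The sketch below is a minimal illustration under those assumptions. It uses random stand-in data. The function names `infer_task_vector` and `q_values` are hypothetical, and this is not the paper's implementation.

```python
import numpy as np

def infer_task_vector(phi, rewards):
    """Least-squares estimate of w in r ≈ phi @ w (assumes rewards are linear in features)."""
    w, *_ = np.linalg.lstsq(phi, rewards, rcond=None)
    return w

def q_values(psi, w):
    """Q(s, a) = psi(s, a) . w for every action; psi has shape (num_actions, d)."""
    return psi @ w

rng = np.random.default_rng(0)
d = 4
true_w = np.array([1.0, -0.5, 0.0, 2.0])   # hypothetical task vector
phi = rng.normal(size=(50, d))             # stand-in features of 50 rewarded transitions
rewards = phi @ true_w                      # noiseless linear rewards, for illustration only
w_hat = infer_task_vector(phi, rewards)

psi = rng.normal(size=(3, d))              # stand-in successor features for 3 actions in one state
greedy_action = int(np.argmax(q_values(psi, w_hat)))
```

Nothing about ψ or φ has to be relearned here. Only the d numbers in w are fitted from reward, which is why the rewarded phase can be so short.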
Human-level performance on 14 Atari games.