Model-based planning is often thought to be necessary for deep reasoning & generalization. But the space of choices in model-based deep RL is huge. Which work well and which don't? In our new paper (accepted to #ICLR2021), we investigate! arxiv.org/abs/2011.04021 1/
Spoiler: our findings really challenged some deeply-held assumptions we had about what planning is useful for and how much planning is really needed in popular MBRL benchmarks---even some "strategic" ones like Sokoban. 2/
This is joint work with @theophaneweber, Abe Friesen, @FeryalMP, Arthur Guez, @fabiointheuk, @simswitherspoon, Thomas Anthony, Lars Buesing, and @PetarV_93. 3/
We study MuZero (deepmind.com/blog/article/m…), a state-of-the-art MBRL algorithm with connections to a number of other model-based methods, including MPC, policy iteration, and Dyna. 4/
In the paper, we perform a large number of variations and ablations across eight environments, including continuous control, action games, and strategic games. 5/
Our results show that (1) planning at test time adds little (in most envs), (2) simple & shallow planning is sufficient for learning (again, in most envs), and (3) planning may not aid generalization as much as you might expect, even with a perfect model! 6/
From these results, we draw two main conclusions. First, different environments (i.e. beyond MuJoCo, Atari & Sokoban) are sorely needed to evaluate advances in MBRL, particularly for "reasoning". 7/
Second, value functions, policies, and transition models _all_ need to generalize well if we want to leverage MBRL for generalization. But therein lies a catch-22: if these all generalize, do we even need to do planning at all? 8/
My view is still "yes", but is more nuanced than before! And I (even more strongly now) believe that we need to find better inductive biases for value functions & policies, as well as world models. 9/
This work was a surprising, but overall satisfying, journey to better understand some characteristics of model-based deep RL. For more details, I encourage you to check out the paper! arxiv.org/abs/2011.04021 10/10

Thread by Jess Hamrick.
