Model-based planning is often thought to be necessary for deep reasoning & generalization. But the space of design choices in model-based deep RL is huge. Which choices work well, and which don't? In our new paper (accepted to #ICLR2021), we investigate! arxiv.org/abs/2011.04021 1/
Spoiler: our findings challenged some deeply held assumptions we had about what planning is useful for and how much of it is really needed in popular MBRL benchmarks---even "strategic" ones like Sokoban. 2/
We study MuZero (deepmind.com/blog/article/m…), a state-of-the-art MBRL algorithm with connections to a number of other model-based methods, including MPC, policy iteration, and Dyna. 4/
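To make the setup concrete, here is a minimal toy sketch of the loop MuZero shares with MPC, policy iteration, and Dyna: a learned model is unrolled by a search at every step, the search output drives behaviour, and (during training) it also serves as an improvement target. This is not DeepMind's code; the linear "networks" and all names are hypothetical stand-ins.

```python
# Toy sketch of a MuZero-style planning loop (hypothetical stand-ins only).
import numpy as np

rng = np.random.default_rng(0)
NUM_ACTIONS, STATE_DIM, GAMMA = 4, 8, 0.997

W_repr = rng.normal(size=(STATE_DIM, STATE_DIM))               # h: obs -> latent
W_dyn  = rng.normal(size=(NUM_ACTIONS, STATE_DIM, STATE_DIM))  # g: latent, a -> latent
w_rew  = rng.normal(size=(NUM_ACTIONS, STATE_DIM))             # reward head
w_val  = rng.normal(size=STATE_DIM)                            # value head

def represent(obs):
    return np.tanh(W_repr @ obs)

def dynamics(state, action):
    return np.tanh(W_dyn[action] @ state), float(w_rew[action] @ state)

def value(state):
    return float(w_val @ state)

def lookahead(state, depth):
    """Exhaustive depth-limited search in the learned model (a stand-in for
    MuZero's MCTS); returns the best bootstrapped return from `state`."""
    if depth <= 0:
        return value(state)
    return max(r + GAMMA * lookahead(s, depth - 1)
               for s, r in (dynamics(state, a) for a in range(NUM_ACTIONS)))

def act(obs, depth=1):
    """MPC-style control: replan from scratch at every environment step.
    During training, these root Q-values (or MCTS visit counts) would also
    serve as improvement targets for the policy/value nets (Dyna-style)."""
    root = represent(obs)
    q = np.zeros(NUM_ACTIONS)
    for a in range(NUM_ACTIONS):
        s, r = dynamics(root, a)
        q[a] = r + GAMMA * lookahead(s, depth - 1)
    return int(np.argmax(q))

print(act(rng.normal(size=STATE_DIM), depth=2))  # pick an action by planning
```

The real algorithm replaces the exhaustive lookahead with MCTS and trains the networks end-to-end; the ablations in the paper essentially vary this search's depth and budget, and whether it is used at all at evaluation time.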
In the paper, we run a large number of variations and ablations of MuZero across eight environments, spanning continuous control, action games, and strategic games. 5/
Our results show that (1) planning at test time adds little (in most envs), (2) simple & shallow planning is sufficient for learning (again, in most envs), and (3) planning may not aid generalization as much as you might expect---even with a perfect model! 6/
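As a hypothetical illustration of point (1): the comparison amounts to evaluating the same agent with and without the search, e.g. extending the toy sketch above with a policy head (again, illustrative names only, not the paper's code).

```python
# Continues the toy sketch above: acting from the policy prior alone,
# i.e. with no planning at test time.
w_pi = rng.normal(size=(NUM_ACTIONS, STATE_DIM))  # toy policy-prior head

def act_without_search(obs):
    """No test-time planning: act greedily from the policy network alone."""
    return int(np.argmax(w_pi @ represent(obs)))

obs = rng.normal(size=STATE_DIM)
print(act(obs, depth=2), act_without_search(obs))  # compare the two controllers
```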
From these results, we draw two main conclusions. First, different environments (i.e., beyond MuJoCo, Atari & Sokoban) are sorely needed to evaluate advances in MBRL, particularly for "reasoning". 7/
Second, value functions, policies, and transition models _all_ need to generalize well if we want to leverage MBRL for generalization. But therein lies a catch-22: if these all generalize, do we even need planning at all? 8/
My view is still "yes", but it's more nuanced than before! And I (even more strongly now) believe that we need to find better inductive biases for value functions & policies, as well as for world models. 9/
This work was a surprising, but overall satisfying, journey to better understand some characteristics of model-based deep RL. For more details, I encourage you to check out the paper! arxiv.org/abs/2011.04021 10/10