Sergey Levine
Associate Professor at UC Berkeley

Nov 19, 2021

We've updated Trajectory Transformer (Transformers + model-based RL) for the NeurIPS camera-ready, now with more complete results, including Ant Maze experiments and value functions, plus a blog post summarizing the method!
arxiv.org/abs/2106.02039
bair.berkeley.edu/blog/2021/11/1…

A thread:

Trajectory Transformer is a "one big dumb model" approach to model-based RL: every single dimension of every state and action is a (discrete) token in a huge sequence. The model doesn't distinguish between states, actions, and rewards -- they're all just tokens.
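Here's a minimal sketch of that tokenization idea, assuming uniform per-dimension discretization into a fixed number of bins (the function name and arguments are illustrative, not the repo's actual API):

```python
import numpy as np

def tokenize_trajectory(states, actions, rewards, low, high, n_bins=100):
    """Flatten a trajectory into one discrete token sequence.

    Each scalar dimension (state dims, action dims, reward) is
    independently discretized into `n_bins` uniform bins, so the
    model just sees one long token stream with no type distinction.
    Sketch only; `low`/`high` are assumed per-dimension bounds.
    """
    tokens = []
    for s, a, r in zip(states, actions, rewards):
        step = np.concatenate([s, a, [r]])  # [s_1..s_d, a_1..a_m, r]
        bins = ((step - low) / (high - low) * (n_bins - 1)).astype(int)
        tokens.extend(np.clip(bins, 0, n_bins - 1))
    return np.array(tokens)
```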

Although the Transformer is "monolithic", it *discovers* things like near-Markovian attention patterns and a kind of "action smoothing", where sequential actions are correlated with each other. So the Transformer learns about the structure of RL problems, to a degree.

It also produces successful *very* long-horizon rollouts, far longer than standard single-step models p(s'|s,a) rolled out autoregressively. So something about a big "dumb" model works very well for modeling complex dynamics, suggesting it might work very well for model-based RL.
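A rollout here is just next-token sampling, one dimension at a time. A minimal sketch, assuming a GPT-style `model` that maps a token sequence to next-token logits (all names are illustrative):

```python
import torch

@torch.no_grad()
def rollout(model, prefix_tokens, horizon, tokens_per_step):
    """Autoregressively sample a long model-based rollout.

    `prefix_tokens` is a 1-D long tensor of context tokens;
    `tokens_per_step` is the number of tokens in one (s, a, r) step.
    """
    seq = prefix_tokens.clone()
    for _ in range(horizon * tokens_per_step):
        logits = model(seq.unsqueeze(0))[0, -1]  # logits for the next token
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, 1)   # sample one token
        seq = torch.cat([seq, next_tok])
    return seq
```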

For control, we can simply run beam search, using reward instead of likelihood as the score. Of course, we could use other planners too. On the (comparatively easy) D4RL locomotion tasks, Trajectory Transformer is on par with the best prior method (CQL).
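A sketch of that planner, swapping log-likelihood for cumulative reward as the beam score; `decode_reward` is an assumed callable that maps a step's tokens back to a scalar reward, and the loop structure is illustrative rather than the paper's exact procedure:

```python
import torch

@torch.no_grad()
def reward_beam_search(model, prefix, horizon, tokens_per_step,
                       decode_reward, beam_width=32, n_expand=2):
    """Plan with beam search scored by return instead of likelihood."""
    beams = [(prefix, 0.0)]  # (token sequence, return so far)
    for _ in range(horizon):
        candidates = []
        for seq, ret in beams:
            for _ in range(n_expand):              # sample several continuations
                new_seq = seq.clone()
                for _ in range(tokens_per_step):   # one (s, a, r) step of tokens
                    logits = model(new_seq.unsqueeze(0))[0, -1]
                    tok = torch.multinomial(torch.softmax(logits, -1), 1)
                    new_seq = torch.cat([new_seq, tok])
                r = decode_reward(new_seq[-tokens_per_step:])
                candidates.append((new_seq, ret + r))
        # keep only the highest-return partial plans
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]  # best plan's token sequence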

But if we *combine* Trajectory Transformer with a good Q-function (e.g., from IQL), we can solve the much more challenging Ant Maze tasks with state-of-the-art results, much better than all prior methods. Ant Maze is much harder, because it requires temporal compositionality.
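Concretely, the only change needed is the beam-search score: instead of summing decoded rewards alone, add a learned Q-value at the frontier of each candidate plan, so the Q-function stands in for all rewards beyond the planning horizon. A sketch, with `q_fn` as an assumed callable (the real pipeline decodes the last state and action out of the token sequence first):

```python
def q_guided_score(partial_return, state, action, q_fn):
    """Score a partial plan: rewards accumulated so far plus a learned
    Q-value at the plan's frontier (e.g. a Q-function trained with IQL).
    The terminal Q estimate is what gives the planner the long-horizon
    "stitching" ability that Ant Maze demands.
    """
    return partial_return + q_fn(state, action)
```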

This is significant because previously only dynamic programming methods performed well on Ant Maze (e.g., Decision Transformer is on par with simple behavioral cloning) -- to our knowledge, Trajectory Transformer + IQL is the first model-based approach that improves over pure DP on these tasks.

This is joint work with Michael Janner & Qiyang Li, accepted for a spotlight presentation at NeurIPS 2021:
trajectory-transformer.github.io
arxiv.org/abs/2106.02039
Code: github.com/JannerM/trajec…

Also, if you want to read the paper that we "borrowed" the Q-function from for Ant Maze, it's here: arxiv.org/abs/2110.06169

@ikostrikov makes some really nice Q-functions😉
