Can RL algorithms be replaced with transformer-based language models? We’ve looked at this question with our work on Decision Transformer:

Website: sites.google.com/corp/berkeley.…
Code: github.com/kzl/decision-t…

1/8
Decision Transformer is just a GPT model conditioned on desired returns. Returns, states and actions are fed into the model like tokens in a sentence (trajectory).

At evaluation time, specify the desired episode return and sequentially sample actions to get your policy.
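A rough sketch of that token layout, assuming the return fed in at each step is the return-to-go (the sum of rewards from that step to the end of the episode). The helper names `returns_to_go` and `interleave` are made up for illustration and are not taken from the released code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Sum of rewards from each timestep to the end of the episode."""
    rtg: List[float] = []
    running = 0.0
    # Walk backwards so each entry accumulates all rewards that follow it.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Flatten a trajectory into tokens ordered as R_1, s_1, a_1, R_2, s_2, a_2, ..."""
    tokens: List[Tuple[str, Any]] = []
    for g, s, a in zip(rtg, states, actions):
        tokens.extend([("return", g), ("state", s), ("action", a)])
    return tokens


# Tiny hand-made trajectory (illustrative values only).
rewards = [1.0, 0.0, 2.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]
rtg = returns_to_go(rewards)
tokens = interleave(rtg, states, actions)
```

In a real model each token would be embedded (with a timestep encoding) before going into the transformer; here the tokens are left as plain tuples to keep the ordering visible.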

2/8
For simplicity, we consider the offline RL setting (although we aren't limited to this).

In offline RL, we train on a fixed dataset of collected experience, mimicking the language modeling setup and enabling data-driven behavior learning. But this isn't just imitation learning...

3/8
Like Q-learning algorithms, Decision Transformer can "stitch" together subsequences from distinct training examples - just with a sequence modeling objective!

When trained only on random walks over a graph, Decision Transformer learns to generate an optimal shortest path.

4/8
On commonly studied offline RL benchmarks, we find this simple idea of sequence modeling with a scalable transformer model performs on par with (or better than) SoTA model-free offline RL algorithms!

5/8
Unlike traditional RL methods that learn narrow policies, Decision Transformer is naturally a multi-task model.

By conditioning on different target returns, we can output many different policies - in some cases, even extrapolating beyond the dataset.
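A minimal sketch of what return conditioning looks like at evaluation time, assuming the conditioning return is reduced by each reward received. The toy chain environment and `stand_in_policy` below are hypothetical placeholders standing in for a real task and a trained model; only the shape of the rollout loop is the point.

```python
from typing import Callable, Tuple

# Hypothetical stand-in for a trained model: maps (return-to-go, state) -> action.
Policy = Callable[[float, int], int]


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy 1-D chain: action 1 moves right for reward 1, action 0 stays for reward 0."""
    next_state = state + action
    reward = float(action)
    done = next_state >= 5  # episode ends at the far end of the chain
    return next_state, reward, done


def rollout(policy: Policy, target_return: float, max_steps: int = 10) -> float:
    """Run one episode, shrinking the conditioning return by each reward received."""
    state, rtg, total = 0, target_return, 0.0
    for _ in range(max_steps):
        action = policy(rtg, state)
        state, reward, done = toy_step(state, action)
        total += reward
        rtg -= reward  # remaining return we still ask the policy for
        if done:
            break
    return total


def stand_in_policy(rtg: float, state: int) -> int:
    """Hand-written placeholder: keep moving only while some return is still requested."""
    return 1 if rtg > 0 else 0


# Same policy function, three different target returns, three different behaviors.
achieved = [rollout(stand_in_policy, target) for target in (1.0, 3.0, 5.0)]
```

With a trained model the match between target and achieved return is only approximate; the hand-written placeholder hits it exactly because it was built to.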

6/8
Casting RL as a simple transformer trained with supervised learning would allow us to leverage the scalability & infra of successful models such as BERT, GPT-3, DALL-E for RL. We hope this work encourages more steps in this direction.

7/8
Really enjoyed working on this with fantastic colleagues @lchen915 @_kevinlu @aravindr93 @kimin_le2 @adityagrover_ @MishaLaskin @pabbeel @AravSrinivas

8/8
