Decision Transformer is just a GPT model conditioned on desired returns. Returns-to-go, states, and actions are fed into the model like tokens in a sentence (the trajectory).
At evaluation time, specify the desired episode return and autoregressively sample actions to get your policy.
2/8
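A minimal sketch of the token layout and the return-conditioned rollout described above. The names here (embed_*, dt_model.predict_action, the Gym-style env API) are illustrative placeholders, not the released implementation:

```python
import torch

# A trajectory is interleaved as (return-to-go, state, action) triples, so a
# context of K timesteps becomes 3*K tokens fed to a causal transformer.
def make_tokens(returns_to_go, states, actions,
                embed_rtg, embed_state, embed_action):
    # returns_to_go: (K, 1), states: (K, state_dim), actions: (K, act_dim)
    r = embed_rtg(returns_to_go)   # (K, d_model)
    s = embed_state(states)        # (K, d_model)
    a = embed_action(actions)      # (K, d_model)
    # Interleave to r_0, s_0, a_0, r_1, s_1, a_1, ...
    return torch.stack([r, s, a], dim=1).reshape(-1, r.shape[-1])  # (3K, d_model)


# Evaluation: fix a desired episode return, then autoregressively sample
# actions, decrementing the return-to-go by each reward actually received.
def rollout(env, dt_model, target_return, max_steps=1000):
    states, actions, rtgs = [env.reset()], [], [float(target_return)]
    for _ in range(max_steps):
        action = dt_model.predict_action(rtgs, states, actions)  # hypothetical helper
        state, reward, done, _ = env.step(action)                # classic Gym-style API
        actions.append(action)
        states.append(state)
        rtgs.append(rtgs[-1] - reward)  # condition on the return still to be achieved
        if done:
            break
    return states, actions
```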
For simplicity, we consider the offline RL setting (although we aren't limited to it).
In offline RL, we train on a fixed dataset of previously collected experience, mirroring the language-modeling setup and enabling data-driven behavior learning. But this isn't just imitation learning...
3/8
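A rough sketch of what one supervised training step looks like in this setup; dt_model, the batch format, and the choice of loss are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def train_step(dt_model, optimizer, batch):
    """One supervised update on a fixed offline dataset: predict each action
    from the preceding returns-to-go, states, and actions (via causal masking
    inside the model), just like next-token prediction in language modeling."""
    rtgs, states, actions = batch                    # sampled trajectory segments
    pred_actions = dt_model(rtgs, states, actions)   # causal transformer forward pass
    # Continuous actions -> MSE; for discrete actions use cross-entropy instead.
    loss = F.mse_loss(pred_actions, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```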
Like Q-learning algorithms, Decision Transformer can "stitch" together subsequences from distinct training examples - just with a sequence modeling objective!
When trained only on random walks over a graph, Decision Transformer learns to generate an optimal shortest path:
4/8
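One way to see how conditioning helps here: every timestep of a random, suboptimal trajectory is labeled with the return actually achieved from that point onward, so at test time the model can be asked for returns that only the best subsequences attained. A small illustrative helper (the reward values in the example are made up, not the paper's graph task):

```python
def returns_to_go(rewards, gamma=1.0):
    """Label each timestep of a (possibly random) trajectory with the return
    achieved from that point onward."""
    rtgs, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        rtgs.append(running)
    return rtgs[::-1]

# e.g. a random walk that wanders before reaching a goal
print(returns_to_go([-1, -1, -1, 0]))  # [-3.0, -2.0, -1.0, 0.0]
```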
On commonly studied offline RL benchmarks, we find this simple idea of sequence modeling with a scalable transformer model performs on par with (or better than) SoTA model-free offline RL algorithms!
5/8
Unlike traditional RL methods that learn narrow policies, Decision Transformer is naturally a multi-task model.
By conditioning on different target returns, we can output many different policies - in some cases, even extrapolating beyond the returns seen in the dataset:
6/8
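For example, reusing the hypothetical rollout() sketch from earlier in the thread, the same trained model can be queried as many different policies just by changing the commanded return (the targets below are arbitrary, not benchmark numbers):

```python
# Same weights, different behaviors: sweep the conditioning return.
for target in [300.0, 1800.0, 3600.0]:   # illustrative return targets
    states, actions = rollout(env, dt_model, target_return=target)
```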
Casting RL as sequence modeling with a transformer trained via supervised learning would allow us to leverage the scalability & infra behind successful models such as BERT, GPT-3, and DALL-E. We hope this work encourages more steps in this direction.
7/8