Tweet

More from @svlevine

Sergey Levine

@svlevine

19 Nov

We've updated Trajectory Transformer (Transformers + model-based RL) for NeurIPS camera-ready, now with more complete results that include Ant Maze, including value functions, and also a blog post summarizing the method!
arxiv.org/abs/2106.02039
bair.berkeley.edu/blog/2021/11/1…

A thread:

Trajectory transformer is a "one big dumb model" approach to model-based RL: every single dimension of every state and action is a (discrete) token in a huge sequence. The model doesn't distinguish between states vs actions vs rewards, they're all just tokens.

Although the Transformer is "monolithic", it *discovers* things like near-Markovian attention patterns (left) and a kind of "action smoothing" (right), where sequential actions are correlated to each other. So the Transformer learns about the structure in RL, to a degree.

Read 9 tweets

Sergey Levine

@svlevine

19 Oct

@chelseabfinn

To make an existing model more robust at test time: augment a single test image in many ways, finetune model so that predictions on augmented images "agree", minimizing marginal entropy. This is the idea behind MEMO (w/ Marvin Zhang & @chelseabfinn): arxiv.org/abs/2110.09506

🧵>

MEMO is a simple test-time adaptation method that takes any existing model (no change during training), and finetunes it on one image:
1. generate augmentations of test image
2. make predictions on all of them
3. minimize marginal entropy of these predictions (make them similar)

This can significantly improve a model's robustness to OOD inputs. Here are examples on ImageNet-C where MEMO fixes mistakes the model would have made without MEMO. This doesn't involve additional assumptions, the training is exactly the same, and it operates on one test image.

Read 4 tweets

Sergey Levine

@svlevine

23 Sep

Offline RL lets us run RL without active online interaction, but tuning hyperparameters, model capacity etc. still requires rollouts, or validation tasks. In our new paper, we propose guidelines for *fully offline* tuning for algorithms like CQL arxiv.org/abs/2109.10813

A thread:

In supervised learning, we have training and validation sets, and this works great for tuning. There is no equivalent in RL, making tuning hard. However, with CQL, when there is too much or too little model capacity, we do get very characteristic estimated vs true Q-value curves

Of course, the true return is unknown during offline training, but we can still use our understanding of the trends of estimated Q-values to provide guidelines for how to adjust model capacity. These guidelines are not guaranteed to work, but seem to work well in practice.

Read 6 tweets

Sergey Levine

@svlevine

25 Jul

@sidgreddy

An "RL" take on compression: "super-lossy" compression that changes the image, but preserves its downstream effect (i.e., the user should take the same action seeing the "compressed" image as when they saw original) sites.google.com/view/pragmatic…

w @sidgreddy & @ancadianadragan

🧵>

The idea is pretty simple: we use a GAN-style loss to classify whether the user would have taken the same downstream action upon seeing the compressed image or not. Action could mean button press when playing a video game, or a click/decision for a website.

The compression itself is done with a generative latent variable model (we use styleGAN, but VAEs would work great too, as well as flows). PICO basically decides to throw out those bits that it determines (via its GAN loss) won't change the user's downstream decision.

Read 6 tweets

Sergey Levine

@svlevine

23 Jul

RAIL will be presenting a number of exciting late breaking poster results at the RL4RealLife WS #ICML2021 (8 pm PT today!): sites.google.com/view/RL4RealLi…

Algorithms for real-world RL w/ mobile manipulators, lifelong meta-learning methods, principled multi-task data sharing.

A thread:

@ColinearDevin

We'll show how RL can control robots that learn to clean up a room, entirely in the real world. By Charles Sun, @ColinearDevin, @abhishekunique7, @jendk3r, @GlenBerseth.

@GlenBerseth

We'll present CoMPS, an algorithm for online continual meta-learning, where an agent meta-learns tasks one by one, with each task accelerating future tasks. By @GlenBerseth, WilliamZhang365, @chelseabfinn

Read 4 tweets

Sergey Levine

@svlevine

23 Jul

@aviral_kumar2

In RL, "implicit regularization" that helps deep learning find good solutions can actually lead to huge instability. See @aviral_kumar2 talk on DR3:
7/23 4pm PT RL for real: icml.cc/virtual/2021/w…
7/24 5:45pm PT Overparameterization WS talk icml.cc/virtual/2021/w…
#ICML2021

🧵>

You can watch the talk in advance here:
And then come discuss the work with Aviral at the poster sessions! This work is not released yet, but it will be out shortly.

We're quite excited about this result, and I'll try to explain why.

Deep networks are overparameterized, meaning there are many parameter vectors that fit the training set. So why does it not overfit? While there are many possibilities, they all revolve around some kind of "implicit regularization" that leads to solutions that generalize well.

Read 8 tweets

Share this page!

Sergey Levine

Try unrolling a thread yourself!

More from @svlevine

Sergey Levine

Sergey Levine

Sergey Levine

Sergey Levine

Sergey Levine

Sergey Levine

Did Thread Reader help you today?

Like this author's thread?