Is RL always data inefficient? Not necessarily. Framework for Efficient Robotic Manipulation (FERM) - shows real robots can learn basic skills from pixels with sparse reward in *30 minutes* using 1 GPU 🦾

paper: bit.ly/2M3CFPG
site / code: bit.ly/390Sz6g

1/N
Real-robot RL is challenging for a number of reasons, and data efficiency is chief among them. Common workarounds are training in simulation and transferring the learned policy to the real robot (Sim2Real) or parallelizing training with robot farms (QT-Opt).

2/N
But what makes RL data inefficient in the first place? One hypothesis: (i) representation learning and (ii) exploration. In principle, if we solve both problems, RL should be able to learn quickly.

3/N
In past work (CURL / RAD) we showed that with data augmentation (used explicitly in a contrastive objective, or implicitly by augmenting inputs) RL agents learn as efficiently from pixels as they do from state, making contrastive + data aug representations a natural choice for real robot learning.
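The workhorse augmentation in RAD is random crop. Here's a minimal numpy sketch of the idea (the function name and sizes are illustrative, not taken from the paper's code): each image in a batch gets an independently sampled crop window.

```python
import numpy as np

def random_crop(imgs, out_size=84):
    """RAD-style random crop (illustrative sketch).

    imgs: batch of images shaped (N, C, H, W).
    Each image gets its own randomly placed out_size x out_size window.
    """
    n, c, h, w = imgs.shape
    assert h >= out_size and w >= out_size
    tops = np.random.randint(0, h - out_size + 1, size=n)
    lefts = np.random.randint(0, w - out_size + 1, size=n)
    out = np.empty((n, c, out_size, out_size), dtype=imgs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i] = imgs[i, :, t:t + out_size, l:l + out_size]
    return out
```

Applied to every sampled batch, this single transform is enough to close most of the pixels-vs-state efficiency gap in sim.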

4/N
For exploration we use a handful (10) of human demos. But doesn't that defeat the whole point? Not really: the alternative to RL would be imitation learning, but imitation learning requires 100s of demos to work reliably, which takes hours or days of tedious work.

5/N
The benefit of RL is that, given only a handful of demonstrations (minutes of human effort), the agent can continue exploring on its own. We combine representation learning methods that have been effective in sim with demos into a framework for efficient real-robot learning.

6/N
The framework has 3 steps:

1. Initialize the replay buffer with 10 demos (10 mins)
2. Initialize the CNN encoder with contrastive pre-training (1 min)
3. Train RL agent with data aug (30 mins)

In total: ~10 mins of demo collection plus ~30 mins of training.
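The 3 steps above can be sketched as a single loop. This is a toy Python sketch of the structure only (the classes and method names are stand-ins, not the authors' actual API; the real system stores image transitions, runs a CURL-style contrastive loss, and updates a SAC-style agent):

```python
import random

class ReplayBuffer:
    """Toy buffer; the real FERM buffer stores image transitions."""
    def __init__(self):
        self.data = []
    def add(self, transition):
        self.data.append(transition)
    def sample(self, k=4):
        return random.sample(self.data, min(k, len(self.data)))

def ferm_pipeline(demos, pretrain_steps=3, rl_steps=5):
    """Structural sketch of FERM's 3 steps (illustrative only)."""
    buffer = ReplayBuffer()
    # Step 1: seed the replay buffer with ~10 human demonstrations.
    for t in demos:
        buffer.add(t)
    # Step 2: contrastive pre-training of the CNN encoder on buffer
    # frames would run here (CURL-style); shown as a placeholder loop.
    for _ in range(pretrain_steps):
        _ = buffer.sample()
    # Step 3: off-policy RL with data-augmented observations; online
    # transitions keep flowing into the same buffer as the agent acts.
    for step in range(rl_steps):
        buffer.add(("online", step))  # stand-in for an env transition
        _ = buffer.sample()           # stand-in for an agent update
    return len(buffer.data)
```

The key design point is that demos and online experience share one replay buffer, so the agent bootstraps from human data and then improves on its own.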

7/N
Our experimental setup is straightforward and easy for other labs to reproduce. We have 1 robot, 1 GPU, 2 cameras, and a sparse reward for each task. That's it - no MoCap, no state estimation, no dense reward specification.

8/N
FERM is able to learn 6 standard robotic tasks in 15-50 mins (30 mins avg) - reach, pick, move, pull, light switch, and open door. Practically, this means you can spin up a task in one evening of work (including constructing a sparse reward function).

9/N
Not too surprisingly, RL significantly outperforms imitation learning in this low-demonstration regime. Behavior cloning is unable to solve even the simplest tasks in this setting.

10/N
Finally, the learned RL policies are quite robust. Here are some videos showing the RL agent's policy in the presence of adversarial perturbations and unseen objects.

11/N
We hope that FERM enables other research groups (without access to insane resources) to quickly iterate on their ideas by training directly on their robots.

12/N
Amazing work and persistence by co-lead authors Albert Zhan and @RuihanZhao, who pulled some heroic feats in order to produce real-robot experiments during the pandemic. Thank you to @LerrelPinto and @pabbeel for the great collaboration!

13/N
/END