Andrea Lonza Profile picture
Dec 29 9 tweets 6 min read
ChatGPT for Robotics?
@Deepmind latest work: A general AI agent that can perform any task from human instructions!

Or at least those allowed in "the playhouse"

The cherry on top of this agent is its RL fine-tuning from human feedback, or RLHF. As in ChatGPT
1/n
The base layer of the agent is trained with imitation learning and conditioned on language instructions

Initially, the agent had mediocre abilities

However, when it was fine-tuned with Reinforcement Learning and allowed to act independently, its abilities 🆙 significantly

2/n
The authors structured the RL problem by training a Reward Model on human feedback, and then using this RW model to optimize the agent with online RL

The RW model, also called Inter-temporal Bradley-Terry (IBT), is trained to predict the preferences of sub-trajectories

3/n
A sub-trajectory is preferred over another of the same episode if it represents a improvement toward the goal. Not preferred if it's a regression.

Does it work? Check out this example 📊
It appears to be effective

4/n
Btw, they also augmented the loss of the IBT model with BC and contrastive SSL losses.

The BC+RL agent was trained using a "setter-replay" methodology. The environment was recreated based on some initial configs and the agent was left to interact freely & learn.

5/n
Guess what? BC+RL performed much better than everything else

They evaluated the agent on multiple ways: offline and online, both automatically and manually
In every context the BC+RW model is the best
6/n
Bonus point 1:
- BC + RL benefit from model scaling - Nice!

Bonus point 2:
- The agent can also be improved iteratively.
And it gets a lot better!
7/n

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Andrea Lonza

Andrea Lonza Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @lonzaandrea

Dec 30
This is the story of an embodied multi-modal agent crafted over 4 papers and told in 4 posts

The embodied agent is able to perceive, manipulate the world, and react to human instructions in a 3D world
Work done by the Interactive Team at @DeepMind between 2019 and 2022
🧵 Image
Imitating Interactive Intelligence arxiv.org/abs/2012.05672
The case for training the agent using Imitation Learning is outlined
The environment "The Playroom" is generated
The general multi-modal architecture is crafted
At the end, an auxiliary simil-GAIL loss is crucial
1/n Image
Interactive Agents with IL & SSL
arxiv.org/abs/2112.03763
In the end it's all about scale and simplicity
The agent was hungry for data, so it was fed more
A simpler contrastive cross-modal loss replaced GAIL
A hierarchical 8-step action was introduced
New agent code name: MIA
2/n Image
Read 6 tweets
Dec 18
The GPT of Robotics? RT-1

RT-1 is a 2y effort to bring the power of open-ended task-agnostic training with a high-capacity architecture to the Robotic world.

The magic sauce? A big and diverse robotic dataset + an efficient Transformer-based architecture
🧵
RT-1 learn to take decisions in order to complete a task via imitation from a dataset of 130k episodes, about 700 general tasks, acquired over the course of 17mo.
The architecture of RT-1 is made of:
- A Vision-Language CNN-based architecture that encode the task instruction and image into 81 tokens
- A TokenLearner that attends over the 81 tokens and compress them to 8
- A Decoder-only Transformer that predicts the next action
Read 7 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us on Twitter!

:(