ChatGPT for Robotics? @DeepMind's latest work: a general AI agent that can perform any task from human instructions!
Or at least those allowed in "the Playroom"
The cherry on top of this agent is its RL fine-tuning from human feedback (RLHF), just like ChatGPT. 1/n
The base layer of the agent is trained with imitation learning and conditioned on language instructions
Initially, the agent had mediocre abilities
However, once it was fine-tuned with Reinforcement Learning and allowed to act on its own, its abilities went 🆙 significantly
2/n
The authors structured the RL problem by training a Reward Model on human feedback, and then using this reward model to optimize the agent with online RL
The reward model, called the Inter-temporal Bradley-Terry (IBT) model, is trained to predict human preferences over sub-trajectories
3/n
A sub-trajectory is preferred over another from the same episode if it represents an improvement toward the goal. Not preferred if it's a regression.
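Conceptually, that preference objective is a Bradley-Terry-style cross-entropy over the reward model's scores. A minimal PyTorch sketch (function names, shapes, and the labeling convention are my assumptions, not the paper's exact inter-temporal formulation):

```python
import torch.nn.functional as F

def preference_loss(reward_model, sub_traj_a, sub_traj_b, a_preferred):
    """Bradley-Terry-style preference loss over two sub-trajectories.

    reward_model : callable mapping a batch of sub-trajectories to a scalar
                   score per example, shape (batch,).
    a_preferred  : float tensor (batch,), 1.0 where sub_traj_a is the one
                   judged an improvement toward the goal, else 0.0.
    """
    score_a = reward_model(sub_traj_a)
    score_b = reward_model(sub_traj_b)
    # P(a preferred over b) = sigmoid(score_a - score_b); train with cross-entropy.
    return F.binary_cross_entropy_with_logits(score_a - score_b, a_preferred)
```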
Does it work? Check out this example 📊
It appears to be effective
4/n
Btw, they also augmented the loss of the IBT model with BC and contrastive SSL losses.
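Conceptually that augmentation is just a weighted sum of the three objectives; the coefficients below are placeholders, not the paper's values:

```python
# Hypothetical weights; the paper's actual coefficients aren't reproduced here.
LAMBDA_BC, LAMBDA_SSL = 0.1, 0.1

def ibt_total_loss(pref_loss, bc_loss, contrastive_ssl_loss):
    # Preference (reward-model) objective augmented with behavioural-cloning
    # and contrastive self-supervised auxiliary terms.
    return pref_loss + LAMBDA_BC * bc_loss + LAMBDA_SSL * contrastive_ssl_loss
```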
The BC+RL agent was trained using a "setter-replay" methodology. The environment was recreated based on some initial configs and the agent was left to interact freely & learn.
5/n
Guess what? BC+RL performed much better than everything else
They evaluated the agent in multiple ways: offline and online, both automatically and manually
In every setting the BC+RL agent comes out on top 6/n
Bonus point 1:
- BC + RL benefits from model scaling - Nice!
Bonus point 2:
- The agent can also be improved iteratively.
And it gets a lot better! 7/n
This is the story of an embodied multi-modal agent crafted over 4 papers and told in 4 posts
The agent is able to perceive its surroundings, manipulate objects, and react to human instructions in a 3D world
Work done by the Interactive Team at @DeepMind between 2019 and 2022
🧵
Imitating Interactive Intelligence arxiv.org/abs/2012.05672
The case for training the agent using Imitation Learning is outlined
The environment "The Playroom" is generated
The general multi-modal architecture is crafted
In the end, a GAIL-like auxiliary loss proves crucial 1/n
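For reference, a vanilla GAIL discriminator objective looks roughly like the sketch below (a generic illustration, not the paper's exact auxiliary loss; names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def gail_discriminator_loss(discriminator, expert_batch, agent_batch):
    """Standard GAIL discriminator objective.

    discriminator: maps (observation, action) features to one logit per example.
    Expert samples are labelled 1, agent samples 0; the discriminator's output
    then serves as an auxiliary imitation signal for the policy.
    """
    expert_logits = discriminator(expert_batch)
    agent_logits = discriminator(agent_batch)
    loss_expert = F.binary_cross_entropy_with_logits(
        expert_logits, torch.ones_like(expert_logits))
    loss_agent = F.binary_cross_entropy_with_logits(
        agent_logits, torch.zeros_like(agent_logits))
    return loss_expert + loss_agent
```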
Interactive Agents with IL & SSL arxiv.org/abs/2112.03763
In the end it's all about scale and simplicity
The agent was hungry for data, so it was fed more
A simpler contrastive cross-modal loss replaced GAIL (sketch after this tweet)
A hierarchical 8-step action scheme was introduced
New agent code name: MIA 2/n
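The contrastive cross-modal loss is in the InfoNCE family: match each observation embedding to its paired instruction embedding against in-batch negatives. A generic sketch (not MIA's exact formulation; names and the temperature value are assumptions):

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(vision_emb, language_emb, temperature=0.1):
    """InfoNCE-style loss between paired vision and language embeddings.

    vision_emb, language_emb: (batch, dim) embeddings of matching
    observation/instruction pairs; matched pairs are positives, all other
    pairings in the batch are negatives.
    """
    v = F.normalize(vision_emb, dim=-1)
    l = F.normalize(language_emb, dim=-1)
    logits = v @ l.t() / temperature                      # (batch, batch) similarities
    targets = torch.arange(v.shape[0], device=v.device)   # diagonal = positives
    # Symmetric cross-entropy: vision -> language and language -> vision.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```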
RT-1 is a 2-year effort to bring the power of open-ended, task-agnostic training with a high-capacity architecture to the robotics world.
The magic sauce? A big and diverse robotic dataset + an efficient Transformer-based architecture
🧵
RT-1 learns to make decisions to complete a task via imitation, from a dataset of ~130k episodes covering about 700 general tasks, collected over the course of 17 months.
The architecture of RT-1 is made of (sketch below):
- A vision-language, CNN-based encoder that encodes the task instruction and image into 81 tokens
- A TokenLearner that attends over the 81 tokens and compresses them to 8
- A decoder-only Transformer that predicts the next action
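Roughly, the data flow looks like this (illustrative shapes only; the sub-module names are stand-ins passed in by the caller, not the released RT-1 code):

```python
import torch.nn as nn

class RT1Sketch(nn.Module):
    """Data-flow sketch of the RT-1 pipeline described above."""

    def __init__(self, vision_language_encoder, token_learner, action_decoder):
        super().__init__()
        self.encoder = vision_language_encoder  # CNN: image + instruction -> 81 tokens
        self.token_learner = token_learner      # attends over 81 tokens, keeps 8
        self.decoder = action_decoder           # decoder-only Transformer over the kept tokens

    def forward(self, images, instruction_embedding):
        tokens = self.encoder(images, instruction_embedding)  # (batch, 81, d_model)
        compressed = self.token_learner(tokens)                # (batch, 8, d_model)
        return self.decoder(compressed)                        # logits over discretized action tokens
```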