This is the story of an embodied multi-modal agent crafted over 4 papers and told in 4 posts

The agent can perceive and manipulate a 3D world and react to human instructions
Work done by the Interactive Agents Team at @deepmind between 2019 and 2022
🧵
Imitating Interactive Intelligence arxiv.org/abs/2012.05672
The case for training the agent with Imitation Learning is outlined
The environment, "The Playroom", is introduced
The general multi-modal architecture is crafted
In the end, an auxiliary GAIL-like loss proves crucial
1/n
Interactive Agents with IL & SSL
arxiv.org/abs/2112.03763
In the end it's all about scale and simplicity
The agent was hungry for data, so it was fed more
A simpler contrastive cross-modal loss replaced GAIL
A hierarchical 8-step action was introduced
New agent code name: MIA
2/n
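To make "contrastive cross-modal loss" concrete, here is a minimal sketch of one common form of such a loss (a symmetric InfoNCE objective over paired vision and language embeddings). This is an illustration swapped in for the purpose, not the paper's exact formulation: the function name, the `temperature` value, and the toy embeddings are all assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def cross_modal_contrastive_loss(vision, language, temperature=0.1):
    """Symmetric InfoNCE: vision[i] and language[i] are the positive pair,
    every other combination in the batch is a negative.
    (Hypothetical helper for illustration, not the paper's code.)"""
    v = [normalize(x) for x in vision]
    t = [normalize(x) for x in language]
    n = len(v)
    logits = [[dot(v[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    # Vision -> language direction: pick the right text for each frame.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Language -> vision direction: same logits, transposed.
    logits_t = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(logits_t[j])[j] for j in range(n)) / n
    return 0.5 * (v2t + t2v)


# Toy batch: matching pairs should score a lower loss than shuffled pairs.
vision = [[1.0, 0.0], [0.0, 1.0]]
aligned = cross_modal_contrastive_loss(vision, [[1.0, 0.0], [0.0, 1.0]])
shuffled = cross_modal_contrastive_loss(vision, [[0.0, 1.0], [1.0, 0.0]])
```

The appeal over an adversarial GAIL-style loss is that there is no separate discriminator game to balance: the loss only asks the two modalities to agree on which observations belong together.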
Evaluating Interactive Agents
arxiv.org/abs/2205.13274
Evaluation becomes the bottleneck
Agents are evaluated with a new approach called the Standardized Test Suite. Still manual, but offline: faster, more interpretable, and more controllable

MIA on steroids. 164M params and LLM
3/n
Question: With the new RLHF approach, did it converge to a more standard training methodology?

Great work by the Interactive Agents Team at @deepmind: @arahuja @fede_carne @petko87ig @_agoldin @countzerozzz @TheGeorgePowell @santoroAI and others

#deeplearning #RL #ML #AI
END/n

More from @lonzaandrea

Dec 29, 2022
ChatGPT for Robotics?
@Deepmind's latest work: a general AI agent that can perform any task from human instructions!

Or at least those allowed in "the playhouse"

The cherry on top of this agent is its RL fine-tuning from human feedback, or RLHF. As in ChatGPT
1/n
The base layer of the agent is trained with imitation learning and conditioned on language instructions

Initially, the agent had mediocre abilities

However, when it was fine-tuned with Reinforcement Learning and allowed to act independently, its abilities 🆙 significantly

2/n
The authors structured the RL problem by training a Reward Model on human feedback, and then using this RW model to optimize the agent with online RL

The RW model, also called Inter-temporal Bradley-Terry (IBT), is trained to predict the preferences of sub-trajectories

3/n
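For readers new to Bradley-Terry models, here is a minimal sketch of the standard pairwise preference loss that such reward models build on. It assumes the reward model has already produced a scalar score for each of two sub-trajectories; the inter-temporal specifics of IBT are not reproduced, and the function names are hypothetical.

```python
import math


def preference_probability(score_a, score_b):
    """Bradley-Terry: probability that sub-trajectory A is preferred over B,
    given scalar scores from a (hypothetical) reward model."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def bradley_terry_loss(score_preferred, score_rejected):
    """Negative log-likelihood of the human preference,
    i.e. -log(sigmoid(score_preferred - score_rejected)), computed stably."""
    diff = score_preferred - score_rejected
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# The loss is small when the model already ranks the preferred clip higher,
# and large when it ranks the pair the wrong way round.
agrees = bradley_terry_loss(2.0, 0.0)
disagrees = bradley_terry_loss(0.0, 2.0)
```

Minimizing this loss over many labelled pairs pushes the reward model to score preferred behaviour higher, and those scores can then serve as the reward signal for online RL.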
Dec 18, 2022
The GPT of Robotics? RT-1

RT-1 is a 2-year effort to bring the power of open-ended, task-agnostic training with a high-capacity architecture to the robotics world.

The magic sauce? A big and diverse robotic dataset + an efficient Transformer-based architecture
🧵
RT-1 learns to make decisions in order to complete a task via imitation from a dataset of 130k episodes, covering about 700 general tasks, acquired over the course of 17 months.
The architecture of RT-1 is made of:
- A Vision-Language CNN-based architecture that encodes the task instruction and image into 81 tokens
- A TokenLearner that attends over the 81 tokens and compresses them to 8
- A Decoder-only Transformer that predicts the next action
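The 81-to-8 compression step can be pictured as attention pooling. The sketch below is a simplified stand-in, not the actual TokenLearner module: it assumes each of the 8 output tokens has a query vector (random here, learned in practice), scores all 81 input tokens against it, and returns the softmax-weighted average. The embedding size `DIM = 16` and all names are assumptions for illustration.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compress_tokens(tokens, queries):
    """Pool len(tokens) input tokens into len(queries) output tokens.
    Each output is a softmax-weighted average of all input tokens."""
    dim = len(tokens[0])
    outputs = []
    for q in queries:
        # One attention score per input token for this output slot.
        scores = [sum(a * b for a, b in zip(q, tok)) for tok in tokens]
        weights = softmax(scores)
        pooled = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        outputs.append(pooled)
    return outputs


rng = random.Random(0)
DIM = 16  # assumed embedding size, chosen only for the example
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(81)]
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(8)]
compressed = compress_tokens(tokens, queries)  # 8 tokens of size DIM
```

Shrinking the sequence before the Transformer matters because self-attention cost grows quadratically with the number of tokens, so 8 tokens per image are far cheaper to process than 81.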