Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Andrea Lonza

@lonzaandrea

Dec 30, 2022 • 6 tweets • 6 min read • Read on X

Scrolly

@deepmind

This is the story of an embodied multi-modal agent crafted over 4 papers and told in 4 posts

The embodied agent is able to perceive, manipulate the world, and react to human instructions in a 3D world
Work done by the Interactive Team at @deepmind between 2019 and 2022
🧵

Imitating Interactive Intelligence arxiv.org/abs/2012.05672
The case for training the agent using Imitation Learning is outlined
The environment "The Playroom" is generated
The general multi-modal architecture is crafted
At the end, an auxiliary simil-GAIL loss is crucial
1/n

Interactive Agents with IL & SSL
arxiv.org/abs/2112.03763
In the end it's all about scale and simplicity
The agent was hungry for data, so it was fed more
A simpler contrastive cross-modal loss replaced GAIL
A hierarchical 8-step action was introduced
New agent code name: MIA
2/n

Evaluating Interactive Agents
arxiv.org/abs/2205.13274
Evaluation becomes the bottleneck
Agents evaluated with a new approach called Standardized Test Suite. Still manual, but offline. Faster, more interpretable & controllable

MIA on steroids. 164M params and LLM
3/n

https://twitter.com/lonzaandrea/status/1608465889053274112?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1608465889053274112%7Ctwgr%5E7cb197e4df533bfb27bd4fac8db6ac3ad468dc65%7Ctwcon%5Es1_c10&ref_url=https%3A%2F%2Fpublish.twitter.com%2F%3Fquery%3Dhttps3A2F2Ftwitter.com2Flonzaandrea2Fstatus2F1608465889053274112widget%3DTweet

Interactive Agent with RLHF
arxiv.org/abs/2211.11602
RL is introduced

A new breed of agent is created: similar to MIA but with RLHF tuning and a learned RW model
As always.. The agent ingested more data
Also a new interactive evaluation is introduced
4/n

@deepmind

Question: With the new RLHF approach, did it converge to a more standard training methodology?

Great work by the Interactive Agents Team at @deepmind : @arahuja @fede_carne @petko87ig @_agoldin @countzerozzz @TheGeorgePowell @santoroAI and others

#deeplearning #RL #ML #AI
END/n

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @lonzaandrea

Andrea Lonza

@lonzaandrea

Dec 29, 2022

@Deepmind

ChatGPT for Robotics?
@Deepmind latest work: A general AI agent that can perform any task from human instructions!

Or at least those allowed in "the playhouse"

The cherry on top of this agent is its RL fine-tuning from human feedback, or RLHF. As in ChatGPT
1/n

The base layer of the agent is trained with imitation learning and conditioned on language instructions

Initially, the agent had mediocre abilities

However, when it was fine-tuned with Reinforcement Learning and allowed to act independently, its abilities 🆙 significantly

2/n

The authors structured the RL problem by training a Reward Model on human feedback, and then using this RW model to optimize the agent with online RL

The RW model, also called Inter-temporal Bradley-Terry (IBT), is trained to predict the preferences of sub-trajectories

3/n

Read 9 tweets

Andrea Lonza

@lonzaandrea

Dec 18, 2022

The GPT of Robotics? RT-1

RT-1 is a 2y effort to bring the power of open-ended task-agnostic training with a high-capacity architecture to the Robotic world.

The magic sauce? A big and diverse robotic dataset + an efficient Transformer-based architecture
🧵

RT-1 learn to take decisions in order to complete a task via imitation from a dataset of 130k episodes, about 700 general tasks, acquired over the course of 17mo.

The architecture of RT-1 is made of:
- A Vision-Language CNN-based architecture that encode the task instruction and image into 81 tokens
- A TokenLearner that attends over the 81 tokens and compress them to 8
- A Decoder-only Transformer that predicts the next action

Read 7 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Andrea Lonza

Try unrolling a thread yourself!

More from @lonzaandrea

Andrea Lonza

Andrea Lonza

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!