This is the story of an embodied multi-modal agent crafted over 4 papers and told in 4 posts
The agent can perceive its surroundings, manipulate objects, and respond to human instructions in a 3D world
Work done by the Interactive Agents Team at @deepmind between 2019 and 2022
🧵
Imitating Interactive Intelligence arxiv.org/abs/2012.05672
The case for training the agent with imitation learning is laid out
The "Playroom" environment is built
The general multi-modal agent architecture is designed
In the end, a GAIL-like auxiliary loss proves crucial (sketch below)
1/n
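Side note: GAIL-style losses train a discriminator to tell human demos from agent behaviour, then reward the agent for fooling it. A minimal PyTorch sketch of that idea (illustrative only; the logit names are assumptions, not the paper's exact auxiliary loss):

```python
import torch
import torch.nn.functional as F

def gail_discriminator_loss(d_expert_logits, d_agent_logits):
    # Discriminator learns to label expert (human demo) pairs as 1
    # and agent-generated pairs as 0.
    expert_loss = F.binary_cross_entropy_with_logits(
        d_expert_logits, torch.ones_like(d_expert_logits))
    agent_loss = F.binary_cross_entropy_with_logits(
        d_agent_logits, torch.zeros_like(d_agent_logits))
    return expert_loss + agent_loss

def gail_auxiliary_reward(d_agent_logits):
    # The agent is rewarded for looking human to the discriminator:
    # -log(1 - D(s, a)), written stably via logsigmoid.
    return -F.logsigmoid(-d_agent_logits)
```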
Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning
arxiv.org/abs/2112.03763
In the end it's all about scale and simplicity
The agent was hungry for data, so it was fed more
A simpler contrastive cross-modal loss replaced GAIL (sketch after this tweet)
Hierarchical control was introduced: the policy commits to actions over 8-step windows
New agent code name: MIA
2/n
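The cross-modal contrastive objective is in the InfoNCE/CLIP family: matched vision-language pairs should out-score mismatched ones. A hedged sketch (the batch-level negatives and the temperature are illustrative assumptions, not MIA's exact loss):

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(vision_emb, text_emb, temperature=0.1):
    # Matched (vision, text) pairs in the batch are positives; every
    # other pairing is a negative (InfoNCE, CLIP-style).
    v = F.normalize(vision_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature          # [B, B] similarity matrix
    targets = torch.arange(v.size(0))       # positives on the diagonal
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2
```

And a toy sketch of the hierarchical control idea, assuming a high-level decision held fixed while low-level actions run for 8 steps (my reading of the scheme, not the paper's exact mechanism):

```python
def hierarchical_rollout(env_step, high_level, low_level, obs, horizon=8):
    # Hypothetical two-level control: a high-level decision is held
    # fixed while the low-level policy acts for `horizon` (8) steps.
    goal = high_level(obs)
    for _ in range(horizon):
        action = low_level(obs, goal)
        obs = env_step(action)
    return obs
```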
Evaluating Multimodal Interactive Agents
arxiv.org/abs/2205.13274
Evaluation becomes the bottleneck
Agents are evaluated with a new approach, the Standardized Test Suite (STS): still manually scored, but offline. Faster, more interpretable, and more controllable (sketch below)
MIA on steroids: 164M params plus an LLM
3/n
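Roughly, an STS episode replays a recorded human interaction up to a decision point, hands control to the agent, and stores the continuation for offline human scoring. A hypothetical sketch (the env/agent API here is invented for illustration):

```python
def run_sts_episode(env, agent, scenario, max_steps=600):
    # Reset the environment into a recorded scenario context, then
    # hand control to the agent (env/agent API is assumed).
    obs = env.reset_to_scenario(scenario)
    continuation = []
    for _ in range(max_steps):
        action = agent.act(obs)
        obs = env.step(action)
        continuation.append((obs, action))
    # The continuation is stored and later scored offline by human
    # raters against the scenario's success criteria.
    return continuation
```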
Improving Multimodal Interactive Agents with RLHF
arxiv.org/abs/2211.11602
RL is introduced
A new breed of agent is created: similar to MIA, but fine-tuned with RLHF using a learned reward model (sketch below)
As always, the agent ingested more data
A new interactive evaluation is also introduced
4/n
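The reward-model half of RLHF usually reduces to a Bradley-Terry preference loss: score human-preferred behaviour above rejected behaviour. A minimal sketch (the generic recipe, not necessarily the paper's exact formulation):

```python
import torch.nn.functional as F

def preference_loss(r_preferred, r_rejected):
    # Bradley-Terry objective: push the learned reward to rank the
    # human-preferred segment above the rejected one.
    return -F.logsigmoid(r_preferred - r_rejected).mean()
```

The agent is then fine-tuned with RL against this learned reward.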
Question: with the new RLHF approach, did the project converge toward a more standard training methodology?
Great work by the Interactive Agents Team at @deepmind: @arahuja @fede_carne @petko87ig @_agoldin @countzerozzz @TheGeorgePowell @santoroAI and others
#deeplearning #RL #ML #AI
END/n