Andrea Lonza
Founder / DL engineer / Author --- Intrigued by intelligence, AGI, & innovation

Dec 29, 2022, 9 tweets

ChatGPT for Robotics?
@Deepmind's latest work: a general AI agent that can perform any task from human instructions!

Or at least those allowed in "the playhouse"

The cherry on top of this agent is its RL fine-tuning from human feedback (RLHF), as in ChatGPT
1/n

The base layer of the agent is trained with imitation learning and conditioned on language instructions

Initially, the agent had mediocre abilities

However, when it was fine-tuned with Reinforcement Learning and allowed to act independently, its abilities 🆙 significantly

2/n

The authors structured the RL problem by training a reward model on human feedback, and then using this reward model to optimize the agent with online RL

The reward model, called Inter-temporal Bradley-Terry (IBT), is trained to predict human preferences between sub-trajectories

3/n

A sub-trajectory is preferred over another from the same episode if it represents an improvement toward the goal, and not preferred if it's a regression.

Does it work? Check out this example 📊
It appears to be effective

4/n
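
For intuition, here is a minimal sketch of a plain Bradley-Terry preference loss, the building block the IBT name points to. It is not the paper's exact inter-temporal formulation: the function names and the idea of feeding in one scalar score per sub-trajectory are assumptions made for illustration.

```python
import numpy as np

def preference_probability(score_preferred, score_other):
    """Bradley-Terry probability that the first item is preferred: sigmoid(s1 - s2)."""
    diff = np.asarray(score_preferred, dtype=float) - np.asarray(score_other, dtype=float)
    return 1.0 / (1.0 + np.exp(-diff))

def bradley_terry_loss(score_preferred, score_other):
    """Negative log-likelihood of the human label under the Bradley-Terry model."""
    diff = np.asarray(score_preferred, dtype=float) - np.asarray(score_other, dtype=float)
    # -log(sigmoid(diff)) == log(1 + exp(-diff)); logaddexp keeps this numerically stable
    return np.logaddexp(0.0, -diff)

# Hypothetical scores a reward model might assign to two sub-trajectories
# of the same episode, where humans marked the first one as progress.
score_progress, score_regress = 1.5, -0.5
print(preference_probability(score_progress, score_regress))
print(bradley_terry_loss(score_progress, score_regress))
```

Minimizing this loss pushes the reward model to score the preferred sub-trajectory higher than the other one, which is what makes its output usable as a reward signal for RL.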

Btw, they also augmented the loss of the IBT model with behavioral cloning (BC) and contrastive self-supervised learning (SSL) losses.

The BC+RL agent was trained using a "setter-replay" methodology. The environment was recreated based on some initial configs and the agent was left to interact freely & learn.

5/n

Guess what? BC+RL performed much better than everything else

They evaluated the agent in multiple ways: offline and online, both automatically and manually
In every setting the BC+RL agent is the best
6/n

Bonus point 1:
- BC + RL benefits from model scaling - Nice!

Bonus point 2:
- The agent can also be improved iteratively.
And it gets a lot better!
7/n

I hope you find it useful! Bye

#deeplearning #RL #robotics #ML #AI
