RT-1 is a 2-year effort to bring the power of open-ended, task-agnostic training with a high-capacity architecture to the robotics world.
The magic sauce? A big and diverse robotic dataset + an efficient Transformer-based architecture
🧵
RT-1 learns to make decisions to complete a task via imitation, from a dataset of ~130k episodes covering about 700 general tasks, collected over the course of 17 months.
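
For the imitation part, here's a minimal behavior-cloning sketch in PyTorch, assuming the paper's setup of 11 action dimensions discretized into 256 bins each (the function and variable names are illustrative, not from the released code):

```python
import torch.nn.functional as F

NUM_BINS = 256      # per the paper, each action dimension is discretized into 256 bins
ACTION_DIMS = 11    # 7 arm + 3 base + 1 mode-switch dimensions

def behavior_cloning_loss(action_logits, demo_actions):
    """action_logits: (batch, ACTION_DIMS, NUM_BINS) predicted by the policy.
    demo_actions:  (batch, ACTION_DIMS) integer bin indices from the demonstrations."""
    return F.cross_entropy(
        action_logits.reshape(-1, NUM_BINS),   # each action dimension is a classification problem
        demo_actions.reshape(-1),
    )
```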
The architecture of RT-1 (rough sketch after this list) is made of:
- A Vision-Language CNN-based module that encodes the task instruction and image into 81 tokens
- A TokenLearner that attends over the 81 tokens and compresses them to 8
- A Decoder-only Transformer that predicts the next action
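
Here's a rough, self-contained PyTorch sketch of that pipeline. The shapes (81 → 8 tokens per frame, 11 action dims × 256 bins) follow the paper, but the module internals (a plain conv as image tokenizer, a crude additive stand-in for FiLM conditioning) are simplified placeholders rather than the actual RT-1 implementation:

```python
import torch
import torch.nn as nn

class RT1Sketch(nn.Module):
    def __init__(self, d_model=512, num_learned_tokens=8, action_dims=11, num_bins=256):
        super().__init__()
        # (1) Stand-in for the FiLM-conditioned EfficientNet: a conv that turns a
        #     288x288 image into a 9x9 grid = 81 visual tokens.
        self.image_tokenizer = nn.Conv2d(3, d_model, kernel_size=32, stride=32)
        # (2) TokenLearner: attention-style pooling of the 81 tokens down to 8.
        self.token_scorer = nn.Linear(d_model, num_learned_tokens)
        # (3) Decoder-only Transformer (causal self-attention) over the token history.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=8)
        self.action_head = nn.Linear(d_model, action_dims * num_bins)
        self.action_dims, self.num_bins = action_dims, num_bins

    def forward(self, images, instruction_emb):
        # images: (batch, history, 3, 288, 288), instruction_emb: (batch, d_model)
        b, t = images.shape[:2]
        x = self.image_tokenizer(images.flatten(0, 1))            # (b*t, d, 9, 9)
        tokens = x.flatten(2).transpose(1, 2)                     # (b*t, 81, d)
        # Crude stand-in for FiLM conditioning: add the instruction embedding.
        tokens = tokens + instruction_emb.repeat_interleave(t, 0)[:, None, :]
        weights = self.token_scorer(tokens).softmax(dim=1)        # (b*t, 81, 8)
        learned = torch.einsum("ntd,ntk->nkd", tokens, weights)   # (b*t, 8, d)
        seq = learned.reshape(b, -1, learned.shape[-1])           # (b, t*8, d)
        causal = torch.triu(torch.full((seq.shape[1],) * 2, float("-inf"),
                                       device=seq.device), diagonal=1)
        out = self.transformer(seq, mask=causal)
        logits = self.action_head(out[:, -1])                     # next-action logits
        return logits.view(b, self.action_dims, self.num_bins)
```

Training then reduces to the cross-entropy behavior-cloning loss sketched above, applied to these next-action logits.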
It was evaluated over 3000 real-world trials... A lot of work!
They found that RT-1, unlike prior models such as BC-Z & Gato, generalizes much better: it performs far better on unseen tasks and in scenes with more visual clutter.
RT-1 can also ingest heterogeneous data: it learns new skills not only from real and simulated sources, but also from tasks performed by different robots.