Ted Xiao
Feb 10 14 tweets 8 min read
1) The optimism in robotics research is absolutely incredible these days! I believe all the pieces we need for a “modern attempt at embodied intelligence” are ready. At recent talks, I pitched a potential recipe, and I’d like to share it with you.

Let’s break down the key points 🔑
2) The first place to start might be to ask: why isn't robotics solved yet? The challenge is that even the most difficult robotics research settings are many orders of magnitude less complex than the noise and chaos of the real world. How can we bridge this gap?
3) I propose that we’ll *have* to leverage the emergent capabilities of internet-scale models to make this huge leap from the lab to the wild world. Emergence as a phenomenon is so powerful; more is not just more, more is different.

4) So, how do we build a robotics foundation model that will give us these emergent capabilities? Let’s look at three important trends, each of which suggests an ingredient for a potential recipe.
5) Trend #1: Robotics has been moving from online methods to offline methods. This is quite a dramatic paradigm shift – robot learning was once synonymous with online reinforcement learning, but offline methods like imitation learning have been picking up steam.
6) Ingredient #1: Let’s focus on separating the challenges of how to collect diverse robotic data and how to learn from that data. Traditionally, there has been a tight coupling between data generation and data consumption, but it seems that we can split this problem up!
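A minimal sketch of this decoupling, with toy stand-ins rather than a real robot stack: the producer side writes episodes into a shared offline dataset, and the consumer side only ever reads from it, so either can be swapped out independently.

```python
def collect_episodes(policy, num_episodes, horizon=10):
    """Producer side: any data source (teleoperation, scripted policies,
    other robots) appends episodes to a shared offline dataset."""
    dataset = []
    for _ in range(num_episodes):
        dataset.append([policy(t) for t in range(horizon)])
    return dataset

def train_offline(dataset, learner):
    """Consumer side: the learner only ever sees the stored dataset,
    so data generation and data consumption can improve independently."""
    for episode in dataset:
        for transition in episode:
            learner(transition)

# Toy usage: a scripted "policy" emitting (timestep, action) pairs,
# and a "learner" that just records every transition it consumes.
data = collect_episodes(lambda t: (t, "grasp" if t % 2 else "move"), num_episodes=3)
seen = []
train_offline(data, seen.append)
```

The point of the split: better teleop rigs or autonomous collection improve `collect_episodes` without touching the learner, and better offline algorithms improve `train_offline` without recollecting data.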
7) Trend #2: ML scaling has driven tremendous growth in AI.
Ingredient #2: Let’s stand on the shoulders of giants and use their best design principles: Transformers are general-purpose differentiable computers (@karpathy) and tokenization makes everything sequence modeling.
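“Tokenization makes everything sequence modeling” applies to robot actions too: bin each continuous action dimension into discrete tokens so a Transformer can treat control as next-token prediction. A minimal sketch (the range and bin count here are illustrative, not from any specific system):

```python
def tokenize_action(value, low=-1.0, high=1.0, num_bins=256):
    """Map a continuous action dimension to one of `num_bins` discrete
    tokens, turning robot control into a sequence-modeling problem."""
    value = min(max(value, low), high)           # clamp to the valid range
    fraction = (value - low) / (high - low)      # normalize to [0, 1]
    return min(int(fraction * num_bins), num_bins - 1)

def detokenize_action(token, low=-1.0, high=1.0, num_bins=256):
    """Map a token back to the center of its bin."""
    return low + (token + 0.5) * (high - low) / num_bins

token = tokenize_action(0.37)        # e.g. a gripper velocity command
recovered = detokenize_action(token)  # close to 0.37, within one bin width
```

The round trip loses at most half a bin width of precision, which is the usual price for making actions look like language tokens.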
8) Trend #3: Foundation models themselves have gotten better, and they’ve gotten better faster.
Ingredient #3: Leverage “Bitter Lesson 2.0” and work on methods that scale with foundation models. Use language as the universal API.
9) Combining the ingredients together suggests one approach for a modern attempt at embodied intelligence. These ideas have shaped a lot of my own research, and I’m excited to see how these trends evolve in the future!
10) I’ll note a few brief examples of recent work from my team and how they tie in with this recipe.

RT-1 (robotics-transformer.github.io) uses Ingredient #1 and Ingredient #2, training a Transformer-based BC policy on a large offline demonstration dataset with discretized actions.
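Once actions are discretized, the BC objective is just cross-entropy over action bins. This is a toy sketch of that per-dimension loss in plain Python (RT-1 itself conditions a Transformer on images and language; only the objective shape is shown here):

```python
import math

def bc_loss(logits, target_token):
    """Cross-entropy behavioral-cloning loss for one action dimension:
    the policy outputs logits over discrete action bins, and training
    maximizes the log-probability of the demonstrator's bin."""
    m = max(logits)  # subtract the max for numerical stability
    log_partition = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_partition - logits[target_token]

# Toy example: 8 action bins, demonstrator used bin 3.
logits = [0.0, 0.1, 0.5, 2.0, 0.3, -1.0, 0.0, 0.2]
loss = bc_loss(logits, target_token=3)
```

Because the target is a token rather than a continuous value, the exact same loss trains language models and robot policies, which is what lets one architecture serve both.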
11) SayCan (say-can.github.io) and Inner Monologue (innermonologue.github.io) leverage Ingredient #3 to utilize LLMs for robotic planning. By expressing plans and reasoning in language, we import common sense zero-shot from increasingly better LLMs.
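The core SayCan idea fits in a few lines: pick the skill that maximizes the product of the LLM's score for how useful the skill is (“say”) and a learned affordance score for whether the robot can execute it now (“can”). The scores below are made-up illustrations, not model outputs:

```python
def saycan_step(skills, llm_score, affordance_score):
    """SayCan-style skill selection: combine the LLM's usefulness
    estimate with a learned estimate of current feasibility."""
    return max(skills, key=lambda s: llm_score(s) * affordance_score(s))

# Hypothetical scores for the instruction "bring me a sponge".
llm = {"pick up sponge": 0.8, "pick up apple": 0.1, "go to table": 0.5}
can = {"pick up sponge": 0.9, "pick up apple": 0.9, "go to table": 0.2}
choice = saycan_step(list(llm), llm.get, can.get)  # → "pick up sponge"
```

Note how "go to table" is plausible to the LLM (0.5) but grounded out by its low affordance (0.2): the product keeps the plan both sensible and executable.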

12) DIAL (instructionaugmentation.github.io) uses Ingredient #1 and Ingredient #3 by using VLMs to perform data augmentation on language labels of offline datasets. VLMs use language to convey internet-scale semantics and concepts to existing datasets.
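One way to picture DIAL-style augmentation: propose alternative language labels for a logged episode and keep the ones a VLM scores as matching the episode. Everything below is a stand-in (the stub scorer, the threshold); a real system would use something like CLIP image-text similarity:

```python
def augment_labels(episode_image, original_label, candidate_labels, vlm_score):
    """DIAL-style instruction augmentation sketch: keep candidate
    relabels that the VLM judges consistent with the episode."""
    kept = [original_label]
    for label in candidate_labels:
        if vlm_score(episode_image, label) > 0.5:  # hypothetical threshold
            kept.append(label)
    return kept

# Stub scorer: pretends labels mentioning "drawer" match the image.
fake_vlm = lambda image, text: 0.9 if "drawer" in text else 0.1
labels = augment_labels("frame.png", "open the drawer",
                        ["pull the handle", "open the top drawer"], fake_vlm)
```

The offline dataset never changes; only its language labels get richer, which is exactly the data-generation/data-consumption split from Ingredient #1.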

13) More work related to these ingredients is coming out very soon! But, I hope this initial recipe for a robotics foundation model excites you (or at the very least, intrigues you) 🙂

If this thread was informative, please Like/Retweet the first Tweet:
14) Each of these works was the culmination of huge collaborations; I feel so privileged to work w/ brilliant colleagues (too many to list) but some are:
@hausman_k @svlevine @Kanishka_Rao @JonathanTompson @Yao__Lu @brian_ichter @xf1280 @keerthanpg @YevgenChebotar @SirrahChan!

