Tony Z. Zhao

@tonyzzhao

Mar 27, 2023 • 10 tweets • 4 min read

How can robots acquire fine-grained manipulation skills?

Introducing ACT: Action Chunking with Transformers 🤖

Key idea: Imitation, but predict actions in chunks instead of one at a time.

Here are results with only ~15min of demonstrations, running on low-cost arms:

https://twitter.com/tonyzzhao/status/1640393026341322754

In case you missed ALOHA 🏖, the hardware we use for all these experiments, here is the thread!

https://twitter.com/tonyzzhao/status/1640393026341322754

Fine manipulation is difficult, whether approached through RL, Sim2Real, or imitation:

- Hard exploration and sparse reward
- Large Sim2Real gap
- Compounding error for BC
- No large dataset

We introduce three important design choices behind ACT, an efficient imitation learning method:

(1) Predict action sequence

Standard BC predicts one action at a time, while a fine manipulation task can have >1000 steps easily.

Predicting actions in chunks slows down compounding error, and can better model non-stationary human behavior.
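
To make the chunking idea concrete, here is a minimal sketch of a rollout loop. It assumes the simplest possible execution scheme, where each predicted chunk is executed in full before the policy is queried again; the thread does not say how chunks are executed, so treat that as an assumption. The names `rollout_with_chunks`, `make_dummy_policy`, and `dummy_step` are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence

Observation = Sequence[float]
Action = Sequence[float]


def rollout_with_chunks(
    policy: Callable[[Observation], List[Action]],
    step: Callable[[Action], Observation],
    first_obs: Observation,
    horizon: int,
) -> int:
    """Run `horizon` environment steps; return how often the policy was queried."""
    obs = first_obs
    queries = 0
    t = 0
    while t < horizon:
        chunk = policy(obs)  # one query yields a whole chunk of actions
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        queries += 1
        for action in chunk:
            if t >= horizon:
                break
            obs = step(action)  # executed without re-querying (assumed scheme)
            t += 1
    return queries


def make_dummy_policy(k: int) -> Callable[[Observation], List[Action]]:
    """Hypothetical stand-in policy: k copies of the observation as actions."""
    return lambda obs: [list(obs) for _ in range(k)]


def dummy_step(action: Action) -> Observation:
    """Hypothetical environment: the next observation is the action taken."""
    return list(action)
```

With these stand-ins, a 1000-step episode needs 1000 policy queries at chunk size 1 but only 10 at chunk size 100 (an illustrative value, not one from the thread), so there are far fewer points at which a prediction error can feed into the next prediction.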

(2) Generative model policy

The policy is trained as the decoder of a VAE, reconstructing action chunks from latent z, 4 RGB images, and proprioception.

Intuitively, z extracts the “style” of the action chunk.

This is crucial when learning from human demos.
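
As a rough illustration of what training a policy as a VAE decoder involves, the sketch below computes a generic conditional-VAE objective: a reconstruction term on the action chunk plus a KL term that keeps the latent z close to a standard normal prior. The L1 reconstruction metric and the `beta` weight are assumptions made for this example, not details stated in the thread, and all function names are hypothetical.

```python
import math
from typing import Sequence

Chunk = Sequence[Sequence[float]]  # one row of joint targets per timestep


def l1_reconstruction(pred: Chunk, target: Chunk) -> float:
    """Mean absolute error over every action dimension in the chunk."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    return total / count


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def cvae_loss(
    pred: Chunk,
    target: Chunk,
    mu: Sequence[float],
    logvar: Sequence[float],
    beta: float = 1.0,  # assumed weighting, not taken from the thread
) -> float:
    """Reconstruction error plus beta-weighted KL regularizer on z."""
    return l1_reconstruction(pred, target) + beta * kl_to_standard_normal(mu, logvar)
```

In this framing the decoder (the policy) would produce `pred` from z, the images, and proprioception, while `mu` and `logvar` would come from an encoder that sees the demonstrated chunk during training.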

(3) Transformer

We modernize the VAE by using a BERT-like encoder and a DETR-like decoder, training end-to-end from scratch.

This transformer architecture benefits more from chunking than ConvNets and non-parametric methods.

With all of the above, ACT obtains 64%, 96%, 84%, and 92% success on the 4 tasks shown, with objects randomized along a 15 cm line.

It does not just memorize the training data, and is able to react to external disturbances:

It is also robust to a certain level of distractor objects:

As with ALOHA, we are open-sourcing ACT together with 2 simulated environments for reproducibility. You can find it on the project website: tonyzhaozh.github.io/aloha/

We hope ALOHA+ACT will be a helpful resource for advancing fine-grained manipulation!

Personally, this was a challenging project to work on, spanning from hardware to ML.
It would certainly not have been possible without my amazing advisor @chelseabfinn and collaboration from @svlevine @Vikashplus!

More from @tonyzzhao

Tony Z. Zhao

@tonyzzhao

Jan 3, 2024

Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware!
A low-cost, open-source, mobile manipulator.

One of the most high-effort projects in my past 5yrs! Not possible without co-lead @zipengfu and @chelseabfinn.

In the end, what's better than cooking yourself a meal with the 🤖🧑‍🍳?

https://x.com/tonyzzhao/status/1640393026341322754

How does 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀 work? We seek to achieve a few more goals to augment the dexterity of the original 𝐀𝐋𝐎𝐇𝐀:

https://x.com/tonyzzhao/status/1640393026341322754

1. Moves fast. Similar to a human walking speed of 1.42 m/s.
2. Stable. Manipulates heavy pots, a vacuum, etc.
3. Whole-body. All DoFs teleoperated simultaneously.
4. Untethered. Onboard power and compute.

Tony Z. Zhao

@tonyzzhao

Mar 27, 2023

Introducing ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation

After 8 months iterating @stanford and 2 months working with beta users, we are finally ready to release it!

Here is what ALOHA is capable of:

We built ALOHA to be maximally user-friendly for researchers: it is simple, dependable and performant.

The whole system costs <$20k, yet it is more capable than setups with 5-10x the price.

How does it work? ALOHA has two leader & two follower arms, and syncs the joint positions from leaders to followers at 50Hz. The user teleops by simply moving the leader robots.

This takes 10 lines to implement, yet it is intuitive and responsive anywhere within the joint limits.
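
A loop of the kind described could look roughly like the sketch below. It is not the ALOHA code: the `Arm` interface, its two methods, `FakeArm`, and the `teleop` function are all hypothetical stand-ins, and only the 50Hz rate and the leader-to-follower joint mirroring come from the thread.

```python
import time
from typing import Callable, List, Optional, Protocol


class Arm(Protocol):
    """Hypothetical arm interface assumed for this sketch."""

    def read_joint_positions(self) -> List[float]: ...

    def command_joint_positions(self, positions: List[float]) -> None: ...


def teleop(
    leaders: List[Arm],
    followers: List[Arm],
    rate_hz: float = 50.0,
    max_steps: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Mirror leader joints onto followers at a fixed rate; return steps run."""
    period = 1.0 / rate_hz
    steps = 0
    while max_steps is None or steps < max_steps:
        start = time.monotonic()
        for leader, follower in zip(leaders, followers):
            follower.command_joint_positions(leader.read_joint_positions())
        steps += 1
        # Sleep off whatever is left of this control period.
        sleep(max(0.0, period - (time.monotonic() - start)))
    return steps


class FakeArm:
    """In-memory arm used only to exercise the loop without hardware."""

    def __init__(self, positions: List[float]) -> None:
        self.positions = list(positions)

    def read_joint_positions(self) -> List[float]:
        return list(self.positions)

    def command_joint_positions(self, positions: List[float]) -> None:
        self.positions = list(positions)
```

Passing `max_steps` and a no-op `sleep` lets the loop be tested off-robot; on real hardware the loop would run until interrupted.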
