Allen Z. Ren
Jul 6 • 12 tweets • 5 min read
LLMs can generate plans and write robot code 📝 but they can also make mistakes. How do we get LLMs to 𝘬𝘯𝘰𝘸 𝘸𝘩𝘦𝘯 𝘵𝘩𝘦𝘺 𝘥𝘰𝘯'𝘵 𝘬𝘯𝘰𝘸 🤷 and ask for help?

Read more on how we can do this (with statistical guarantees) for LLMs on robots 👇
robot-help.github.io
Exploring LLM uncertainty in the context of generating robot plans is especially crucial because of safety considerations 🚧

Instructions from people can be ambiguous, and LLMs are prone to hallucinating. Poor outputs can lead to unsafe actions and consequences.
For example, if a robot 🤖 is tasked to "put the bowl in the microwave" but sees two bowls – a metal one and a plastic one – the uncertainty of the LLM should trigger the robot to ask for help 🛟

Greedily choosing one (e.g., the metal bowl) can damage the microwave or even cause a fire 🔥
Off-the-shelf LLM predictions do come with confidence scores, but these can be miscalibrated 📏

Our framework "KnowNo" builds on conformal prediction (CP) theory to model LLM uncertainty: generate a set of candidate plans, then quantify how likely that set is to contain a correct option.
CP provides a statistical guarantee: with user-specified probability, the prediction sets contain the correct plans at test time!
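Here is a minimal sketch of the calibration and prediction-set construction in split conformal prediction, assuming we already have a scalar confidence score per candidate plan. The function names and recipe below are illustrative, not the exact procedure from the paper:

```python
import numpy as np

def calibrate_threshold(cal_scores_true, epsilon=0.15):
    """Split conformal calibration (illustrative).

    cal_scores_true[i] is the LLM's confidence in the *correct* option for
    calibration example i. Returns a confidence threshold such that, with
    probability >= 1 - epsilon, the test-time prediction set contains the
    correct option."""
    scores = np.asarray(cal_scores_true, dtype=float)
    n = len(scores)
    nonconformity = 1.0 - scores                       # low confidence = high nonconformity
    q_level = min(1.0, np.ceil((n + 1) * (1 - epsilon)) / n)  # finite-sample correction
    return 1.0 - np.quantile(nonconformity, q_level, method="higher")

def prediction_set(option_scores, threshold):
    """Keep every candidate plan whose confidence clears the threshold."""
    return [opt for opt, s in option_scores.items() if s >= threshold]
```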
KnowNo triggers human help 🛟 when the prediction set has more than one option. Baselines that use the raw scores without calibration 📏 or directly ask the LLM whether it is uncertain can trigger unnecessary help.
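Continuing the sketch above for the two-bowl example (the confidences, the threshold value, and the print-outs standing in for "ask a human" / "execute" are all hypothetical):

```python
# Hypothetical LLM confidences for the two-bowl scene and a calibrated threshold.
tau = 0.30
scores = {"put metal bowl in microwave": 0.48,
          "put plastic bowl in microwave": 0.45,
          "do nothing": 0.07}

plans = prediction_set(scores, tau)        # from the sketch above
if len(plans) > 1:
    print("Asking for help, options:", plans)  # uncertain: more than one plausible plan
else:
    print("Executing:", plans[0])              # confident: act autonomously
```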
KnowNo can also quantify LLM planner uncertainty in multi-step planning settings, such as sorting food items 🥕 based on human preferences with feedback.
In mobile manipulation settings, common home-robot task instructions often under-specify the object ("the chips") or the target location ("the drawer").
In bimanual settings, each arm's reachability is limited and there is ambiguity in which arm to use for a given task.
We ran all experiments with the PaLM-2L model, which provides reasonably calibrated confidences. We find that GPT-3.5 suffers from recency bias in multiple-choice question answering (MCQA). Nonetheless, KnowNo still achieves the target success level by triggering more human help.
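As a rough sketch of how per-option confidences might be read out in an MCQA setup, assuming the model API exposes next-token log-probabilities for the option letters (the exact prompting and scoring in the paper may differ):

```python
import numpy as np

def option_confidences(letter_logprobs):
    """Turn next-token log-probabilities for the option letters
    (e.g. {'A': -0.7, 'B': -0.9, 'C': -3.2}) into normalized
    confidences via a softmax over the listed options."""
    letters = list(letter_logprobs)
    logits = np.array([letter_logprobs[l] for l in letters], dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return dict(zip(letters, probs))

# Example: a recency-biased model might inflate the last option's score.
print(option_confidences({"A": -0.7, "B": -0.9, "C": -3.2, "D": -4.0}))
```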
This work comes from a collaboration between @EPrinceton and @DeepMind, including @anushridixit111, Alexandra Bodrova, @Sumeet_Robotics, @stephenltu, Noah Brown, @sippeyxp, @leilatakayama, @xf1280, Jake Varley, @Zhenjia_Xu, @DorsaSadigh, @andyzeng_, and @Majumdar_Ani.
Future work could incorporate uncertainty of vision-language models in the pipeline. Quantifying uncertainty builds trust 🤝 between us and robots. Let's make them safe and reliable!

Website: robot-help.github.io
Paper: arxiv.org/abs/2307.01928
Colab code available soon
