Shane Gu
Dec 4, 2021
Excited to co-run the @EcoTheoryRL "data-centric RL" workshop @NeurIPSConf! Schedule: sites.google.com/corp/view/ecor…

INCREDIBLE speakers:

1) @ShaneLegg (Co-founder/Chief Scientist of @DeepMind)
2) Joelle Pineau (McGill, FAIR, MILA)
3) Pierre-Yves Oudeyer @pyoudeyer (INRIA)
4) @katjahofmann (@MSFTResearch @MSFTResearchCam)
5) Daniel Tanis (w/ @DrewPurves) @DeepMind
6) Benjamin Van Roy @Stanford
7) Warren Powell @Princeton
8) Amy Zhang @yayitsamyzhang (MILA, @UCBerkeley, FAIR -> @UTAustin )
9) Tom Griffiths @Princeton
10) Michiel van de Panne @UBC
More tweets to come for contributed talks + other fun workshop info!

More from @shaneguML

Oct 24, 2022
(1/8) *new paper* “LLMs can self-improve”
w/ *self-generated CoTs* (“logical dark knowledge”), no GT labels:
- SoTA (74.4%->82.1% GSM8K, 90.0%->94.4% OpenBookQA, 63.4%->67.9% ANLI-A3) by fine-tuning
- SoTA “zero-shot” (GSM8K 70.1% -> 74.2%) by prompting
arxiv.org/abs/2210.11610
(2/8) inspiration #1: I like analogies. When @kojima_tks @yusuke_iwasawa_ shared the initial “step by step” results, my reaction was that it’s (1) the unreal-engine trick of NLP, and (2) the temperature trick in distillation arxiv.org/abs/1503.02531 @geoffreyhinton, so we called it “logical dark knowledge” 😃
(3/8) inspiration #2: CoT+self-consistency arxiv.org/abs/2203.11171 was used everywhere. Most impressive to me was its calibration. Voting distribution is *very* calibrated: monotonic & even sometimes under-confident! e.g. when it predicts with 70%+ confidence, it’s correct 99%!
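For the gist in code, here is a minimal sketch of one self-improvement round as described above: sample several CoTs per question, take the self-consistency (majority-vote) answer as a pseudo-label, and keep the agreeing rationales as fine-tuning targets. The `model.generate_cot` call and the example format are hypothetical placeholders, not the paper's actual pipeline.

```python
from collections import Counter

def build_self_improvement_set(model, questions, samples_per_q=32, temperature=0.7):
    """One round of self-improvement data collection (hypothetical sketch).

    Assumes `model.generate_cot(question, temperature)` returns a
    (rationale_text, final_answer) pair via step-by-step prompting.
    No ground-truth labels are used: the self-consistency majority vote
    acts as the pseudo-label, and only agreeing rationales are kept.
    """
    examples = []
    for q in questions:
        samples = [model.generate_cot(q, temperature=temperature)
                   for _ in range(samples_per_q)]
        majority_answer, votes = Counter(ans for _, ans in samples).most_common(1)[0]
        confidence = votes / samples_per_q  # per the thread, this vote share is well calibrated
        examples.extend(
            {"prompt": q, "target": rationale, "confidence": confidence}
            for rationale, ans in samples
            if ans == majority_answer
        )
    return examples

# Fine-tuning the same model with a standard LM loss on `target`
# (conditioned on `prompt`) completes the self-improvement round.
```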
Oct 1, 2022
Attended #TeslaAIDay2022. Engaging yet technical. Many memorable moments! 🧵1/
- 8 months or so to build humanoids from scratch: two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given the 8-month window, the results were amazing. Nicely leveraged much of the self-driving pipeline + Dojo compute. 2/
- "generalist" conditional occupancy network: a single "big" network which outputs both voxels and semantics from images. Trained on LARGE dataset from auto labeling. Given where conditional/generative NeRF/OccNets are in academia (arxiv.org/abs/2209.10684), blown away by scale 3/
Jan 31, 2022
Can pre-trained language models be used for offline RL? We look to answer this question in our new work and demonstrate SoTA-level performance on various offline RL benchmarks when adapting pre-trained LMs for RL 🤯

paper: arxiv.org/abs/2201.12122
code: github.com/machelreid/can… 1/
We look at adapting pre-trained language models (e.g. GPT2) and image models (e.g. ImageGPT) for Decision Transformer in offline RL and show consistent improvement in performance over all strong baselines, e.g. DT, TD3+BC, CQL: 2/
Interestingly, we find that the vision init does not converge, whereas even a small language model, ChibiT (チビ means small or mini in Japanese 😆), pre-trained on Wiki improves over DT and is comparable to GPT2. Perhaps there are some similarities between RL trajectories & language 🤔 3/
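Roughly, the recipe is to keep the Decision Transformer's (return, state, action) embeddings and action head but initialize the transformer trunk from a pre-trained LM. The sketch below illustrates the idea with Hugging Face's GPT2Model; the token interleaving and heads are simplified, not the paper's exact code (see the linked repo for that).

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class LMInitDecisionTransformer(nn.Module):
    """Decision-Transformer-style policy with a pre-trained LM backbone.

    Rough sketch: the (return, state, action) embeddings and the action head
    are new, but the transformer trunk is initialized from GPT-2 instead of
    from scratch. Timestep embeddings, layer norms, and other details from
    the paper's code are omitted for brevity.
    """

    def __init__(self, state_dim, act_dim, pretrained_name="gpt2"):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained(pretrained_name)
        hidden = self.backbone.config.n_embd
        self.embed_return = nn.Linear(1, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, returns_to_go, states, actions):
        # returns_to_go: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T = states.shape[:2]
        # Interleave (R_t, s_t, a_t) tokens along the sequence dimension.
        tokens = torch.stack(
            [self.embed_return(returns_to_go),
             self.embed_state(states),
             self.embed_action(actions)], dim=2,
        ).reshape(B, 3 * T, -1)
        hidden_states = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict a_t from the hidden state at the s_t token.
        state_tokens = hidden_states.reshape(B, T, 3, -1)[:, :, 1]
        return self.predict_action(state_tokens)
```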
Dec 4, 2021
If overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860
We propose "BC as a regularizer", which adds negligible compute cost to the original TD3 objective but makes it quite performant on offline RL.
For the table, we followed the "algorithm" vs. "implementation" separation suggested in our other NeurIPS paper
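In code, the "few lines" amount to adding a BC term and a Q-normalizing weight to TD3's actor loss. A minimal sketch, assuming `actor(states)` returns actions and `critic(states, actions)` returns Q-values, with alpha=2.5 as in the paper:

```python
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, states, actions, alpha=2.5):
    """TD3+BC actor update (sketch): TD3's actor loss plus a BC term,
    with lambda rescaling Q so both terms stay on a comparable scale."""
    pi = actor(states)                     # actions proposed by the policy
    q = critic(states, pi)                 # Q-values of those actions
    lam = alpha / q.abs().mean().detach()  # normalize the Q term
    return -(lam * q).mean() + F.mse_loss(pi, actions)
```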
Oct 23, 2021
Toy MuJoCo + Box2d envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos github.com/openai/gym/iss…
#braxlines:

#composer: github.com/google/brax/tr…

arxiv: arxiv.org/abs/2110.04686

#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But esp. with the open-sourcing plans for mujoco, excited to see where the synergies could be.
Good to see a lot of large-scale, algorithmic deep RL researchers are aligned: "I personally believe that hardware accelerator support is more important, hence choosing Brax."
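The speedup comes from the physics step being a pure JAX function that can be jit-compiled and vmapped across thousands of environments on one accelerator. A toy illustration of that pattern, with a point-mass step standing in for a real physics engine (not Brax's actual API):

```python
import jax
import jax.numpy as jnp

def step(state, action, dt=0.01):
    """Toy point-mass step standing in for a physics engine (not Brax's API)."""
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    return pos, vel

# jit + vmap: compile once, then step thousands of envs in parallel on GPU/TPU.
batched_step = jax.jit(jax.vmap(step))

num_envs = 4096
pos = jnp.zeros((num_envs, 3))
vel = jnp.zeros((num_envs, 3))
actions = jax.random.normal(jax.random.PRNGKey(0), (num_envs, 3))
pos, vel = batched_step((pos, vel), actions)
```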
