Gemini 1.5 Pro post-training multilinguality lead @GoogleDeepMind Tokyo/SF. 🇯🇵-born 🇨🇳🇨🇦. ex: @GoogleAI Brain, @OpenAI. (JP: @shanegJP)
Oct 24, 2022 • 8 tweets • 6 min read
(1/8) *new paper* “LLMs can self-improve”
w/ *self-generated CoTs* (“logical dark knowledge”), no GT labels:
- SoTA (74.4% -> 82.1% GSM8K, 90.0% -> 94.4% OpenBookQA, 63.4% -> 67.9% ANLI-A3) by fine-tuning
- SoTA “zero-shot” (GSM8K 70.1% -> 74.2%) by prompting arxiv.org/abs/2210.11610
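A minimal sketch of the recipe (helper names are hypothetical, not the paper's released code): sample many CoT rationales per unlabeled question at high temperature, majority-vote the final answer (self-consistency), and keep only rationales that agree with the vote as fine-tuning data.

```python
# Sketch of the self-improvement loop: self-generated CoTs, no ground-truth labels.
from collections import Counter

def self_generate_finetuning_data(model, questions, num_samples=32, temperature=0.7):
    dataset = []
    for q in questions:
        prompt = q + "\nA: Let's think step by step."
        rationales = [model.generate(prompt, temperature=temperature)
                      for _ in range(num_samples)]
        answers = [extract_final_answer(r) for r in rationales]  # hypothetical parser
        majority_answer, _ = Counter(answers).most_common(1)[0]
        # Self-consistency filter: keep rationale/answer pairs matching the vote.
        dataset += [(prompt, r) for r, a in zip(rationales, answers)
                    if a == majority_answer]
    return dataset

# finetune(model, self_generate_finetuning_data(model, unlabeled_questions))
```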
(2/8) inspiration #1: I like analogies. When @kojima_tks @yusuke_iwasawa_ shared their initial “step by step” results, my reaction was that it's (1) the “unreal engine” trick of NLP, and (2) the temperature trick in distillation arxiv.org/abs/1503.02531 @geoffreyhinton, so we called it “logical dark knowledge” 😃
Oct 1, 2022 • 7 tweets • 5 min read
Attended #TeslaAIDay2022. Engaging yet technical. (streaming link: ) Many memorable moments! 🧵1/
- ~8 months to build humanoids from scratch: two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given the 8-month window, the results were amazing. Nicely leveraged as much of the self-driving pipeline + Dojo compute as possible. 2/
Jan 31, 2022 • 7 tweets • 3 min read
Can pre-trained language models be used for offline RL? We look to answer this question in our new work and demonstrate SoTA-level performance on various offline RL benchmarks when adapting pre-trained LMs for RL 🤯
paper: arxiv.org/abs/2201.12122
code: github.com/machelreid/can… 1/
We look at adapting pre-trained language models (e.g. GPT2) and image models (e.g. ImageGPT) for the Decision Transformer in offline RL and show consistent improvements in performance over all strong baselines, e.g. DT, TD3+BC, CQL: 2/
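Conceptually, the trick is to initialize the Decision Transformer backbone with pre-trained LM weights rather than training it from scratch. A rough PyTorch sketch under assumed shapes and names (the actual implementation is in the repo linked above):

```python
# Decision Transformer with a pre-trained GPT-2 backbone (sketch, assumed shapes).
import torch
import torch.nn as nn
from transformers import GPT2Model

class PretrainedDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, hidden=768):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")  # pre-trained LM weights
        self.embed_rtg = nn.Linear(1, hidden)               # return-to-go token
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim).
        # Interleave (return, state, action) tokens along time, as in Decision Transformer.
        B, T = states.shape[:2]
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict each action from the hidden state at the preceding state token.
        return self.predict_action(hidden[:, 1::3])
```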
If overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860
We propose "BC as a regularizer", which adds negligible compute cost to original TD3 objective, but makes it quite performative on offline RL.
Oct 23, 2021 • 4 tweets • 4 min read
Toy MuJoCo + Box2D envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos github.com/openai/gym/iss… #braxlines:
#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But especially with the open-sourcing plans for mujoco, excited to see where the synergies could be.
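For a feel of the workflow, a minimal sketch using the brax v1-style API (assumed here; check the brax repo for the current interface): create an env, JIT the physics step with jax, and run it on accelerator.

```python
# Minimal brax usage sketch (brax v1-style API assumed; details may have changed).
import jax
import jax.numpy as jnp
from brax import envs

env = envs.create(env_name="ant")             # a MuJoCo-like toy env, pure JAX
state = env.reset(rng=jax.random.PRNGKey(0))
step = jax.jit(env.step)                      # JIT-compile physics for GPU/TPU speedups
for _ in range(10):
    state = step(state, jnp.zeros(env.action_size))
```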