Shane Gu
Researcher @GoogleDeepMind Tokyo. ex: @GoogleAI Brain, @OpenAI. (JP: @shanegJP)
Oct 24, 2022 8 tweets 6 min read
(1/8) *new paper* “LLMs can self-improve”
w/ *self-generated CoTs* (“logical dark knowledge”), no GT labels:
- SoTA (74.4%->82.1% GSM8K, 90.0%->94.4% OpenBookQA, 63.4%->67.9% ANLI-A3) by fine-tuning
- SoTA “zero-shot” (GSM8K 70.1% -> 74.2%) by prompting
arxiv.org/abs/2210.11610

(2/8) inspiration #1: I like analogies. When @kojima_tks @yusuke_iwasawa_ shared initial “step by step” results, my reaction was that it’s (1) the unreal engine trick of NLP, (2) the temperature trick in distillation arxiv.org/abs/1503.02531 @geoffreyhinton, so we called it “logical dark knowledge”😃
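For the curious, here is a minimal sketch of the self-improvement loop (my own reconstruction, not the paper's code): sample multiple CoT reasoning paths per question at high temperature, keep the paths whose final answers agree with the self-consistency majority vote, and fine-tune on those filtered rationales. `sample_cot`, `extract_answer`, and `finetune` are hypothetical stand-ins for the model's generate/train calls, and the confidence threshold is illustrative.

```python
from collections import Counter

def self_improve(model, questions, n_samples=32, temperature=0.7):
    """Build a self-training set from self-generated CoTs (no GT labels)."""
    training_examples = []
    for q in questions:
        # Sample multiple reasoning paths at high temperature.
        paths = [sample_cot(model, q, temperature) for _ in range(n_samples)]
        # Self-consistency: majority vote over the final answers.
        answers = [extract_answer(p) for p in paths]
        majority, count = Counter(answers).most_common(1)[0]
        # Keep only confidently-answered questions and the agreeing paths
        # (0.5 is an illustrative threshold, not the paper's setting).
        if count / n_samples > 0.5:
            training_examples += [(q, p) for p, a in zip(paths, answers)
                                  if a == majority]
    # Fine-tune the model on its own filtered rationales.
    return finetune(model, training_examples)
```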
Oct 1, 2022 7 tweets 5 min read
Attended #TeslaAIDay2022. Engaging yet technical. (streaming link: ) Many memorable moments! 🧵1/

- 8 months or so to build humanoids from scratch, across two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given the 8-month window, the results were amazing. Nicely leveraged as much of the self-driving pipeline + Dojo compute as possible. 2/
Jan 31, 2022 7 tweets 3 min read
Can pre-trained language models be used for offline RL? We look to answer this question in our new work and demonstrate SoTA-level performance on various offline RL benchmarks when adapting pre-trained LMs for RL 🤯

paper: arxiv.org/abs/2201.12122
code: github.com/machelreid/can… 1/

We look at adapting pre-trained language models (e.g. GPT-2) and image models (e.g. ImageGPT) for Decision Transformer in offline RL, and show consistent improvement in performance over all strong baselines, e.g. DT, TD3+BC, CQL. 2/
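A rough sketch of the core idea, assuming a Hugging Face GPT-2 backbone: swap the token embeddings for learned projections of returns-to-go, states, and actions, run the interleaved sequence through the pre-trained transformer, and read actions off the state tokens. Layer names and dimensions here are illustrative, not the paper's exact code.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class LMDecisionTransformer(nn.Module):
    """Decision Transformer with a pre-trained GPT-2 backbone (sketch)."""
    def __init__(self, state_dim, act_dim, hidden=768):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")  # pre-trained LM weights
        # Replace token embeddings with modality-specific projections.
        self.embed_return = nn.Linear(1, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, returns_to_go, states, actions):
        B, T = states.shape[:2]
        # Interleave (R_t, s_t, a_t) tokens along the sequence axis.
        tokens = torch.stack(
            [self.embed_return(returns_to_go),
             self.embed_state(states),
             self.embed_action(actions)], dim=2
        ).reshape(B, 3 * T, -1)
        h = self.gpt2(inputs_embeds=tokens).last_hidden_state
        # Predict a_t from the state-token output at each timestep.
        return self.predict_action(h.reshape(B, T, 3, -1)[:, :, 1])
```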
Dec 4, 2021 6 tweets 5 min read
Excited to co-run the @EcoTheoryRL "data-centric RL" workshop @NeurIPSConf! Schedule: sites.google.com/corp/view/ecor…

INCREDIBLE speakers:

1) @ShaneLegg (Co-founder/Chief Scientist of @DeepMind)
2) Joelle Pineau (McGill, FAIR, MILA)
3) Pierre-Yves Oudeyer @pyoudeyer (INRIA)
4) @katjahofmann (@MSFTResearch @MSFTResearchCam)
Dec 4, 2021 4 tweets 3 min read
If overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860

We propose "BC as a regularizer", which adds negligible compute cost to the original TD3 objective but makes it quite performant on offline RL.
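For reference, the few-line change amounts to one new actor loss; a sketch in PyTorch following the loss in the paper (helper names are mine):

```python
import torch
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, state, action, alpha=2.5):
    """TD3+BC actor update: TD3's deterministic policy-gradient loss
    plus a behavioral-cloning (BC) term toward the dataset action."""
    pi = actor(state)
    q = critic(state, pi)
    # Normalize the Q term so the BC weight is scale-invariant.
    lmbda = alpha / q.abs().mean().detach()
    # Maximize Q while staying close to the behavior policy's actions.
    return -lmbda * q.mean() + F.mse_loss(pi, action)
```

Everything else is vanilla TD3, which is what keeps the compute overhead negligible.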
Oct 23, 2021 4 tweets 4 min read
Toy MuJoCo + Box2D envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos. github.com/openai/gym/iss…

#braxlines:

#composer: github.com/google/brax/tr…

arxiv: arxiv.org/abs/2110.04686

#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But esp. with the open-sourcing plans for mujoco, excited to see where the synergies could be.
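If you want to kick the tires, a minimal sketch of running a jit-compiled #brax env (based on the early Brax API; exact entry points may have shifted in later releases):

```python
import jax
import jax.numpy as jnp
from brax import envs

# Create an environment and jit-compile reset/step for GPU/TPU speed.
env = envs.create(env_name="ant")
reset = jax.jit(env.reset)
step = jax.jit(env.step)

state = reset(rng=jax.random.PRNGKey(0))
for _ in range(100):
    action = jnp.zeros(env.action_size)  # replace with a policy's output
    state = step(state, action)
```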