Gemini 1.5 Pro post-training multilinguality lead @GoogleDeepMind Tokyo/SF. 🇯🇵-born 🇨🇳🇨🇦. ex: @GoogleAI Brain, @OpenAI. (JP: @shanegJP)
Oct 24, 2022 • 8 tweets • 6 min read
(1/8) *new paper* “LLMs can self-improve”
w/ *self-generated CoTs* (“logical dark knowledge”), no GT labels:
- SoTA (74.4% -> 82.1% GSM8K, 90.0% -> 94.4% OpenBookQA, 63.4% -> 67.9% ANLI-A3) by fine-tuning
- SoTA “zero-shot” (GSM8K 70.1% -> 74.2%) by prompting arxiv.org/abs/2210.11610
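A minimal sketch of the recipe (helper names are hypothetical, not the paper's released code): sample many CoT rationales per unlabeled question at high temperature, majority-vote the final answer (self-consistency), and keep only rationales that agree with the vote as fine-tuning data.

```python
# Sketch of the self-improvement loop: self-generated CoTs, no ground-truth labels.
from collections import Counter

def self_generate_finetuning_data(model, questions, num_samples=32, temperature=0.7):
    dataset = []
    for q in questions:
        prompt = q + "\nA: Let's think step by step."
        rationales = [model.generate(prompt, temperature=temperature)
                      for _ in range(num_samples)]
        answers = [extract_final_answer(r) for r in rationales]  # hypothetical parser
        majority_answer, _ = Counter(answers).most_common(1)[0]
        # Self-consistency filter: keep rationale/answer pairs matching the vote.
        dataset += [(prompt, r) for r, a in zip(rationales, answers)
                    if a == majority_answer]
    return dataset

# finetune(model, self_generate_finetuning_data(model, unlabeled_questions))
```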
(2/8) inspiration #1: I like analogies. When @kojima_tks @yusuke_iwasawa_ shared their initial “step by step” results, my reaction was that it's (1) the “unreal engine” trick of NLP, and (2) the temperature trick in distillation arxiv.org/abs/1503.02531 @geoffreyhinton, so we called it “logical dark knowledge” 😃
Oct 1, 2022 • 7 tweets • 5 min read
Attended #TeslaAIDay2022. Engaging yet technical. (streaming link: ) Many memorable moments! 🧵1/
- ~8 months to build humanoids from scratch: two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given the 8-month window, the results were amazing. Nicely leveraged as much of the self-driving pipeline + Dojo compute as possible. 2/
Jan 31, 2022 • 7 tweets • 3 min read
Can pre-trained language models be used for offline RL? We look to answer this question in our new work and demonstrate SoTA-level performance on various offline RL benchmarks when adapting pre-trained LMs for RL 🤯
paper: arxiv.org/abs/2201.12122
code: github.com/machelreid/can… 1/
We look at adapting pre-trained language models (e.g. GPT2) and image models (e.g. ImageGPT) for the Decision Transformer in offline RL and show consistent improvements in performance over all strong baselines, e.g. DT, TD3+BC, CQL: 2/
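Conceptually, the trick is to initialize the Decision Transformer backbone with pre-trained LM weights rather than training it from scratch. A rough PyTorch sketch under assumed shapes and names (the actual implementation is in the repo linked above):

```python
# Decision Transformer with a pre-trained GPT-2 backbone (sketch, assumed shapes).
import torch
import torch.nn as nn
from transformers import GPT2Model

class PretrainedDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, hidden=768):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")  # pre-trained LM weights
        self.embed_rtg = nn.Linear(1, hidden)               # return-to-go token
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim).
        # Interleave (return, state, action) tokens along time, as in Decision Transformer.
        B, T = states.shape[:2]
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict each action from the hidden state at the preceding state token.
        return self.predict_action(hidden[:, 1::3])
```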
If overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860
We propose "BC as a regularizer", which adds negligible compute cost to original TD3 objective, but makes it quite performative on offline RL.
Oct 23, 2021 • 4 tweets • 4 min read
Toy MuJoCo + Box2D envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos github.com/openai/gym/iss… #braxlines:
#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But especially with the open-sourcing plans for mujoco, excited to see where the synergies could be.
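For a feel of the workflow, a minimal sketch using the brax v1-style API (assumed here; check the brax repo for the current interface): create an env, JIT the physics step with jax, and run it on accelerator.

```python
# Minimal brax usage sketch (brax v1-style API assumed; details may have changed).
import jax
import jax.numpy as jnp
from brax import envs

env = envs.create(env_name="ant")             # a MuJoCo-like toy env, pure JAX
state = env.reset(rng=jax.random.PRNGKey(0))
step = jax.jit(env.step)                      # JIT-compile physics for GPU/TPU speedups
for _ in range(10):
    state = step(state, jnp.zeros(env.action_size))
```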