(1/8) *new paper* “LLMs can self-improve”
w/ *self-generated CoTs* (“logical dark knowledge”), no GT labels:
- SoTA (74.4%->82.1% GSM8K, 90.0%->94.4% OpenBookQA, 63.4%->67.9% ANLI-A3) by fine-tuning
- SoTA “zero-shot” (GSM8K 70.1% -> 74.2%) by prompting arxiv.org/abs/2210.11610
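The recipe, roughly (a hedged sketch, not the paper's code; `sample_cot` / `extract_answer` are placeholder callables you'd supply): sample CoT paths, majority-vote the answer via self-consistency, keep the rationales that agree with the vote as pseudo-labels, then fine-tune on them.

```python
# Rough sketch of the self-improvement loop described above (my paraphrase,
# not the paper's implementation).
from collections import Counter

def build_pseudo_labeled_cots(questions, sample_cot, extract_answer,
                              n_samples=32, min_vote_share=0.5):
    """sample_cot(q) -> one sampled chain-of-thought string (user-supplied);
    extract_answer(cot) -> final answer parsed from a CoT (user-supplied)."""
    data = []
    for q in questions:
        cots = [sample_cot(q) for _ in range(n_samples)]
        answers = [extract_answer(c) for c in cots]
        top_answer, votes = Counter(answers).most_common(1)[0]
        if votes / n_samples >= min_vote_share:
            # Keep only the rationales whose answer matches the majority vote.
            data.extend((q, c) for c, a in zip(cots, answers) if a == top_answer)
    return data  # fine-tune the same LM on these (question, CoT) pairs
```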
(2/8) inspiration #1: I like analogies. When @kojima_tks @yusuke_iwasawa_ shared the initial “step by step” results, my reaction was that it’s (1) the “unreal engine” trick of NLP, (2) the temperature trick in distillation arxiv.org/abs/1503.02531 @geoffreyhinton, so we called it “logical dark knowledge”😃
(3/8) inspiration #2: CoT+self-consistency arxiv.org/abs/2203.11171 was used everywhere. Most impressive to me was its calibration. The voting distribution is *very* well calibrated: monotonic & sometimes even under-confident! e.g. when it predicts with 70%+ confidence, it’s correct ~99% of the time!
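If you want to check this yourself, a minimal sketch (assuming you've already collected per-question vote shares and 0/1 correctness flags):

```python
# Reliability-curve sketch: bin questions by self-consistency vote share
# (used as confidence) and compare against empirical accuracy per bin.
import numpy as np

def reliability_curve(vote_share, correct, n_bins=10):
    vote_share, correct = np.asarray(vote_share), np.asarray(correct)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(vote_share, edges) - 1, 0, n_bins - 1)
    curve = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            curve.append((vote_share[mask].mean(), correct[mask].mean()))
    return curve  # list of (mean confidence, empirical accuracy) pairs
```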
- 8 months or so to build humanoids from scratch: two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given the 8-month window, the results were amazing. Nicely leveraged as much of the self-driving pipeline + Dojo compute as possible. 2/
- "generalist" conditional occupancy network: a single "big" network which outputs both voxels and semantics from images. Trained on LARGE dataset from auto labeling. Given where conditional/generative NeRF/OccNets are in academia (arxiv.org/abs/2209.10684), blown away by scale 3/
Can pre-trained language models be used for offline RL? We look to answer this question in our new work and demonstrate SoTA-level performance on various offline RL benchmarks when adapting pre-trained LMs for RL 🤯
We look at adapting pre-trained language models (e.g. GPT2) and image models (e.g. ImageGPT) for Decision Transformer in offline RL and show consistent improvement in performance over all strong baselines, e.g. DT, TD3+BC, CQL: 2/
Interestingly, we find that the vision init does not converge, whereas even a small pre-trained language model, ChibiT (チビ means small or mini in Japanese 😆), trained on Wiki, improves over DT and is comparable to GPT2. Perhaps there are some similarities between RL trajectories & language 🤔 3/
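A minimal sketch of the setup (illustrative only, not our exact code; the class name and shapes here are mine): swap DT's randomly-initialized transformer for a pre-trained GPT2 backbone from HuggingFace, keeping DT's (return-to-go, state, action) token embeddings.

```python
# Illustrative sketch: a Decision-Transformer-style model whose backbone is
# initialized from a pre-trained GPT2.
import torch
import torch.nn as nn
from transformers import GPT2Model

class LMInitDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, lm_name="gpt2"):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained(lm_name)  # pre-trained LM weights
        h = self.backbone.config.n_embd
        self.embed_rtg = nn.Linear(1, h)            # return-to-go token
        self.embed_state = nn.Linear(state_dim, h)
        self.embed_action = nn.Linear(act_dim, h)
        self.predict_action = nn.Linear(h, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)                     # (R_1, s_1, a_1, R_2, s_2, a_2, ...)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Predict a_t from the hidden state at each s_t token (positions 1, 4, 7, ...).
        return self.predict_action(hidden[:, 1::3])
```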
If overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860
We propose "BC as a regularizer", which adds negligible compute cost to the original TD3 objective but makes it quite performant on offline RL.
For the table, we followed the "algorithm" vs. "implementation" separation suggested in our other NeurIPS paper
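Concretely, the "few lines" are roughly this (a sketch paraphrasing the actor update; `actor`/`critic` are the usual TD3 networks and the batch comes from the offline dataset):

```python
# TD3+BC actor loss sketch: TD3's deterministic policy gradient term plus a
# behavior-cloning (MSE to dataset actions) regularizer, with the adaptive
# weight lambda = alpha / mean|Q| (alpha = 2.5 in the paper).
import torch
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, states, actions, alpha=2.5):
    pi = actor(states)
    q = critic(states, pi)
    lam = alpha / q.abs().mean().detach()   # normalize for the scale of Q
    return -lam * q.mean() + F.mse_loss(pi, actions)
```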
Toy MuJoCo + Box2d envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos github.com/openai/gym/iss…
#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But especially with the open-sourcing plans for mujoco, excited to see where the synergies could be.
Good to see a lot of large-scale, algorithmic deep RL researchers are aligned: "I personally believe that hardware accelerator support is more important, hence choosing Brax."
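For reference, a minimal sketch of the jax-side workflow behind that speedup (brax's env API at the time; exact names may differ across versions):

```python
# Sketch: run a Brax env end-to-end under jax.jit on GPU/TPU.
import jax
import jax.numpy as jnp
from brax import envs

env = envs.create(env_name="ant")      # MuJoCo-like env, written in pure JAX
reset_fn = jax.jit(env.reset)          # jit-compile reset/step for the speedup
step_fn = jax.jit(env.step)

state = reset_fn(rng=jax.random.PRNGKey(0))
for _ in range(100):
    action = jnp.zeros(env.action_size)  # plug a policy in here
    state = step_fn(state, action)
```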