- ~8 months to build humanoids from scratch: two iterations. Far from Boston Dynamics in locomotion, and far from human bi-dexterous manipulation, but given the 8-month window, the results were amazing. Nicely leveraged as much of the self-driving pipeline + Dojo compute as possible. 2/
- "generalist" conditional occupancy network: a single "big" network which outputs both voxels and semantics from images. Trained on LARGE dataset from auto labeling. Given where conditional/generative NeRF/OccNets are in academia (arxiv.org/abs/2209.10684), blown away by scale 3/
- lane prediction as image-to-text: used a decoder-only Transformer to output "lanes" in a custom language. Best example I've seen of using discrete semantic tokens to describe rich continuous spatial (2D) information; likely to generalize to 3D and other robot tasks (toy sketch after this list). 4/
- 2 weeks to reconstruct an "alterable" San Francisco in simulation: the images below are fully generated by Unreal Engine. Everything (road texture, lane semantics, humans/cars, weather, etc.) is modifiable to generate additional training data. Ingesting SF took only two weeks. 5/
- Dojo (Tesla custom compute hardware): 50% of total Tesla compute is on auto labeling + OccNet. Dojo beats A100 GPUs with various problem-specific optimizations. PyTorch supported. Runs #stablediffusion. Probably the most exciting result of the night. 6/
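To make the occupancy-network idea above concrete, here is a minimal, hypothetical PyTorch sketch of a conditional occupancy head: pooled image features condition an MLP that maps 3D query points to an occupancy logit plus semantic class logits. The class name, dimensions, and pooling scheme are all my assumptions, not Tesla's architecture.

```python
# Hypothetical sketch of a conditional occupancy head (NOT Tesla's actual
# architecture): image features condition an MLP that maps a 3D query point
# to an occupancy probability and semantic class logits.
import torch
import torch.nn as nn

class CondOccupancyHead(nn.Module):
    def __init__(self, feat_dim=256, hidden=256, num_classes=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.occ = nn.Linear(hidden, 1)            # occupancy logit per query point
        self.sem = nn.Linear(hidden, num_classes)  # semantic logits per query point

    def forward(self, img_feat, xyz):
        # img_feat: (B, feat_dim) pooled camera features; xyz: (B, N, 3) query points
        cond = img_feat[:, None].expand(-1, xyz.shape[1], -1)
        h = self.mlp(torch.cat([cond, xyz], dim=-1))
        return self.occ(h).squeeze(-1), self.sem(h)

head = CondOccupancyHead()
occ_logits, sem_logits = head(torch.randn(2, 256), torch.randn(2, 1024, 3))
```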
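And for the lanes-as-language idea: a toy sketch of one way to serialize 2D lane polylines into discrete tokens that a decoder-only Transformer could predict autoregressively. The grid size, special tokens, and token layout are illustrative assumptions, not the custom language from the talk.

```python
# Hypothetical "lane language" (illustrative, not Tesla's scheme): quantize
# 2D control points onto a grid and emit discrete tokens that a decoder-only
# Transformer can predict one at a time.
GRID = 128                                        # quantization bins per axis
START, END_LANE, EOS = GRID * GRID, GRID * GRID + 1, GRID * GRID + 2

def encode_lanes(lanes, extent=100.0):
    """lanes: list of polylines, each a list of (x, y) in meters within +/-extent."""
    tokens = [START]
    for lane in lanes:
        for x, y in lane:
            ix = min(int((x + extent) / (2 * extent) * GRID), GRID - 1)
            iy = min(int((y + extent) / (2 * extent) * GRID), GRID - 1)
            tokens.append(ix * GRID + iy)         # one token per quantized point
        tokens.append(END_LANE)                   # separator between lanes
    tokens.append(EOS)
    return tokens

print(encode_lanes([[(0.0, 0.0), (5.0, 0.2)], [(0.0, 3.5)]]))
```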
Can pre-trained language models be used for offline RL? We look to answer this question in our new work and demonstrate SoTA-level performance on various offline RL benchmarks when adapting pre-trained LMs for RL 🤯
We look at adapting pre-trained language models (e.g. GPT2) and image models (e.g. ImageGPT) for Decision Transformer in offline RL and show consistent improvement in performance over all strong baselines, e.g. DT, TD3+BC, CQL: 2/
Interestingly, we find that vision init does not converge, whereas even a small pre-trained language model, ChibiT (チビ means small or mini in Japanese 😆), trained on Wikipedia improves over DT and is comparable to GPT2. Perhaps there are some similarities between RL trajectories & language 🤔 3/
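A minimal sketch of the general recipe (not the authors' released code): keep Decision Transformer's (return-to-go, state, action) embeddings and action head, but initialize the sequence model from a pre-trained GPT2 via HuggingFace. `LMInitDT` and its dimensions are placeholder names of mine.

```python
# Minimal sketch: swap Decision Transformer's randomly-initialized transformer
# for a pre-trained GPT2 backbone, keeping DT's token embeddings and action head.
import torch
import torch.nn as nn
from transformers import GPT2Model

class LMInitDT(nn.Module):                        # hypothetical name
    def __init__(self, state_dim, act_dim, model_name="gpt2"):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained(model_name)  # pre-trained LM weights
        h = self.backbone.config.n_embd
        self.embed_rtg = nn.Linear(1, h)          # return-to-go embedding
        self.embed_state = nn.Linear(state_dim, h)
        self.embed_action = nn.Linear(act_dim, h)
        self.predict_action = nn.Linear(h, act_dim)

    def forward(self, rtg, states, actions):
        # Interleave (rtg, state, action) into one sequence, as in DT.
        B, T = states.shape[:2]
        toks = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)
        h = self.backbone(inputs_embeds=toks).last_hidden_state
        return self.predict_action(h[:, 1::3])    # predict actions from state tokens

model = LMInitDT(state_dim=17, act_dim=6)
a = model(torch.randn(2, 20, 1), torch.randn(2, 20, 17), torch.randn(2, 20, 6))
```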
If you're overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860
We propose "BC as a regularizer", which adds negligible compute cost to the original TD3 objective but makes it quite performant on offline RL (minimal sketch below).
For the table, we followed the "algorithm" vs. "implementation" separation suggested in our other NeurIPS paper.
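The actor update is small enough to show inline. A minimal PyTorch sketch of the TD3+BC actor loss from the paper: maximize λ·Q(s, π(s)) minus an MSE behavior-cloning term, with λ normalizing the Q scale; `actor`, `critic`, and the batch tensors are placeholders.

```python
# TD3+BC actor update, minimal sketch: TD3's objective (maximize Q) plus a BC
# regularizer pulling pi(s) toward the dataset actions. `actor`/`critic` are
# placeholder modules standing in for a full TD3 implementation.
import torch
import torch.nn.functional as F

ALPHA = 2.5  # the single new hyperparameter introduced by TD3+BC

def td3_bc_actor_loss(actor, critic, state, action):
    pi = actor(state)
    q = critic(state, pi)
    lam = ALPHA / q.abs().mean().detach()         # normalize the Q magnitude
    return -lam * q.mean() + F.mse_loss(pi, action)
```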
Toy MuJoCo + Box2d envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos (usage sketch below) github.com/openai/gym/iss…
#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But especially with the open-sourcing plans for mujoco, I'm excited to see where the synergies could be.
Good to see a lot of large-scale, algorithmic deep RL researchers are aligned: "I personally believe that hardware accelerator support is more important, hence choosing Brax."
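For a feel of where the 100x comes from, here is a short sketch of batched, jit-compiled environment stepping with brax + jax (API as in brax v1; exact signatures may differ across versions).

```python
# Batched, jit-compiled env stepping on accelerator: vmap runs many envs in
# lockstep, jit compiles the whole rollout step to GPU/TPU.
import jax
from brax import envs

env = envs.create(env_name="ant")
reset_fn = jax.jit(jax.vmap(env.reset))                  # reset many envs at once
step_fn = jax.jit(jax.vmap(env.step))                    # step them in parallel

rngs = jax.random.split(jax.random.PRNGKey(0), 1024)     # 1024 parallel envs
states = reset_fn(rngs)
actions = jax.numpy.zeros((1024, env.action_size))       # dummy zero actions
states = step_fn(states, actions)
```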