If you're overwhelmed by the # of papers in *offline* RL, check out our @NeurIPSConf Spotlight with Scott Fujimoto: we show how a few-line change to TD3 (TD3+BC) can be competitive with SoTA algorithms while halving training time. Inspired by #minimalism #zen #konmari arxiv.org/abs/2106.06860
We propose "BC as a regularizer": a behavior cloning term that adds negligible compute cost to the original TD3 objective, but makes it quite performant on offline RL.
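To make "BC as a regularizer" concrete, here is a minimal NumPy sketch of what such an actor loss could look like on one batch. It is an illustration under assumptions, not the released implementation: the function name `td3_bc_actor_loss`, the argument names, the `alpha=2.5` default, and the batch-level Q normalization are our choices for this example, and a real training loop would compute this in an autodiff framework with the normalizer treated as a constant.

```python
import numpy as np

def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Sketch of a TD3-style actor loss with a behavior cloning regularizer.

    q_values:        critic estimates Q(s, pi(s)), shape (batch,)
    policy_actions:  actions proposed by the policy pi(s), shape (batch, action_dim)
    dataset_actions: actions stored in the offline dataset, same shape
    alpha:           assumed trade-off weight between the RL term and the BC term
    """
    q_values = np.asarray(q_values, dtype=np.float64)
    policy_actions = np.asarray(policy_actions, dtype=np.float64)
    dataset_actions = np.asarray(dataset_actions, dtype=np.float64)

    # Scale the RL term by the average Q magnitude so that alpha stays
    # meaningful across tasks with very different reward scales.
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)

    # Standard TD3 actor term: push the policy toward high-Q actions.
    rl_term = -lam * np.mean(q_values)

    # The added regularizer: mean squared error to the dataset actions.
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)

    return rl_term + bc_term
```

The only difference from a plain TD3 actor loss in this sketch is the `bc_term` line plus the `lam` scaling, which is the sense in which the change is a few lines.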
For the table, we followed a similar "algorithm" vs. "implementation" separation to the one suggested in our other NeurIPS paper.
Lastly, we appreciate the reviewers' constructive feedback and their suggesting this "one git-pull"-size paper for a Spotlight at @NeurIPSConf! The community could benefit from simple ideas.
• • •
Toy MuJoCo + Box2d envs in OpenAI Gym are moving to #brax! 100x GPU/TPU speedup + purely pythonic + jax/pytorch-enabled, ready to be unleashed! Exciting news for the #brax #braxlines #jax teams. Also check out #composer, where I am adding more demos github.com/openai/gym/iss…
#brax still cannot (and probably won't ever) match the full specs of mujoco/pybullet. But esp. with the open-sourcing plans for mujoco, excited to see where there could be synergies.
Good to see that a lot of large-scale, algorithmic deep RL researchers are aligned: "I personally believe that hardware accelerator support is more important, hence choosing Brax."