Senior Research Scientist, @GoogleDeepMind, ex-🧠. Agents that make decisions. NeurIPS Best Paper (RLiable). Mila, IIT Bombay.
Oct 5, 2022 • 13 tweets • 6 min read
tl;dr: What if we didn't train reinforcement learning (RL) agents from scratch for almost every research project? Our work argues for an alternative research workflow that builds on prior computation, such as learned policies and network weights.
arxiv.org/abs/2206.01626 (1/N)
Let’s see this workflow on the ALE with a trained Nature DQN, which uses RMSProp, when we want to try Adam instead. While we can train DQN (Adam) from scratch, fine-tuning Nature DQN (with a reduced lr) performs substantially better, and fine-tuning with Adam matches from-scratch performance about 40x faster. (2/N)
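To make the workflow concrete, here is a minimal toy sketch (an assumption for illustration only, not the paper's DQN/ALE setup): "pretraining" fits a linear model by gradient descent, and we compare continuing from those weights with a reduced learning rate against restarting from a random init for the same small step budget.

```python
import numpy as np

def train(w, X, y, lr, steps):
    """Plain gradient descent on mean squared error for y = X @ w."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true

# "Prior computation": a long pretraining run we want to reuse.
w_pretrained = train(np.zeros(8), X, y, lr=0.1, steps=200)

# Same small budget (10 steps): scratch vs. fine-tuning with reduced lr.
w_scratch = train(rng.normal(size=8), X, y, lr=0.1, steps=10)
w_finetuned = train(w_pretrained, X, y, lr=0.01, steps=10)

err = lambda w: np.mean((X @ w - y) ** 2)
# Fine-tuning from pretrained weights reaches far lower error than
# training from scratch under the same step budget.
```

The reduced learning rate for fine-tuning mirrors the thread's recipe: it avoids destroying the pretrained solution while still adapting under the new optimizer settings.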
Aug 31, 2021 • 17 tweets • 7 min read
tl;dr: Our findings call for a change in how we evaluate performance on deep RL benchmarks. We present more reliable evaluation protocols, easily applicable with *even a handful of runs*, to prevent unreliable results from stagnating the field.
arxiv.org/abs/2108.13264 (1/N)
Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. (2/N)
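The fix the thread advocates can be sketched in a few lines of numpy (a minimal sketch of the idea, not the rliable library itself): instead of a bare mean or median, report the interquartile mean (IQM) across runs and tasks together with a bootstrap confidence interval that reflects the finite number of runs.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all scores."""
    x = np.sort(np.asarray(scores).ravel())
    n = len(x)
    return x[n // 4 : n - n // 4].mean()

def bootstrap_ci(scores, stat=iqm, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap over training runs (rows = runs, cols = tasks)."""
    rng = np.random.default_rng(seed)
    n_runs = scores.shape[0]
    stats = np.array(
        [stat(scores[rng.integers(0, n_runs, n_runs)]) for _ in range(reps)]
    )
    return np.percentile(stats, [100 * alpha / 2, 100 - 100 * alpha / 2])

# e.g. 5 training runs on 10 tasks of simulated normalized scores
scores = np.random.default_rng(1).normal(1.0, 0.3, size=(5, 10))
point = iqm(scores)          # aggregate point estimate
lo, hi = bootstrap_ci(scores)  # uncertainty from only 5 runs
```

IQM is more robust to outlier runs than the mean and more statistically efficient than the median, and the interval makes the uncertainty from a handful of runs explicit rather than hiding it behind a single number.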