Tired of waiting 💤 while your model trains? Try skipping points that are already learned, not learnable, or not worth learning! Robustly reduces the training steps 🏎 needed to reach the same accuracy on big web-scraped data by >10x!
Training on big web-scraped data can take ages 💤 Lots of compute and time are wasted on redundant and noisy points that are already learned, not learnable, or not even worth learning.
What if we just skip these points? Our method, RHO-LOSS, trains in far fewer gradient steps than prior art, boosts accuracy, and speeds up training across 8 datasets, many hyperparameter settings, and 10 architectures (MLPs, CNNs, and BERT).
Existing ideas, like skipping points with low loss, accidentally prioritise noisy and less relevant points, which are common in real-world data and web scrapes but barely help generalisation.
That’s where Reducible Holdout Loss Selection (RHO-LOSS) comes in. We select points that most reduce the generalisation loss and show that this objective has a simple and cheap but close approximation (line 7).
This gives an intuitive result: in a precise sense, the optimal points for fast training are learnable, worth learning, and not yet learned.
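Here is a minimal sketch of that selection rule. It assumes the approximation is the reducible holdout loss described in the paper: each candidate's current training loss minus its "irreducible" loss under a small auxiliary model trained on holdout data. The function name and the toy numbers are illustrative and not taken from the authors' code.

```python
from typing import List, Sequence


def select_rho_loss(
    train_losses: Sequence[float],
    irreducible_losses: Sequence[float],
    k: int,
) -> List[int]:
    """Return indices of the k candidates with the highest reducible holdout loss."""
    if len(train_losses) != len(irreducible_losses):
        raise ValueError("need one irreducible loss per candidate point")
    # Reducible holdout loss (assumed form): current training loss minus the
    # loss of a small auxiliary model trained on holdout data.
    scores = [t - i for t, i in zip(train_losses, irreducible_losses)]
    # Rank candidates from highest to lowest score and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy candidate batch (made-up numbers, for illustration only):
# index 0: low training loss               -> already learned
# index 1: high training AND holdout loss  -> noisy / not learnable
# index 2: high training, low holdout loss -> learnable, not yet learned
# index 3: moderate on both
train_losses = [0.05, 2.30, 2.10, 1.20]
irreducible_losses = [0.04, 2.25, 0.30, 1.00]
selected = select_rho_loss(train_losses, irreducible_losses, k=1)
```

In this toy batch, plain high-loss selection would pick the noisy point at index 1. Subtracting the irreducible loss moves the learnable, not-yet-learned point at index 2 to the top.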
Check out the paper for:
✨ re-using a single small auxiliary model to accelerate training across multiple architectures
✨ RHO-LOSS as an efficient approximation to optimal selection, derived in the language of probabilistic modelling
✨ why this works so well 😊
As Europe enters a third wave of COVID, policy-makers balance controlling infections with the sweeping socioeconomic costs of interventions. To do so, we must know how effective individual interventions were at controlling COVID. 2/
Many papers estimate the effects of non-pharmaceutical interventions in the first wave, often using a data-driven approach that minimises the number of assumptions made. Here’s the problem … 3/
Work done with great colleagues from 13 research groups, supervised by @yaringal, @yeewhye, and Leonid Chindelevitch. Currently in submission.
We manually collected data (now with independent double-entry and over a longer period) on interventions used by 41 countries. Excited to see what else people will do with it.