Sören Mindermann

Jun 16 • 11 tweets • 5 min read

Tired of waiting 💤 while your model trains? Try skipping points that are already learned, not learnable, or not worth learning! This robustly cuts the training steps 🏎 needed to reach the same accuracy on big web-scraped data by more than 10x!

📜ICML 2022 paper: arxiv.org/abs/2206.07137

Training on big web-scraped data can take ages 💤 But lots of compute and time are wasted on redundant and noisy points that are already learned, not learnable, or not even worth learning.

What if we just skip these points? Our method, RHO-LOSS, trains in far fewer gradient steps than prior art, boosts accuracy, and speeds up training on 8 datasets, a wide range of hyperparameters, and 10 architectures (MLPs, CNNs, and BERT).

Existing ideas, like skipping points with low loss, accidentally prioritise noisy and less relevant points, which are common in real-world data and web scrapes but barely help generalization.

That’s where Reducible Holdout Loss Selection (RHO-LOSS) comes in. We select points that most reduce the generalisation loss and show that this objective has a simple and cheap but close approximation (line 7).

This gives an intuitive result: in a precise sense, the optimal points for fast training are learnable, worth learning, and not yet learned.
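To make the selection rule concrete, here is a minimal sketch in plain Python. It assumes the formulation from the paper: each candidate point is scored by its loss under the current model minus its loss under a small model trained only on holdout data (the "irreducible" loss), and the highest-scoring points are kept for the gradient step. The function name `select_rho_loss` and the toy loss values are made up for illustration, and this is not the code from the official repository.

```python
def select_rho_loss(train_losses, irreducible_losses, k):
    """Pick the k candidates with the highest reducible holdout loss.

    train_losses:       per-point loss under the model being trained
    irreducible_losses: per-point loss under a small model trained on
                        holdout data (assumed to be precomputed)
    """
    if len(train_losses) != len(irreducible_losses):
        raise ValueError("both loss lists must have the same length")
    # Reducible loss = what the current model could still gain on each point.
    reducible = [t - i for t, i in zip(train_losses, irreducible_losses)]
    # Rank candidate indices from highest to lowest reducible loss.
    ranked = sorted(range(len(reducible)), key=lambda j: reducible[j], reverse=True)
    return ranked[:k], reducible


# Hypothetical losses for four candidate points:
#   0: already learned      (low loss under the current model)
#   1: noisy / not learnable (high loss under both models)
#   2: learnable, not yet learned (high current loss, low holdout-model loss)
#   3: mostly learned
train_losses = [0.05, 2.5, 2.4, 1.0]
irreducible_losses = [0.04, 2.45, 0.3, 0.9]

selected, reducible = select_rho_loss(train_losses, irreducible_losses, k=2)
print(selected)  # [2, 3]
```

Note how point 1 has the highest training loss but is not selected: its loss is just as high under the holdout-trained model, so little of it is reducible. Selecting by high loss alone would have picked it first.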

📜#icml2022 paper: arxiv.org/pdf/2206.07137…

Check out the paper for:
✨ re-using a single small auxiliary model to accelerate training across multiple architectures
✨ RHO-LOSS as an efficient approximation to optimal selection, derived in the language of probabilistic modelling
✨ why this works so well 😊

Work done at @oatml @CohereAI with great collaborators @JanMBrauner, @MrinankSharma, @mtrazzak, @BlackHC, Winnie Xu, Ben Höltgen, @adrien_morisot, @aidangomezzz, @seb_far, @yaringal

⌨️Code: github.com/OATML/RHO-Loss

Correction to the lap of honour: @OATML_Oxford is where it happened 😊

Might be of interest to @yieldthought (thanks for your kind words on the workshop paper!)


