Reliability is a key challenge in ML. There are now dozens of robust training methods and datasets. How do they compare?

We ran 200+ ImageNet models on 200+ test sets to find out.
modestyachts.github.io/imagenet-testb…

TL;DR: Distribution shift is *really* hard, but common patterns emerge.
To organize the 200 distribution shifts, we divide them into two categories: synthetic shifts and natural shifts.

Synthetic shifts are derived from existing images by perturbing them with noise, etc.

Natural shifts are new, unperturbed images from a different distribution.
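
To make the synthetic case concrete, here is a minimal sketch of one such perturbation: additive Gaussian noise. It is an illustration only, not the exact corruption used by any benchmark in the testbed. The function name `add_gaussian_noise`, the `sigma` value, and the flat gray stand-in image are all assumptions.

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Return a noisy copy of `image` (float array with values in [0, 1]).

    Hypothetical helper for illustration; `sigma` is the noise standard
    deviation on the same [0, 1] scale as the pixel values.
    """
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    # Clip so the result is still a valid image.
    return np.clip(noisy, 0.0, 1.0)

# Stand-in for a real test image: a flat gray 4x4 RGB array.
clean = np.full((4, 4, 3), 0.5)
noisy = add_gaussian_noise(clean)
```

A natural shift has no such recipe: the test images are simply collected from a different source, so there is nothing to apply to an existing image.
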
At a high level, there has been good progress on the synthetic shifts (e.g., ImageNet-C or adversarial examples).

Natural distribution shifts (e.g., ImageNetV2 or ObjectNet), on the other hand, are still much harder.
But how do we measure robustness to begin with?

On many shifts, models with higher in-distribution accuracy also do better on the shifted data, without any intervention to improve their robustness. So we have to disentangle robustness from in-distribution accuracy.
To understand if a model is truly more robust (as opposed to just being more accurate in-distribution), we introduce “effective robustness”: accuracy under shift beyond the baseline that standard models give at the same in-distribution accuracy. This is best demonstrated graphically:
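
The same quantity can also be sketched numerically. The snippet below is a minimal illustration, not the testbed's code: it assumes the baseline is a straight-line fit on logit-scaled accuracies of standard models, and every name and number in it (`fit_baseline`, `effective_robustness`, the accuracy values) is hypothetical.

```python
import numpy as np

def logit(p):
    # Clip to avoid infinities at exactly 0 or 1.
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_baseline(std_acc_in, std_acc_shift):
    """Fit shifted accuracy as a function of in-distribution accuracy,
    using standard (non-robust) models only.

    Assumption for this sketch: a linear fit in logit space.
    """
    slope, intercept = np.polyfit(logit(std_acc_in), logit(std_acc_shift), deg=1)

    def beta(acc_in):
        # Shifted accuracy the baseline predicts for this in-distribution accuracy.
        return sigmoid(slope * logit(acc_in) + intercept)

    return beta

def effective_robustness(acc_in, acc_shift, beta):
    """Accuracy under shift minus what the baseline predicts."""
    return np.asarray(acc_shift, dtype=float) - beta(acc_in)

# Made-up accuracies (fractions in [0, 1]) for four standard models.
standard_in = [0.60, 0.70, 0.76, 0.80]
standard_shift = [0.47, 0.57, 0.64, 0.69]
beta = fit_baseline(standard_in, standard_shift)

# A made-up candidate model: positive values mean it sits above the baseline.
rho = float(effective_robustness(0.76, 0.70, beta))
print(f"effective robustness: {rho:+.3f}")
```

A model that is merely more accurate in-distribution moves along the baseline and scores near zero here; only a model that lands above the baseline gets a positive value.
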
Looking at effective robustness paints a consistent picture for the natural distribution shifts in our testbed. Current robustness interventions show little to no gains. The only approach that consistently promotes robustness is training on large, diverse datasets.
There is a lot more in our paper, so we built an interactive website to explore all the data we collected:

robustness.imagenetv2.org

We also made it *very* easy for others to integrate datasets and models into our testbed:

github.com/modestyachts/i…
Learn more at our #NeurIPS Spotlight talk TODAY at 7:50pm PST or at our poster at 9pm PST: neurips.cc/virtual/2020/p…

Main project page:
modestyachts.github.io/imagenet-testb…

joint work w/ @achalddave, @Vaishaal, Nicholas Carlini, @beenwrekt, @lschmidt3
