TL;DR: Distribution shift is *really* hard, but common patterns emerge.
To organize the 200 distribution shifts, we divide them into two categories: synthetic shifts and natural shifts.
Synthetic shifts are derived from existing images by perturbing them programmatically, e.g., with noise, blur, or adversarial modifications.
Natural shifts are new, unperturbed images from a different distribution.
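For concreteness, here is what one of the simplest synthetic shifts looks like in code. This is an illustrative sketch of ImageNet-C-style Gaussian noise; the function name and severity constants are ours, not the benchmark's exact ones:

```python
import numpy as np

def gaussian_noise(image, severity=1):
    """Perturb an image with additive Gaussian noise (ImageNet-C style).

    image: uint8 array of shape (H, W, 3); severity: 1 (mild) to 5 (strong).
    The sigma values below are illustrative, not the benchmark's exact ones.
    """
    sigma = [0.04, 0.08, 0.12, 0.18, 0.26][severity - 1]
    x = image.astype(np.float32) / 255.0            # work in [0, 1]
    x = x + np.random.normal(scale=sigma, size=x.shape)
    return (np.clip(x, 0.0, 1.0) * 255.0).astype(np.uint8)
```

Natural shifts, by contrast, cannot be generated this way; they require collecting new images.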
At a high level, there has been good progress on the synthetic shifts (e.g., ImageNet-C or adversarial examples).
Natural distribution shifts (e.g., ImageNetV2 or ObjectNet), on the other hand, are still much harder.
But how do we measure robustness to begin with?
On many shifts, models with higher in-distribution accuracy already perform better under distribution shift, without any intervention to improve their robustness. So we have to disentangle robustness from in-distribution accuracy.
To understand whether a model is truly more robust (as opposed to just more accurate in-distribution), we introduce “effective robustness”: accuracy beyond the baseline given by standard models. This is best demonstrated graphically:
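In code, the idea is simple. Below is a minimal sketch, assuming the baseline is a linear fit through the standard models' accuracies in logit-logit space (the scaling used in our plots); the function names are illustrative, and the paper's exact fitting details may differ:

```python
import numpy as np

def logit(p):
    # Accuracies cluster near 1.0, so the baseline is fit in logit space.
    return np.log(p / (1.0 - p))

def fit_baseline(std_in_acc, std_out_acc):
    """Fit a baseline through standard (non-robust) models' accuracies.

    std_in_acc / std_out_acc: in- and out-of-distribution accuracies
    (fractions in (0, 1)). Returns a function mapping in-distribution
    accuracy to the out-of-distribution accuracy a standard model of
    that accuracy would be expected to reach.
    """
    slope, intercept = np.polyfit(logit(np.asarray(std_in_acc)),
                                  logit(np.asarray(std_out_acc)), deg=1)

    def baseline(in_acc):
        z = slope * logit(np.asarray(in_acc)) + intercept
        return 1.0 / (1.0 + np.exp(-z))  # back to accuracy space

    return baseline

def effective_robustness(in_acc, out_acc, baseline):
    # Accuracy beyond what the baseline predicts for this model's
    # in-distribution accuracy; > 0 means effectively robust.
    return out_acc - baseline(in_acc)
```

Under this definition, a model counts as effectively robust only if its out-of-distribution accuracy exceeds what the fit predicts from its in-distribution accuracy, not merely if it scores higher under shift than other models.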
Looking at effective robustness paints a consistent picture for the natural distribution shifts in our testbed. Current robustness interventions show little to no gains. The only approach that consistently promotes robustness is training on large, diverse datasets.
There is a lot more in our paper, so we built an interactive website to explore all the data we collected: