Spurious features are a major issue for deep learning. Our new #NeurIPS2022 paper w/ @pol_kirichenko, @gruver_nate and @andrewgwils explores the feature representations learned by models trained on data with spurious correlations, with many surprising findings and SOTA results.

arxiv.org/abs/2210.11369
🧵 1/6
We use Deep Feature Reweighting (DFR) to evaluate feature representations: retrain the last layer of the model on group-balanced validation data. The DFR worst-group accuracy (WGA) tells us how much information about the core features the model has learned.

2/6
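The DFR evaluation above is simple enough to sketch. A minimal illustration, assuming features have already been extracted from the frozen network; the array names and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the paper's exact code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def group_balanced_subsample(X, y, g, rng):
    """Keep equally many examples from each group label in g."""
    groups = np.unique(g)
    n_min = min((g == gr).sum() for gr in groups)
    idx = np.concatenate([
        rng.choice(np.where(g == gr)[0], size=n_min, replace=False)
        for gr in groups
    ])
    return X[idx], y[idx]

def dfr_wga(X_val, y_val, g_val, X_test, y_test, g_test, seed=0):
    """Retrain the last layer (a linear probe) on group-balanced
    validation features, then report worst-group test accuracy."""
    rng = np.random.default_rng(seed)
    Xb, yb = group_balanced_subsample(X_val, y_val, g_val, rng)
    clf = LogisticRegression(max_iter=1000).fit(Xb, yb)
    preds = clf.predict(X_test)
    # Accuracy within each group; WGA is the minimum over groups.
    accs = [(preds[g_test == gr] == y_test[g_test == gr]).mean()
            for gr in np.unique(g_test)]
    return min(accs)
```

If the features linearly encode the core attribute, the retrained last layer can use it and WGA is high, regardless of how the original classifier head weighted the features.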
While group robustness methods such as group DRO can improve WGA a lot, they don't typically improve the features! With DFR, we recover the same performance from ERM and group DRO features. The improvement in these methods comes from the last layer, not the features!

3/6
Better models learn the core features better: in-distribution accuracy is linearly correlated with DFR WGA. We find no qualitative differences between architecture types such as CNNs and vision transformers: they all fall on the same line.

4/6
ImageNet pretraining (supervised or contrastive) has a major effect on the features, even on non-natural image datasets such as chest X-rays. With strong pretrained models, we achieve SOTA WGA on Waterbirds (97%), CelebA (92%), and FMOW (50%) with ERM features.

5/6
Many more insights and results in the paper!

Paper: arxiv.org/abs/2210.11369
Code: github.com/izmailovpavel/…
NeurIPS event: neurips.cc/virtual/2022/p…

6/6
