Spurious features are a major issue for deep learning. Our new #NeurIPS2022 paper w/ @pol_kirichenko, @gruver_nate and @andrewgwils explores the representations learned from data with spurious features, with many surprising findings and SOTA results.
arxiv.org/abs/2210.11369
🧵1/6
We use Deep Feature Reweighting (DFR) to evaluate feature representations: retrain the last layer of the model on group-balanced validation data. DFR worst-group accuracy (WGA) tells us how much information about the core features the representation has learned.
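A minimal sketch of the bookkeeping around DFR, not the paper's code: it shows how a group-balanced subset of held-out data could be drawn and how WGA is computed from predictions. The function names `group_balanced_subset` and `worst_group_accuracy` are hypothetical, and the last-layer retraining itself (fitting a linear classifier on frozen features from the balanced subset) is left out.

```python
import random
from collections import defaultdict


def group_balanced_subset(groups, seed=0):
    """Return sorted indices containing the same number of examples per group.

    `groups` is a list of sortable group ids (e.g. ints), one per example.
    Every group is subsampled down to the size of the smallest group.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    n_min = min(len(indices) for indices in by_group.values())
    rng = random.Random(seed)
    chosen = []
    for g in sorted(by_group):
        chosen.extend(rng.sample(by_group[g], n_min))
    return sorted(chosen)


def worst_group_accuracy(preds, labels, groups):
    """Return the minimum per-group accuracy over all groups present."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)


# Toy usage: group 0 is fully correct, group 1 is half correct.
wga = worst_group_accuracy([1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1])
print(wga)  # 0.5
```

In a full pipeline one would extract penultimate-layer features for the indices returned by `group_balanced_subset`, fit a new linear head on them, and report `worst_group_accuracy` on test data.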
We run HMC on hundreds of TPU devices for millions of training epochs to provide our best approximation of the true Bayesian neural networks! (1) BNNs do better than deep ensembles, (2) there is no cold posteriors effect, but (3) BNNs are terrible under data corruption, and much more! 🧵
First, we find that BNNs at temperature 1 with regular Gaussian priors are actually quite good, outperforming deep ensembles on both accuracy and likelihood!
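For readers new to the terminology, "temperature 1" refers to the standard tempered-posterior form used in the cold posteriors literature. The notation here ($w$ for weights, $\mathcal{D}$ for data, $T$ for temperature, $\alpha^2$ for the prior variance) is assumed for illustration and is not taken from the thread:

```latex
p_T(w \mid \mathcal{D}) \;\propto\; \bigl( p(\mathcal{D} \mid w)\, p(w) \bigr)^{1/T},
\qquad p(w) = \mathcal{N}(w;\, 0,\, \alpha^2 I)
```

At $T = 1$ this is the ordinary Bayesian posterior. A "cold posteriors effect" would mean that some $T < 1$ performs better than $T = 1$.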