✨ New Paper ✨ on robust optimization to mitigate unspecified spurious features, accepted to #ICLR2023.
We present AGRO, a novel min-max optimization method that jointly finds coherent error-prone groups in training data and minimizes worst expected loss over them.
🧵🔽 1/6
2/ Human evaluation of AGRO groups in popular benchmark datasets shows that they contain well-defined, yet ✨ previously unstudied ✨ spurious correlations. For example, blondes wearing hats or sunglasses in CelebA, and MNLI entailment examples with antonyms. More examples in the paper.
3/ Group distributionally robust optimization (G-DRO) mitigates distributional shifts caused by spurious correlations in the training data by minimizing the worst expected loss over pre-identified groups in the data.
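To make the G-DRO objective concrete, here is a minimal sketch of the quantity it minimizes: the largest mean loss across pre-identified groups. The function name `worst_group_loss` and the toy losses are illustrative assumptions, not code from the paper.

```python
from collections import defaultdict


def worst_group_loss(losses, groups):
    """Return (worst_group, mean_loss) over pre-identified groups.

    losses: per-example losses; groups: the group label of each example.
    Hypothetical helper for illustration only.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    # Expected (mean) loss within each group.
    means = {g: totals[g] / counts[g] for g in totals}
    # G-DRO trains the model to minimize this maximum.
    worst = max(means, key=means.get)
    return worst, means[worst]


losses = [0.2, 0.4, 1.0, 2.0]
groups = ["a", "a", "b", "b"]
print(worst_group_loss(losses, groups))  # ('b', 1.5)
```

The average loss here is 0.9, but the objective focuses on group "b", which is what pushes the model to stop relying on shortcuts that only work for the majority group.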
4/ AGRO equips G-DRO with an adversarial slicing model that searches for the group assignment of training examples that maximizes the worst-case loss, i.e., the assignment that is maximally informative for G-DRO.
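A rough sketch of the adversary's side of this game, assuming soft (probabilistic) group assignments: an assignment that concentrates high-loss examples into one group scores a higher worst-case loss than an uninformative one, so it is the one the adversary prefers. The function `soft_worst_group_loss` and the toy numbers are assumptions for illustration; the actual slicing model and its training are described in the paper.

```python
def soft_worst_group_loss(losses, assignment):
    """Worst group loss under a soft assignment.

    assignment[i][k] is the (assumed) probability that example i
    belongs to group k. Hypothetical helper for illustration only.
    """
    num_groups = len(assignment[0])
    group_losses = []
    for k in range(num_groups):
        mass = sum(row[k] for row in assignment)
        if mass == 0:
            continue  # skip groups that receive no examples
        weighted = sum(row[k] * loss for row, loss in zip(assignment, losses))
        group_losses.append(weighted / mass)
    return max(group_losses)


losses = [0.1, 0.1, 2.0, 2.0]
# Uninformative assignment: every example is split evenly across groups.
uniform = [[0.5, 0.5] for _ in losses]
# Error-prone examples are isolated in the second group.
coherent = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# The adversary prefers the assignment with the larger worst-case loss.
assert soft_worst_group_loss(losses, coherent) > soft_worst_group_loss(losses, uniform)
```

In the full method the model then minimizes this same quantity, which gives the min-max structure.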
5/ On the WILDS benchmark, AGRO improves model performance on known worst groups by 8% on average. AGRO also improves out-of-distribution performance on SST2, QQP, and MS-COCO, datasets where potential spurious correlations are as yet uncharacterized.
Excited to share my internship work at Google Research with @iftenney @MatthewRLamm on retrieval-guided generation of *semantically diverse* counterfactuals for question answering tasks like Natural Questions.
Paper: arxiv.org/abs/2110.07596
[1/n]
Automatically generating counterfactuals for QA poses unique challenges: need for world knowledge, semantic diversity, and answerability.
[2/n]
We introduce the Retrieve-Generate-Filter (RGF) framework: (a) Retrieve-then-read (REALM) for *alternate* answers and passages, (b) counterfactual question generation, (c) question filtering for minimality and type of semantic shift (see Figure).
[3/n]
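The three stages compose roughly as in the sketch below. Everything here is a hypothetical stand-in: `rgf`, `retrieve`, `generate`, and `keep` are names invented for illustration, and the toy lambdas replace the REALM retriever, the question generator, and the minimality/semantic-shift filters used in the paper.

```python
from typing import Callable, Iterable, List, Tuple


def rgf(
    question: str,
    retrieve: Callable[[str], Iterable[Tuple[str, str]]],
    generate: Callable[[str, str], str],
    keep: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Return (counterfactual question, answer) pairs for one question."""
    counterfactuals = []
    # (a) Retrieve alternate passages and answers for the original question.
    for passage, answer in retrieve(question):
        # (b) Generate a new question that the alternate answer resolves.
        new_question = generate(passage, answer)
        # (c) Keep only candidates that pass the filter.
        if keep(question, new_question):
            counterfactuals.append((new_question, answer))
    return counterfactuals


# Toy stand-ins (assumptions, not the real models).
retrieve = lambda q: [
    ("Paris is the capital of France.", "Paris"),
    ("Berlin is the capital of Germany.", "Berlin"),
]
generate = lambda passage, answer: (
    "what is the capital of " + passage.rstrip(".").split(" of ")[-1] + "?"
)
keep = lambda original, new: new != original  # drop exact duplicates only

result = rgf("what is the capital of France?", retrieve, generate, keep)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular retriever or generator.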