A short 🧵
One of the most reliable ways to handle distribution shift is to fine-tune on a small amount of data.
We find that the best layers to fine-tune depend on the *type* of shift!
Compared to fine-tuning the whole network, fine-tuning just one block achieves similar or higher accuracy. ⬇️
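A minimal sketch of the single-block setup, assuming a pretrained torchvision ResNet-50; which block to unfreeze is exactly the choice that should depend on the type of shift:

```python
import torch
from torchvision import models

# Illustrative setup (not the paper's exact recipe): start from a pretrained model.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze every parameter...
for p in model.parameters():
    p.requires_grad = False

# ...then unfreeze just one block (here layer1; the best choice depends on the shift).
for p in model.layer1.parameters():
    p.requires_grad = True

# Fine-tune only that block on the small amount of shifted data.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
```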
Feb 9, 2022 • 7 tweets • 4 min read
What should ML models do when there's a *perfect* correlation between spurious features and labels?
This is hard b/c the problem is fundamentally _underdefined_
DivDis can solve this problem by learning multiple diverse solutions & then disambiguating: arxiv.org/abs/2202.03418
🧵
Prior works have made progress on robustness to spurious features but also have important weaknesses:
- They can't handle perfect/complete correlations
- They often need labeled data from the target distr. for hparam tuning
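A rough sketch of the multi-head idea above. The pairwise agreement penalty is a stand-in for the paper's mutual-information diversity loss, and all names here are illustrative:

```python
import torch.nn.functional as F

def divdis_style_loss(feats_src, y_src, feats_tgt, heads, diversity_weight=0.1):
    # Supervised term: every head must fit the (spuriously correlated) source labels.
    ce = sum(F.cross_entropy(h(feats_src), y_src) for h in heads) / len(heads)

    # Diversity term: penalize pairwise agreement of the heads' predicted
    # distributions on unlabeled target data, so the heads learn different solutions.
    probs = [F.softmax(h(feats_tgt), dim=-1) for h in heads]
    agree = 0.0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            agree = agree + (probs[i] * probs[j]).sum(dim=-1).mean()
    return ce + diversity_weight * agree
```

After training, the heads are disambiguated with a small number of target labels or queries, i.e. keep the head that matches.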
A collab w. the amazing @mike_h_wu @chrispiech & co 🧵
2/ Student feedback is a fundamental problem in scaling education.
Providing good feedback is hard: existing approaches give canned responses, cryptic error messages, or simply reveal the answer.
Apr 1, 2021 • 5 tweets • 3 min read
How can robots generalize to new environments & tasks?
We find that using in-the-wild videos of people can allow learned reward functions to do so!
Paper: arxiv.org/abs/2103.16817
Led by @_anniechen_, @SurajNair_1
🧵(1/5)
To get reward functions that generalize, we train domain-agnostic video discriminators (DVD) with:
* a lot of diverse human data, and
* a small, narrow set of robot demos
The idea is super simple: predict if two videos are performing the same task or not.
(2/5)
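A minimal sketch of that same-task objective, with an illustrative encoder interface (the actual DVD architecture and data pipeline differ):

```python
import torch
import torch.nn as nn

class SameTaskDiscriminator(nn.Module):
    """Embed two video clips and predict whether they show the same task."""
    def __init__(self, encoder, emb_dim=256):
        super().__init__()
        self.encoder = encoder  # any video encoder mapping a clip to an emb_dim vector
        self.head = nn.Sequential(
            nn.Linear(2 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, video_a, video_b):
        za, zb = self.encoder(video_a), self.encoder(video_b)
        return self.head(torch.cat([za, zb], dim=-1))  # logit: same task or not

# Train on pairs drawn mostly from diverse human videos plus a few robot demos;
# the resulting score can then serve as a reward for new tasks and scenes.
```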
Jul 8, 2020 • 8 tweets • 4 min read
Convolution is an example of structure we build into neural nets. Can we _discover_ convolutions & other symmetries from data?
With M Zhang, H Marklund @abhishekunique7 @svlevine
(1/6)
Prior works on distributionally robust optimization (DRO) aim to be _robust_ to distribution shift.
Group DRO aims for robustness to shifts in groups underlying the dataset. (e.g. see arxiv.org/abs/1611.02041)
(2/6)
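As a concrete sketch, group DRO minimizes the worst-group loss rather than the average loss; something like the following (illustrative, not the exact online algorithm from the cited work):

```python
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, group_ids):
    # Per-example losses, then average within each group.
    per_example = F.cross_entropy(logits, labels, reduction="none")
    group_losses = [
        per_example[group_ids == g].mean() for g in torch.unique(group_ids)
    ]
    # Minimizing this trains for the worst-performing group, not the average case.
    return torch.stack(group_losses).max()
```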