Happy to share that our paper “On Calibration and Out-of-domain Generalization” has been accepted to #NeurIPS2021!

Congratulations to the wonderful students who came up with the idea and led the work on this paper:
@wald_yoav @amir_feder @d_greenfeld

arxiv.org/abs/2102.10395

1/15
tl;dr: Making a classifier calibrated over multiple training domains is an easy and powerful way to improve generalization to unseen domains.
2/15
Say you have patient data from several hospitals which differ in their demographics, imaging machines, etc., and you want to learn a classifier that generalizes to new, unseen hospitals. This is the problem of out-of-domain (OOD) generalization, ubiquitous in NLP, vision & more.
3/15
This figure is based on Camelyon17, a dataset of pathology slides from multiple hospitals.
Naive/Robust/CLOvE are methods to re-calibrate trained classifiers.
For each model, reducing the in-domain average calibration error (ECE, x-axis) also improves OOD accuracy (y-axis).
How come?
4/15
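For readers who haven't met ECE before, here is a minimal sketch of how expected calibration error is typically estimated for a binary classifier: bin the predicted probabilities and, in each bin, compare the empirical frequency of Y=1 to the mean prediction. The bin count and equal-width binning are illustrative choices, not necessarily the exact evaluation protocol used in the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE in the sense used here: |E[Y | f(X) in bin] - mean f(X) in bin|,
    averaged over equal-width bins and weighted by bin mass."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(probs, edges[1:-1])      # bin index in {0, ..., n_bins - 1}
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        gap = abs(labels[mask].mean() - probs[mask].mean())
        ece += mask.mean() * gap               # weight each bin by its share of the points
    return ece
```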
Generally, there are several ways to formalize the goals of OOD generalization, e.g. see this recent thread.

5/15
We formalize the problem using the causal graph below, assuming that at test time we see data from a previously unobserved domain E=e.
Important: we assume we’re *not* told which features are causal, anti-causal spurious, or anti-causal non-spurious (ac = anti-causal).
6/15
Any model that uses the ac-spurious features from the graph above risks making arbitrarily large mistakes if the domain (E) changes. Thus, bounding the worst-case risk means we must avoid these features.

7/15
Speaking more generally, we (and others) argue that a good property for a classifier f(X) expected to generalize to new domains E is:
Y⫫E | f(X)

"Y is independent of E conditioned on f(X)"

8/15
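One concrete way to read this property: within any level set (or bin) of the model’s output, the label Y should carry no information about which domain E the example came from. A rough empirical diagnostic along those lines is sketched below, using a per-bin chi-squared test; the binning scheme, sample-size cutoff and choice of test are my own illustration, not something from the paper.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def domain_independence_diagnostic(probs, labels, domains, n_bins=10):
    """Within each bin of f(X), test whether Y and E look independent.
    Small p-values in many bins suggest Y is *not* independent of E given f(X)."""
    probs = np.asarray(probs, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(probs, edges[1:-1])
    pvals = {}
    for b in range(n_bins):
        mask = idx == b
        if mask.sum() < 20:                    # skip nearly-empty bins
            continue
        # contingency table: rows = label value, columns = domain
        table = pd.crosstab(np.asarray(labels)[mask], np.asarray(domains)[mask])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue                           # nothing to test in a degenerate bin
        _, p, _, _ = chi2_contingency(table)
        pvals[b] = p
    return pvals
```

This is only a sanity check on held-out data; as the next tweet notes, optimizing the property directly is a different and harder matter.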
Optimizing for Y⫫E | f(X) directly looks a bit nasty: enforcing conditional independence is often hard, and here we’re conditioning on a continuous variable which is ALSO the target of our optimization problem.
We show this problem is almost equivalent to a more approachable *calibration* problem.
9/15
Reminder:
For binary Y and classifier f(X), f is calibrated if
𝔼[Y | f(X)] = f(X)
We require this to hold in every training domain:
𝔼[Y | f(X), E=eᵢ] = f(X) for all training domains eᵢ

10/15
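In code, the multi-domain condition simply means running the usual calibration check separately on each training domain rather than once on the pooled data. A minimal sketch, reusing the `expected_calibration_error` helper from the ECE sketch above:

```python
import numpy as np

def per_domain_calibration(probs, labels, domains, n_bins=15):
    """Calibration error of f computed separately for each training domain e_i.
    Multi-domain calibration asks that *every* one of these be (near) zero,
    not just the error on the pooled data."""
    probs, labels, domains = map(np.asarray, (probs, labels, domains))
    return {
        e: expected_calibration_error(probs[domains == e], labels[domains == e], n_bins)
        for e in np.unique(domains)
    }
```

Note that a model can look well calibrated on the pooled data while being badly miscalibrated within individual domains; the per-domain view is what the condition above asks for.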
In linear-Gaussian settings we prove that if a model is simultaneously calibrated over diverse domains, it is guaranteed to be free of spurious correlations and to generalize well OOD.

11/15
We further prove that multi-domain calibration leads to learning the correct invariances even in cases where IRM(v1) fails to do so.
(figure inspired by the Kamath et al. AISTATS 2021 paper, arxiv.org/abs/2101.01134)
12/15
Our theory shows multi-domain calibration works in linear-Gaussian settings with infinite samples.
We were gratified to see that multi-domain calibration also works in practical, finite-sample, not-at-all-linear settings, getting great results on the WILDS benchmark
13/15
We experimented with successively more sophisticated ways to encourage multi-domain calibration: model selection, post-processing with a robust version of isotonic regression, and a new training objective we call CLOvE, based on work by Kumar et al. (ICML 2018) proceedings.mlr.press/v80/kumar18a.h…
14/15
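To give a flavour of what the last option can look like in practice: Kumar et al.’s MMCE is a kernel-based, differentiable proxy for calibration error, and the natural multi-domain variant sums such a penalty over the training domains and adds it to the usual task loss. The PyTorch-style sketch below follows that recipe; the Laplacian kernel width and the penalty weight λ are illustrative defaults, and this is a sketch of the idea rather than the paper’s exact CLOvE implementation.

```python
import torch

def mmce_penalty(probs, labels, kernel_width=0.4):
    """Differentiable kernel calibration penalty in the spirit of MMCE
    (Kumar et al., ICML 2018), for a binary classifier.

    probs  : predicted P(Y=1), shape (n,), values in (0, 1)
    labels : targets in {0, 1}, shape (n,)
    """
    pred = (probs >= 0.5).float()
    conf = torch.where(pred.bool(), probs, 1.0 - probs)   # confidence of the predicted class
    correct = (pred == labels.float()).float()            # 1 if the prediction is right
    diff = correct - conf                                  # per-example calibration gap
    kernel = torch.exp(-(conf.unsqueeze(0) - conf.unsqueeze(1)).abs() / kernel_width)
    return (diff.unsqueeze(0) * diff.unsqueeze(1) * kernel).mean()

def multi_domain_calibration_loss(probs, labels, domains, lam=1.0):
    """Task loss plus a calibration penalty summed over the training domains."""
    bce = torch.nn.functional.binary_cross_entropy(probs, labels.float())
    penalty = sum(
        mmce_penalty(probs[domains == e], labels[domains == e])
        for e in torch.unique(domains)
    )
    return bce + lam * penalty
```

In practice such a penalty would be computed on minibatches that contain examples from several training domains, so that each per-domain term has enough samples.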
Many more results in the paper. Main takeaway: make your classifier calibrated across training domains, and get better out-of-domain generalization. It’s easy :)
Still many open questions, both in theory and in practice - looking forward to discussing them here and at #NeurIPS2021
15/15
