Background: #conformalprediction is becoming popular for predictive interval generation with a coverage guarantee
Coverage: Predictive interval contains true label with high probability (i.e., predictive confidence intervals are valid)
Assumption: Exchangeable (e.g., IID) data
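To make the coverage idea concrete, here's a minimal split-conformal sketch. Assumptions (mine, not from the thread): scikit-learn, a Ridge base regressor, absolute residuals as the nonconformity score, and a 90% target level.

```python
# Minimal split-conformal sketch (all modeling choices illustrative).
import numpy as np
from sklearn.linear_model import Ridge

def split_conformal_interval(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Return (lower, upper) intervals with ~(1 - alpha) coverage,
    valid when calibration and test points are exchangeable."""
    model = Ridge().fit(X_train, y_train)
    # Nonconformity scores on a held-out calibration set.
    scores = np.abs(y_cal - model.predict(X_cal))
    n = len(scores)
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q
```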
Prior work addresses some limitations of standard conformal methods:
Jackknife+: A favorable trade-off between the statistical inefficiency of split conformal and the computational inefficiency of full conformal (Barber et al., 2021)
Weighted conformal: Extends conformal to covariate shift (Tibshirani et al., 2019)
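For intuition on jackknife+ (and why it needs N retrained models), here's a rough sketch of the Barber et al. (2021) interval for a single test point, again assuming scikit-learn and a generic base regressor of my choosing:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge

def jackknife_plus_interval(X, y, x_test, alpha=0.1, base_model=None):
    """Jackknife+ interval for one test point (Barber et al., 2021)."""
    n = len(y)
    base_model = base_model or Ridge()
    lo_terms, hi_terms = [], []
    for i in range(n):  # N leave-one-out refits: the computational bottleneck
        mask = np.arange(n) != i
        m = clone(base_model).fit(X[mask], y[mask])
        r_i = np.abs(y[i] - m.predict(X[i:i + 1])[0])  # leave-one-out residual
        mu_i = m.predict(x_test.reshape(1, -1))[0]     # LOO prediction at x_test
        lo_terms.append(mu_i - r_i)
        hi_terms.append(mu_i + r_i)
    # Endpoints are order statistics of the +/- residual terms
    # (indices clamped to the data range for simplicity).
    k = int(np.ceil((1 - alpha) * (n + 1)))
    lower = np.sort(lo_terms)[max(n - k, 0)]
    upper = np.sort(hi_terms)[min(k - 1, n - 1)]
    return lower, upper
```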
However:
- Weighted conformal does not extend jackknife+ to covariate shift
- Jackknife+ is computationally demanding (requires N retrained models)
- Predictive intervals often aren't actionable on their own; estimates of error probability can be more useful!
JAWS addresses these challenges:
(Contribution 1) JAW: JAckknife+ with likelihood ratio Weights → extends the jackknife+ coverage guarantee to covariate shift (rough sketch below)
(Contribution 2) JAWA: Sequence of computationally efficient approximations of JAW with higher-order influence functions → asymptotic coverage guarantee under covariate shift
(Contribution 3) Error assessment: General approach to repurposing predictive interval generating methods to error assessment, based on user’s error criterion of interest (for instance, the user wants the chance of false negative or missed diagnosis to be < 5%)
(Contribution 4) Experiments: Demonstrate superior empirical performance of JAWS over conformal and jackknife+ baselines on a variety of biased, real-world datasets
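For contribution 1, here's a rough sketch of how likelihood-ratio weighting might plug into the jackknife+ construction above. The `likelihood_ratio` function (an estimate of w(x) = p_test(x)/p_train(x)) and the weighted-quantile helper are placeholders of my own, not the authors' released implementation:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import Ridge

def weighted_quantile(values, weights, q):
    """q-quantile of a discrete distribution with the given normalized weights."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cdf = np.cumsum(w)
    idx = min(int(np.searchsorted(cdf, q, side="left")), len(v) - 1)
    return v[idx]

def jaw_interval(X, y, x_test, likelihood_ratio, alpha=0.1, base_model=None):
    """JAW-style interval: jackknife+ endpoints re-weighted by likelihood
    ratios w(x) = p_test(x)/p_train(x) to handle covariate shift (sketch)."""
    n = len(y)
    base_model = base_model or Ridge()
    lo_terms, hi_terms = [], []
    for i in range(n):
        mask = np.arange(n) != i
        m = clone(base_model).fit(X[mask], y[mask])
        r_i = np.abs(y[i] - m.predict(X[i:i + 1])[0])
        mu_i = m.predict(x_test.reshape(1, -1))[0]
        lo_terms.append(mu_i - r_i)
        hi_terms.append(mu_i + r_i)
    # Normalized weights: training points get w(x_i); the test point's own
    # weight w(x_test) is carried by +/- infinity "guard" terms.
    w = np.array([likelihood_ratio(X[i]) for i in range(n)] + [likelihood_ratio(x_test)])
    p = w / w.sum()
    lower = weighted_quantile(lo_terms + [-np.inf], p, alpha)
    upper = weighted_quantile(hi_terms + [np.inf], p, 1 - alpha)
    return lower, upper
```

In practice the likelihood ratio would itself have to be estimated, e.g., with a probabilistic classifier trained to distinguish training from test covariates.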
Here, JAW maintains coverage at target level (90%) under covariate shift while jackknife+ does not👇🏽
Thoughts?
Very nice work by @kdpsinghlab & team on evaluating Epic's widely deployed sepsis tool. See thread below on issues to consider when evaluating offline model performance and also common ways in which previous studies have erred.
Also some other really key points: model accuracy and lead time (a key metric for improving sepsis outcomes) have to be computed with the model running in the background.
This point commonly throws off practitioners who're used to seeing interventional studies.
To maintain reported sensitivity (33%), you have to re-alert to catch patients you might have missed the 1st time. Hence, a time-horizon-based NNE (number needed to evaluate) is needed.
Have been on the phone with friends and family all over India. They went from not caring and celebrating Holi (the festival of colors) to a lockdown in 1 week! The worrisome things I learnt:
1/ One of the slums in Mumbai now has COVID-19; these are very densely packed areas with little public health infrastructure. The challenge is there is no reliable reporting of case loads or deaths in poor areas.
2/ COVID-19 tests cost Rs. 4500 (~$70). They’re starting to make cheaper tests available through private labs, but these will primarily be used by the wealthy. Either way, it’s too expensive so the government is only testing those who have symptoms & have traveled.
Just resurfacing frm 2wks off over the holidays. Visited South Africa—saw gorgeous 🦒🐘🦁s, birds, seaside towns, & spent time thinking! Highlights from 2018(warning: long thread). It's been a hard yr so both bragging & celebrating. Looking fwd to all that’s in store for 2019! 1/
First, on the research front, we made headway in multiple open directions.
As ML is being deployed in domains like healthcare, education, & recruiting, it’s critical we understand scenarios in which model outputs may be unreliable during deployment. 2/
A key source of unreliability is due to unanticipated shifts in the data distribution between train and deployment environments. Previously, we’d discussed one such type—policy shift—and how these get introduced when deploying a decision support tool: arxiv.org/abs/1703.10651 3/