Suchi Saria
Nov 30 · 9 tweets · 4 min read
Machine learning predictive uncertainty estimates are often unreliable—data shift makes things worse!

How can you audit the uncertainty of an ML prediction, even with biased data?

A 🧵 on the JAWS approach from our #NeurIPS2022 paper w/ the fab @DrewPrinster & @anqi_liu33
Why generate uncertainty intervals and enable real-time audits?

- Build user trust: arxiv.org/pdf/1805.11783… proceedings.mlr.press/v89/schulam19a…
- Reduce false alerts in decision support apps: pubmed.ncbi.nlm.nih.gov/28841550/
- Enable safety assessment: nejm.org/doi/full/10.1056/NEJMc2104626
Background: #conformalprediction is becoming popular for predictive interval generation with a coverage guarantee

Coverage: Predictive interval contains true label with high probability (i.e., predictive confidence intervals are valid)

Assumption: Exchangeable (e.g., IID) data
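To make the coverage guarantee concrete, here's a minimal split-conformal sketch in Python (illustrative names, not from the paper; assumes numpy and a held-out calibration set of absolute residuals):

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_hat, alpha=0.1):
    """Split conformal prediction: widen a point prediction by the conformal
    quantile of calibration residuals. Under exchangeability, the interval
    contains the true label with probability >= 1 - alpha."""
    n = len(cal_residuals)
    # Finite-sample-corrected quantile level: ceil((n + 1)(1 - alpha)) / n
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(cal_residuals, level, method="higher")
    return y_hat - q, y_hat + q

# Toy calibration residuals |y - model(x)| and one test prediction
rng = np.random.default_rng(0)
cal_residuals = np.abs(rng.normal(size=200))
lo, hi = split_conformal_interval(cal_residuals, y_hat=5.0, alpha=0.1)
```

The finite-sample correction `(n + 1)(1 - alpha) / n` (rather than plain `1 - alpha`) is what makes the 90% coverage hold exactly, not just asymptotically.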
Prior work addresses some limitations of standard conformal methods:

Jackknife+: A favorable trade-off between the statistical and computational inefficiencies of standard conformal methods (Barber et al., 2021)

Weighted conformal: Extends conformal to covariate shift (Tibshirani et al., 2019)
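The jackknife+ construction can be sketched as follows — a simplified illustration (assumes numpy/scikit-learn, and uses quantile approximations in place of the exact order statistics of Barber et al., 2021):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def jackknife_plus_interval(X, y, x_test, alpha=0.1):
    """Jackknife+: one leave-one-out (LOO) model per training point, hence N
    refits. Endpoints are quantiles of the LOO test predictions shifted
    down/up by the corresponding LOO residuals."""
    n = len(y)
    lower_scores, upper_scores = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = LinearRegression().fit(X[keep], y[keep])  # refit without point i
        resid = abs(y[i] - model.predict(X[i:i + 1])[0])  # LOO residual
        mu = model.predict(x_test[None, :])[0]            # LOO test prediction
        lower_scores[i] = mu - resid
        upper_scores[i] = mu + resid
    lo = np.quantile(lower_scores, alpha, method="lower")
    hi = np.quantile(upper_scores, 1 - alpha, method="higher")
    return lo, hi

# Synthetic linear data; true test value is 0.5
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)
lo, hi = jackknife_plus_interval(X, y, x_test=np.array([0.5, 0.5]))
```

The `for` loop over N refits is exactly the computational cost noted below — and what JAWA's influence-function approximations avoid.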
However:

- Weighted conformal does not extend jackknife+ to covariate shift
- Jackknife+ is computationally demanding (requires N retrained models)
- Predictive intervals often aren't directly actionable—estimates of error probability can be more useful!

JAWS addresses these challenges:
(Contribution 1) JAW: JAckknife+ with likelihood ratio Weights → extends jackknife+ coverage guarantee to covariate shift
(Contribution 2) JAWA: Sequence of computationally efficient approximations of JAW with higher-order influence functions → asymptotic coverage guarantee under covariate shift
(Contribution 3) Error assessment: General approach to repurposing predictive interval generating methods to error assessment, based on user’s error criterion of interest (for instance, the user wants the chance of false negative or missed diagnosis to be < 5%)
(Contribution 4) Experiments: Demonstrate superior empirical performance of JAWS over conformal and jackknife+ baselines on a variety of biased, real-world datasets
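The likelihood-ratio weighting behind JAW can be illustrated with a weighted quantile — shown here on split conformal for brevity (per Tibshirani et al., 2019; JAW applies the same reweighting to the jackknife+ construction, and all names below are illustrative):

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """q-quantile of the discrete distribution placing mass `weights`
    (up to normalization) on `values`."""
    order = np.argsort(values)
    v = np.asarray(values, float)[order]
    w = np.asarray(weights, float)[order]
    cdf = np.cumsum(w) / w.sum()
    return v[np.searchsorted(cdf, q)]

def weighted_split_conformal(cal_residuals, cal_lr, test_lr, y_hat, alpha=0.1):
    """Covariate-shift correction via likelihood-ratio weights
    w(x) = p_test(x) / p_train(x): calibration residuals count more where
    they resemble the test distribution."""
    vals = np.append(cal_residuals, np.inf)  # point mass at +inf for the test point
    wts = np.append(cal_lr, test_lr)
    q = weighted_quantile(vals, wts, 1 - alpha)
    return y_hat - q, y_hat + q

rng = np.random.default_rng(2)
res = np.abs(rng.normal(size=300))
lr = np.exp(rng.normal(scale=0.2, size=300))  # stand-in likelihood ratios
lo, hi = weighted_split_conformal(res, lr, test_lr=1.0, y_hat=0.0)
```

In practice the likelihood ratios are unknown and must be estimated (e.g., with a probabilistic classifier between train and test covariates), which is itself a source of approximation error.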

Here, JAW maintains coverage at the target level (90%) under covariate shift, while jackknife+ does not.
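The error-assessment idea in Contribution 3 amounts to inverting an interval-generating method into a p-value for a user-specified error event. A minimal, unweighted illustration (generic sketch, not the paper's exact procedure — JAWS additionally applies likelihood-ratio weights under covariate shift):

```python
import numpy as np

def error_probability_estimate(cal_residuals, y_hat, error_threshold):
    """Conformal-style p-value for the event that the true label lands at
    least `error_threshold` away from the prediction: the (corrected)
    fraction of calibration residuals that reach that far."""
    gap = abs(error_threshold - y_hat)
    n = len(cal_residuals)
    # Rank-based bound: how often do calibration errors reach the gap?
    return (np.sum(cal_residuals >= gap) + 1) / (n + 1)

rng = np.random.default_rng(3)
res = np.abs(rng.normal(size=99))
p = error_probability_estimate(res, y_hat=0.0, error_threshold=3.0)
```

This turns the interval machinery into a direct answer to questions like "is the chance of a missed diagnosis below 5%?" — the actionable form highlighted above.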

Thoughts?
