New paper alert🚨#statstwitter

Conformal inference, often framed as a technique to generate prediction intervals, is also a tool for out-of-distribution detection. We studied marginal/conditional conformal p-values for multiple testing with marginal/conditional error control. 1/n
We consider the setting where a dataset of “inliers” is available. Existing outlier detection algorithms often output a “score” for each test point indicating how regular it is.

But how to choose a cutoff to get guaranteed statistical error (e.g., type-I error) control? 2/n
For a single data point X, this can be formulated as a hypothesis test of H0: X~P, where P is the (unknown) distribution of the inliers X_1, …, X_n. Intuitively, H0 should be rejected if score(X) is too small compared to {score(X_1), …, score(X_n)}. 3/n
A conformal p-value for H0 is the rank of score(X) within the set {score(X), score(X_1), …, score(X_n)}, normalized by (n+1). For ANY scoring algorithm, under H0 (and absent ties among the scores) the conformal p-value is uniformly distributed on {1/(n+1), 2/(n+1), …, (n+1)/(n+1)} and is thus a valid p-value. 4/n
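A minimal sketch of the rank computation described in this tweet, in Python with NumPy. The function name `conformal_pvalue` and the toy scores are illustrative assumptions, not code from the paper; ties are counted with `<=`, which is one common convention.

```python
import numpy as np

def conformal_pvalue(test_score, calib_scores):
    """Rank of test_score among {test_score} U calib_scores, divided by n + 1.

    Convention from the thread: small scores look irregular, so a small
    p-value is evidence against H0 (the test point is an inlier).
    """
    calib_scores = np.asarray(calib_scores, dtype=float)
    n = calib_scores.size
    # The "+ 1" counts the test point itself, so the p-value is never 0.
    return (1 + np.sum(calib_scores <= test_score)) / (n + 1)

# Hypothetical inlier scores, for illustration only.
calib = np.array([0.2, 0.5, 0.7, 0.9])
print(conformal_pvalue(0.1, calib))  # 0.2  (smallest possible value, 1/(n+1))
print(conformal_pvalue(0.6, calib))  # 0.6
```

With n inlier scores the smallest attainable p-value is 1/(n+1), so the size of the inlier set bounds how strong the evidence against H0 can be.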
What happens if multiple testing points are to be tested?

For example, does the Benjamini-Hochberg procedure control the false discovery rate (FDR) for outlier detection?

It is not trivial at all since conformal p-values are dependent through {score(X_1), …, score(X_n)}. 5/n
Surprisingly, we prove that the BH procedure applied to conformal p-values controls the FDR.

This is proved by showing that the conformal p-values satisfy a delicate property called PRDS, introduced in the pioneering work by Benjamini & Yekutieli (2001) projecteuclid.org/journals/annal… 6/n
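To make the procedure concrete, here is a hedged sketch of textbook Benjamini-Hochberg applied to conformal p-values that all share one inlier (calibration) set. The helper names, the Gaussian toy scores, and the choice q = 0.1 are assumptions for illustration, not the paper's experiments.

```python
import numpy as np

def conformal_pvalues(test_scores, calib_scores):
    """Conformal p-values for many test points against one shared calibration set."""
    calib_sorted = np.sort(np.asarray(calib_scores, dtype=float))
    # side="right" counts calibration scores <= each test score.
    counts = np.searchsorted(calib_sorted, test_scores, side="right")
    return (1 + counts) / (calib_sorted.size + 1)

def benjamini_hochberg(pvals, q=0.1):
    """Boolean rejection mask from the standard BH step-up rule at level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest k with p_(k) <= q * k / m
        rejected[order[:k + 1]] = True
    return rejected

# Toy data (assumed): inlier scores ~ N(0, 1), outlier scores ~ N(-5, 1).
rng = np.random.default_rng(0)
calib = rng.normal(size=1000)
test_scores = np.concatenate([rng.normal(-5.0, 1.0, size=10), rng.normal(size=90)])
pvals = conformal_pvalues(test_scores, calib)
rejected = benjamini_hochberg(pvals, q=0.1)
print("flagged as outliers:", int(rejected.sum()))
```

Every p-value here depends on the same `calib` array, which is exactly the source of dependence discussed in the previous tweet.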
Another interesting finding: though the celebrated Storey-BH procedure fails to control FDR with PRDS p-values in general, we prove that it controls FDR without any adjustment when applied to conformal p-values. The proof is based on a novel generic FDR bound for Storey-BH. 7/n
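For readers unfamiliar with Storey-BH, a rough sketch of one common form follows: estimate the null proportion from the p-values above a threshold λ, then run BH at the inflated level q/π̂0. The function name, λ = 0.5, and the toy p-values are assumptions; variants in the literature differ in details (for example, some also restrict rejections to p-values at most λ), and this is not necessarily the exact version analyzed in the paper.

```python
import numpy as np

def storey_bh(pvals, q=0.1, lam=0.5):
    """BH run at level q / pi0_hat, with Storey's estimate of the null proportion."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    # Storey's estimator: p-values above lam are mostly nulls; "+ 1" is a
    # conservative finite-sample correction.
    pi0_hat = (1 + np.sum(pvals > lam)) / (m * (1 - lam))
    level = q / pi0_hat
    order = np.argsort(pvals)
    below = pvals[order] <= level * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        rejected[order[:k + 1]] = True
    return rejected, pi0_hat

# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.08, 0.6, 0.7, 0.9]
rejected, pi0_hat = storey_bh(pvals, q=0.1, lam=0.5)
# pi0_hat = (1 + 3) / (10 * 0.5) = 0.8, so BH effectively runs at level 0.125.
# Plain BH at q = 0.1 would stop at 6 rejections, because 0.08 > 0.1 * 7 / 10.
print(int(rejected.sum()), pi0_hat)
```

The gain over plain BH comes entirely from π̂0 < 1, which is why the dependence between p-values matters for whether the inflated level is still safe.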
Now let’s turn to another important question: are type-I error and FDR defined above sensible error metrics?

Both metrics marginalize over the inlier dataset. When pure inliers are expensive to obtain, it would be safer to control them conditional on the data. 8/n
To achieve conditional error control, the first step is to get conditionally valid conformal p-values (defined in the figure). Unfortunately, marginal conformal p-values are not conditionally valid, so an adjustment is needed. 9/n
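A small simulation can illustrate why marginal validity does not imply conditional validity. It assumes inlier scores are Uniform(0, 1), so that for a fixed inlier set the chance that a fresh inlier gets p ≤ α equals the k-th smallest inlier score, with k = floor(α(n+1)). The sample size, α, and the seed are arbitrary choices for this sketch, and it does not implement the paper's adjustment.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha, reps = 199, 0.1, 5000

# p = (1 + #{calib <= s}) / (n + 1) <= alpha  iff  s < k-th smallest calib score.
# The tiny offset guards against floating-point error in alpha * (n + 1).
k = int(np.floor(alpha * (n + 1) + 1e-9))

# Each row is one hypothetical inlier dataset of Uniform(0, 1) scores.
calib = rng.uniform(size=(reps, n))
kth_smallest = np.sort(calib, axis=1)[:, k - 1]

# For uniform scores, P(s < c) = c, so this is the type-I error
# conditional on each inlier dataset.
cond_error = kth_smallest

print("average over inlier datasets:", cond_error.mean())
print("share of datasets with conditional error > alpha:", (cond_error > alpha).mean())
```

Averaged over inlier datasets the error sits at about α, yet a large share of individual datasets give a conditional error above α, which is the gap a conditional adjustment has to close.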
We found an intriguing connection between conditionally valid conformal p-values and confidence bands of CDFs. In particular, unexpectedly, we found that the generalized Simes’ inequality (used for k-FWER control) yields a suitable adjustment formula for our purpose. 10/n
Check out our paper for more details if you’re interested in outlier/out-of-distribution detection with conformal p-values!

arxiv.org/abs/2104.08279

11/n

More from @lihua_lei_stat

Mar 2
🚨Job talk thread🚨

Title: What Can *Conformal Inference* Offer to Statistics?

Slides: lihualei71.github.io/Job_Talk_Lihua…

Main points:
(1) Conformal Inference can be made applicable in many #stats problems
(2) There are lots of misconceptions about Conformal Inference
(3) Try it!

1/n
Conformal Inference was designed for generating prediction intervals with guaranteed coverage in standard #ML problems.

Nevertheless, it can be modified to be applicable in

✔️Causal inference
✔️Survival analysis
✔️Election night model
✔️Outlier detection
✔️Risk calibration

2/n
Misconceptions about conformal inference:

❌ Conformal intervals only have marginal coverage and tend to be wide
✔️ Conformal intervals w/ proper conformity scores achieve conditional coverage & efficiency (short length) if the model is correctly specified

3/n
Feb 14
A thread on BUE & BLUE🔥

Gauss-Markov condition:
1) y=Xβ+ε
2) E[ε|X]=0
3) Cov(ε|X)=σ^2Σ

Standard GM:
4) Σ=I

The GM thm shows that OLS/GLS is BL(inear)UE.

Hansen (’20) shows it holds for all unbiased est (inc. nonlinear) w/ an elegant proof (tilted density + Cramer-Rao)

1/n
Call F_2 & F_2^0 the classes of dists satisfying 1-3 & 1-4.
Hansen proves if \hat{β} is unbiased under F_2 for all Σ, then GLS (OLS) is BUE under F_2 (F_2^0).

An intriguing Q. raised by @jmwooldridge & @CavaliereGiu is

Does there exist nonlinear unbiased est under F_2?

2/n
Turns out no nonlinear est can be unbiased under F_2!

This can be proved using a deep result by Koopmann (’82), restated in Gnot et al. (’92)

tandfonline.com/doi/abs/10.108…

Roughly speaking, an estimator that is unbiased under F_2 w/ a fixed Σ must be linear+quadratic.

3/n
Apr 12, 2021
Check out our new work on conformalized survival analysis w/ @RenZhimei and Emmanuel Candès: arxiv.org/abs/2103.09763 Our method can wrap around any survival prediction algorithm and produce calibrated covariate-dependent lower predictive bounds (LPBs) on survival times. 1/n
Survival predictive analysis is complicated by *censoring*, which partially masks the outcome. For example, the actual survival time is unknown for units whose event (e.g., death) has yet to happen. A common type is called the “end-of-study” censoring, illustrated below. 2/n
Under two standard assumptions, our conformalized LPBs achieve

(a) marginal coverage in finite samples if P(C|X) is known

(b) approx. conditional coverage if P(T|X) is well estimated

(c) approx. marginal coverage if either P(C|X) or P(T|X) is well estimated (doubly robust)

3/n
