Conformal inference, often framed as a technique to generate prediction intervals, is also a tool for out-of-distribution detection. We study marginal and conditional conformal p-values for multiple testing, with marginal and conditional error control, respectively. 1/n
We consider the setting where a dataset of “inliers” is available. Existing outlier detection algorithms often output a “score” for each testing point indicating how regular it is.
But how should one choose a cutoff that guarantees statistical error control (e.g., type-I error control)? 2/n
For a single data point X, this can be formulated as a hypothesis testing problem with H0: X~P, where P is the (unknown) distribution of the inliers X_1, …, X_n. Intuitively, H0 should be rejected if score(X) is too small compared to {score(X_1), …, score(X_n)}. 3/n
A conformal p-value for H0 is the rank of score(X) among the set {score(X), score(X_1), …, score(X_n)}, normalized by (n+1). For ANY scoring algorithm, under H0 the conformal p-value is uniformly distributed on {1/(n+1), 2/(n+1), …, (n+1)/(n+1)} (assuming the scores are almost surely distinct; with ties it is conservative) and thus valid under the null. 4/n
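A minimal sketch of the rank computation described above, assuming the scores have already been produced by some outlier detector and that small scores indicate outliers. The function name `conformal_pvalue` and the simulated scores are illustrative only.

```python
import numpy as np

def conformal_pvalue(cal_scores, test_score):
    """Rank of test_score among {test_score, cal_scores}, divided by n + 1.

    Small scores are treated as evidence of being an outlier, so a small
    rank gives a small p-value.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = cal_scores.size
    # Count inlier scores <= the test score, plus one for the test point itself.
    return (1 + np.sum(cal_scores <= test_score)) / (n + 1)

# Illustrative data: 99 inlier scores drawn from a standard normal.
rng = np.random.default_rng(0)
cal = rng.normal(size=99)
p_typical = conformal_pvalue(cal, 0.0)    # a score in the bulk of the inliers
p_extreme = conformal_pvalue(cal, -10.0)  # a score far below every inlier score
```

With n = 99 inliers, the smallest attainable p-value is 1/100, which is what the extreme test score receives here.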
What happens if multiple testing points are to be tested?
For example, does the Benjamini-Hochberg procedure control the false discovery rate (FDR) for outlier detection?
This is far from trivial, since the conformal p-values are mutually dependent: they all share the same inlier scores {score(X_1), …, score(X_n)}. 5/n
Surprisingly, we prove that the BH procedure applied to conformal p-values controls FDR.
This is proved by showing that the conformal p-values satisfy a delicate property called PRDS, introduced in the pioneering work by Benjamini & Yekutieli (2001) projecteuclid.org/journals/annal… 6/n
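A rough sketch of what "BH applied to conformal p-values" looks like in practice. The helper names, the toy data, and the target level alpha = 0.1 are assumptions made for illustration; the BH step itself is the standard step-up rule.

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-values for many test points sharing one set of inlier scores."""
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # side="right" counts inlier scores <= each test score.
    counts = np.searchsorted(cal, test, side="right")
    return (1 + counts) / (cal.size + 1)

def benjamini_hochberg(pvals, alpha=0.1):
    """Standard BH step-up rule; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest sorted index meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Toy example: 999 inlier scores, 90 test inliers and 10 shifted test outliers.
rng = np.random.default_rng(1)
cal = rng.normal(size=999)
test = np.concatenate([rng.normal(size=90), rng.normal(loc=-6.0, size=10)])
rejected = benjamini_hochberg(conformal_pvalues(cal, test), alpha=0.1)
```

Note that all test p-values reuse the same `cal` array, which is exactly the source of dependence discussed above.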
Another interesting finding: though the celebrated Storey-BH procedure fails to control FDR with PRDS p-values in general,
we prove that it controls FDR without any adjustment when applied to conformal p-values. It is based on a novel generic FDR bound for Storey-BH. 7/n
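A minimal sketch of one common form of Storey-BH: estimate the null proportion with Storey's estimator at a tuning parameter `lam`, then run BH at the inflated level alpha / pi0_hat. The function names, the default `lam = 0.5`, and the absence of any cap on the estimate or extra truncation of rejections are choices made here for illustration, not details taken from the paper; some formulations additionally reject only p-values at or below `lam`.

```python
import numpy as np

def storey_pi0(pvals, lam=0.5):
    """Storey's estimate of the proportion of true nulls (with the +1 correction)."""
    p = np.asarray(pvals, dtype=float)
    return (1 + np.sum(p > lam)) / (p.size * (1 - lam))

def bh_mask(pvals, level):
    """Standard BH step-up rule at the given level; returns a rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= level * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

def storey_bh(pvals, alpha=0.1, lam=0.5):
    """BH run at level alpha / pi0_hat, where pi0_hat is Storey's estimate."""
    return bh_mask(pvals, alpha / storey_pi0(pvals, lam))
```

When many p-values are small, `storey_pi0` falls below 1 and the procedure rejects at a more liberal level than plain BH, which is where the extra power comes from.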
Now let’s turn to another important question: are type-I error and FDR defined above sensible error metrics?
Both of them marginalize over the inlier dataset. When it is expensive to obtain pure inliers, it is safer to control them conditional on the data. 8/n
To achieve conditional error control, the first step is to get conditionally valid conformal p-values (defined in the figure). Unfortunately, marginal conformal p-values are not conditionally valid, so an adjustment is needed. 9/n
We found an intriguing connection between conditionally valid conformal p-values and confidence bands of CDFs. In particular, unexpectedly, we found that the generalized Simes’ inequality (used for k-FWER control) yields a suitable adjustment formula for our purpose. 10/n
Check out our paper for more details if you’re interested in outlier/out-of-distribution detection with conformal p-values!
Main points: (1) Conformal Inference can be made applicable in many #stats problems (2) There are lots of misconceptions about Conformal Inference (3) Try it!
1/n
Conformal Inference was designed for generating prediction intervals with guaranteed coverage in standard #ML problems.
Nevertheless, it can be modified to be applicable in
✔️Causal inference
✔️Survival analysis
✔️Election night model
✔️Outlier detection
✔️Risk calibration
2/n
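For reference, the standard prediction-interval use mentioned at the start of this thread can be sketched with split conformal prediction. This assumes a model has already been fit on separate data and that absolute residuals on a held-out calibration set are available; the function name and the toy numbers are illustrative.

```python
import numpy as np

def split_conformal_halfwidth(cal_residuals, alpha=0.1):
    """Half-width q such that [yhat - q, yhat + q] has coverage >= 1 - alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual on the
    calibration set; returns infinity when n is too small for the target level.
    """
    r = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = r.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return r[k - 1]

# Toy usage: four calibration residuals, 50% target coverage.
q = split_conformal_halfwidth([3.0, -1.0, 4.0, 2.0], alpha=0.5)
interval = (10.0 - q, 10.0 + q)  # interval around a hypothetical prediction of 10.0
```

The applications listed above replace the absolute residual with other conformity scores, but the ranking step stays the same.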
Misconceptions about conformal inference:
❌ Conformal intervals only have marginal coverage and tend to be wide
✔️ Conformal intervals w/ proper conformity scores achieve conditional coverage & efficiency (short length) if the model is correctly specified
Call F_2 & F_2^0 the classes of dists satisfying 1-3 & 1-4.
Hansen proves if \hat{β} is unbiased under F_2 for all Σ, then GLS (OLS) is BUE under F_2 (F_2^0).
Check out our new work on conformalized survival analysis w/@RenZhimei and Emmanuel Candès: arxiv.org/abs/2103.09763 Our method can wrap around any survival prediction algorithm and produce calibrated, covariate-dependent lower predictive bounds (LPBs) on survival times. 1/n
Survival predictive analysis is complicated by *censoring*, which partially masks the outcome. For example, the actual survival time is unknown for units whose event (e.g., death) has yet to happen. A common type is “end-of-study” censoring, illustrated below. 2/n
Under two standard assumptions, our conformalized LPBs achieve
(a) marginal coverage in finite samples if P(C|X) is known
(b) approx. conditional coverage if P(T|X) is well estimated
(c) approx. marginal coverage if either P(C|X) or P(T|X) is well estimated (doubly robust)