Machine learning sucks at uncertainty quantification.

But there is a solution that almost sounds too good to be true:

conformal prediction

• works for any black box model
• requires only a few lines of code
• is fast
• comes with statistical guarantees

A thread 🧵
Conformal prediction is a method for uncertainty quantification of machine learning models.

The method takes a heuristic uncertainty score and turns it into a rigorous one.
What confused me at first about conformal prediction: it's not a single algorithm, but a general recipe:

• Split data: training and calibration
• Train model
• Calculate "heuristic" scores on calibration data
• Calibrate score
• Use calibrated scoring rule on new data
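Here's what that recipe looks like for multi-class prediction sets (split conformal). A minimal sketch: the synthetic data, the random forest, and the score choice are placeholder assumptions, not the only way to do it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data; swap in your own features and labels.
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

alpha = 0.1  # target miscoverage: sets should contain the true class ~90% of the time

# Train any black box model
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Heuristic score on calibration data: 1 - predicted probability of the true class
cal_probs = model.predict_proba(X_cal)
scores = 1 - cal_probs[np.arange(len(y_cal)), y_cal]

# Calibrate: finite-sample-corrected quantile of the scores
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction set for new data: every class whose probability clears the threshold
prediction_sets = model.predict_proba(X_test) >= (1 - qhat)  # one boolean row per test point

coverage = prediction_sets[np.arange(len(y_test)), y_test].mean()
print(f"Empirical coverage: {coverage:.2f}")  # close to 1 - alpha
```

The only moving parts are the split, the score, and the quantile. Everything else is your usual model.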
The recipe enables many use cases:

• Prediction sets in multi-class problems
• Calibrate classification scores so they can be interpreted as probabilities
• Fix coverage of quantile regression
• Conformal predictive distributions for regression
• ...
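One concrete case from that list: fixing the coverage of quantile regression (conformalized quantile regression). Again a hedged sketch; the gradient-boosted quantile models and the toy data are illustrative choices, not a fixed recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=5, n_informative=5, noise=10.0, random_state=0)
alpha = 0.1  # target: 90% prediction intervals

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Two quantile regressors: lower and upper bound of the heuristic interval
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity score: how far the true value falls outside the heuristic interval
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))

# Calibrate: widen (or shrink) the interval by the corrected quantile of the scores
n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

lower = lo.predict(X_test) - qhat
upper = hi.predict(X_test) + qhat
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage: {coverage:.2f}")  # close to 1 - alpha
```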
How to get started with conformal prediction:

• Tutorial paper: arxiv.org/abs/2107.07511
• Awesome list of conformal prediction approaches: github.com/valeman/awesom…
• scikit-ready Python package: github.com/scikit-learn-c…
• R package: github.com/ryantibs/confo…
Subscribe to my newsletter Mindful Modeler to learn more about conformal prediction and other techniques to make machine learning better.

mindfulmodeler.substack.com

More from @ChristophMolnar

Sep 20
Interpretable machine learning is a mishmash of different methods.
I use mental models to understand how interpretation methods work.

My favorite mental model is like an x-ray view that reveals the core of any interpretation method: Functional decomposition.

A thread 🧵
A prediction model is a function f that maps from p features to 1 output.

Interpretation often means breaking f down into lower dimensional parts. Partial dependence plots reduce f to 1 feature through marginalization.
It's immensely helpful to view the prediction function f as a decomposition.

We can decompose f into lower dimensional functions. Let's say f has 3 features, then:

f(x) = f0 + f1(x1) + f2(x2) + f3(x3) + f12(x1, x2) + f23(x2, x3) + f13(x1, x3) + f123(x1, x2, x3)

That's the formula through which I view all interpretation methods.
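To make the "marginalization" concrete, here is a hand-rolled partial dependence curve. The data, model, and helper function are illustrative assumptions, not a specific library API.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy setup; any fitted black box model works.
X, y = make_regression(n_samples=500, n_features=3, n_informative=3, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid_size=20):
    """Fix one feature to each grid value and average predictions over the data."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                      # intervene on one feature
        pd_values.append(model.predict(X_mod).mean())  # marginalize over the others
    return grid, np.array(pd_values)

grid, pd_curve = partial_dependence(model, X, feature=0)  # roughly f0 + f1 in the decomposition
```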
Sep 19
Supervised learning "only" gives you a prediction function.

But with the right tools, you'll get a lot more:

• Uncertainty quantification
• Causality
• Interpretability
• Analysis of variance
• ...

And the best news: the tools in this thread work for any black box model

👇
Uncertainty quantification

Conformal prediction turns "weak" uncertainty scores into rigorous prediction sets and intervals.

For example:

• class probabilities -> classification sets
• quantile regression -> conformalized quantile regression

arxiv.org/abs/2107.07511
Causality

Orthogonal/double machine learning brings causal inference to supervised learning. You can estimate treatment effects by training two nuisance models (one for the outcome, one for the treatment) and regressing their residuals on each other.

econml.azurewebsites.net/spec/estimatio…
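A rough sketch of that partialling-out idea, under the simplifying assumption of a constant linear effect (EconML's estimators handle cross-fitting and inference properly; the simulated data and names here are only for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Simulated data with a known treatment effect of 2.0
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))                 # confounders
T = X[:, 0] + rng.normal(size=2000)            # treatment depends on X
y = 2.0 * T + X[:, 1] + rng.normal(size=2000)  # outcome depends on T and X

# Two nuisance models: predict the outcome from X, predict the treatment from X
y_hat = cross_val_predict(RandomForestRegressor(random_state=0), X, y, cv=5)
t_hat = cross_val_predict(RandomForestRegressor(random_state=0), X, T, cv=5)

# Regress outcome residuals on treatment residuals to estimate the effect
y_res, t_res = y - y_hat, T - t_hat
theta = np.sum(t_res * y_res) / np.sum(t_res ** 2)
print(f"Estimated treatment effect: {theta:.2f}")  # should land near 2.0
```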
Sep 15
Bayesians versus Frequentists is an ancient debate.
But have you heard of likelihoodism?

🧵 A thread on likelihoodism, why no one uses it, and how it helps to understand the Bayesian versus Frequentist debate better.
Likelihoodists honor the likelihood function above all else.
• They reject prior probabilities. That's a big middle finger to the Bayesian approach.
• Evidence from the data must only come through the likelihood. That's why they reject frequentist inference.
I gotta explain the second point. It's not intuitive how frequentist modeling violates this "likelihood principle".

In other words: in frequentist inference, information from the data influences the modeling, but is not part of the likelihood. What?? How can that be?
Sep 15
Most ML interpretation methods have a common enemy:

Correlated features.

They ruin interpretation both on a technical and a philosophical level.

Why correlation is problematic, how to patch it, and why we have no cure.

A thread 🧵
Correlated features are the rule, not the exception.

• Predicting bike rentals? Season and temperature are correlated.
• Credit scoring? Income correlates with age, job, ...
• Diagnosing patients? Blood values are correlated, like multiple markers of inflammation, ...
We'll use 3 points of view to understand the effect of correlation on interpretability:

• Extrapolation
• Entanglement
• Latent variables

Note: correlation here includes more general dependencies, not only linear correlation.
Sep 13
Supervised machine learning models are deployed everywhere.

It's an open secret that all models have a huge problem:

Performative prediction - when predictions change future outcomes.

How to spot and handle this problem: A thread 🧵
Once a machine learning model is deployed in the wild, the predictions will affect its environment. If not, what would be the point of the model? That's why almost every deployed model is affected by performative prediction. Once you get the concept, you see it everywhere.
The model changes the environment ...

So what?

The change often affects model performance. And that endangers the product, the people involved, and your wallet.
Sep 9
It's overwhelming to keep up with research on interpretable machine learning. I say that as the author of the Interpretable Machine Learning book. 😅

I use these 3 questions to quickly understand new interpretation methods:
Questions for a quick assessment of an ML interpretation method:

• Are the explanations global or local?
• Does the method interpret model components or model behavior?
• Does the method compute feature effects, feature importance, or attributions?
Global versus local is the simplest one.

A global interpretation method describes the overall "behavior" of the model. Examples: Permutation feature importance, linear model coefficients, and SHAP importance.
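As an example of a global method, permutation feature importance fits in a few lines. This is a hand-rolled sketch with toy data (scikit-learn also ships sklearn.inspection.permutation_importance):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data and model; any fitted classifier works.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

baseline = accuracy_score(y_test, model.predict(X_test))
rng = np.random.default_rng(0)
for j in range(X_test.shape[1]):
    X_perm = X_test.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target association
    drop = baseline - accuracy_score(y_test, model.predict(X_perm))
    print(f"Feature {j}: importance (accuracy drop) = {drop:.3f}")
```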
