Machine learning sucks at uncertainty quantification.
But there is a solution that almost sounds too good to be true:
conformal prediction
• works for any black box model
• requires only a few lines of code
• is fast
• comes with statistical guarantees
A thread 🧵
Conformal prediction is a method for uncertainty quantification of machine learning models.
The method takes a heuristic uncertainty score and turns it into a rigorous one.
What confused me at first about conformal prediction: it's not a single algorithm, but a general recipe (sketched in code below):
• Split data: training and calibration
• Train model
• Calculate "heuristic" scores on calibration data
• Calibrate score
• Use calibrated scoring rule on new data
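Here's what that recipe can look like in code. A minimal sketch of split conformal prediction for multi-class classification, assuming a scikit-learn classifier; the dataset, the alpha level, and all variable names are just for illustration:

```python
# Minimal sketch of split conformal prediction for classification.
# Assumes a scikit-learn classifier; dataset and names are illustrative.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 1) Split data: training and calibration (plus some new points to predict on)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_calib, X_new, y_calib, y_new = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 2) Train any black-box model
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# 3) "Heuristic" uncertainty scores on the calibration data:
#    1 - predicted probability of the true class
probs_calib = model.predict_proba(X_calib)
scores = 1 - probs_calib[np.arange(len(y_calib)), y_calib]

# 4) Calibrate: take the adjusted (1 - alpha) quantile of the scores
alpha = 0.1
n = len(scores)
q_level = np.ceil((n + 1) * (1 - alpha)) / n
qhat = np.quantile(scores, q_level, method="higher")

# 5) Use the calibrated rule on new data -> prediction sets
probs_new = model.predict_proba(X_new)
prediction_sets = probs_new >= (1 - qhat)  # boolean mask: class in set or not
```

Assuming exchangeable data, the resulting prediction sets contain the true class with probability of at least 1 - alpha (here 90%), no matter which model you plug in.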
The recipe enables many use cases:
• Prediction sets in multi-class problems
• Calibrate classification scores so they can be interpreted as probabilities
• Fix the coverage of quantile regression (sketch after this list)
• Conformal predictive distributions for regression
• ...
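To make one of these use cases concrete: a minimal sketch of conformalized quantile regression (the "fix the coverage of quantile regression" point), assuming scikit-learn's gradient boosting with quantile loss; the dataset and the 90% target coverage are just for illustration:

```python
# Minimal sketch of conformalized quantile regression (CQR).
# Dataset, model, and coverage level are illustrative choices.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, test_size=0.3, random_state=0)

alpha = 0.1  # target: 90% coverage
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity score: how far the true value falls outside the quantile band
lower, upper = lo.predict(X_calib), hi.predict(X_calib)
scores = np.maximum(lower - y_calib, y_calib - upper)

n = len(scores)
qhat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Calibrated interval for a new point x: [lo.predict(x) - qhat, hi.predict(x) + qhat]
```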
Orthogonal/double machine learning brings causal inference to supervised learning. You estimate a treatment effect by training two nuisance models (one predicts the outcome, one predicts the treatment) and then regressing the outcome residuals on the treatment residuals.
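A rough sketch of that partialling-out idea on simulated data with a known treatment effect of 2.0. This is my illustration, not a full implementation; a real analysis would add proper cross-fitting (for example via the DoubleML package):

```python
# Sketch of double/orthogonal ML via partialling out, on simulated data.
# True treatment effect is 2.0; everything here is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # confounders
T = X[:, 0] + rng.normal(size=n)                 # treatment depends on X
y = 2.0 * T + X[:, 0] ** 2 + rng.normal(size=n)  # outcome: true effect = 2.0

# Two nuisance models: one predicts the outcome, one predicts the treatment.
# cross_val_predict gives out-of-fold predictions, a simple stand-in for cross-fitting.
res_y = y - cross_val_predict(RandomForestRegressor(random_state=0), X, y, cv=5)
res_t = T - cross_val_predict(RandomForestRegressor(random_state=0), X, T, cv=5)

# Regress residual on residual -> estimate of the treatment effect
effect = LinearRegression(fit_intercept=False).fit(res_t.reshape(-1, 1), res_y)
print(effect.coef_[0])  # should come out close to 2.0
```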
Bayesians versus Frequentists is an ancient debate.
But have you heard of likelihoodism?
🧵 A thread on likelihoodism, why no one uses it, and how it helps to understand the Bayesian versus Frequentist debate better.
Likelihoodists honor the likelihood function above all else.
• They reject prior probabilities. That's a big middle finger to the Bayesian approach.
• Evidence from the data must only come through the likelihood. That's why they reject frequentist inference.
I gotta explain the second point. It's not intuitive how frequentist modeling violates this "likelihood principle".
In other words: in frequentist inference, the conclusions depend on information beyond the likelihood of the observed data (for example, the sampling plan). What?? How can that be?
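The standard illustration (my example, not part of the original thread) is the stopping-rule problem: the same data and the same likelihood, but different frequentist conclusions depending on how you planned to stop collecting data.

```python
# Classic stopping-rule example: 12 coin flips, 9 heads, 3 tails.
# Test H0: p = 0.5 against p > 0.5.
# The likelihood is proportional to p^9 (1-p)^3 under BOTH designs,
# yet the frequentist p-value depends on the sampling plan.
from scipy import stats

# Design A: flip exactly 12 times -> binomial. P(9 or more heads)
p_binom = stats.binom.sf(8, n=12, p=0.5)

# Design B: flip until the 3rd tail -> negative binomial on the number of heads.
# P(9 or more heads before the 3rd tail)
p_nbinom = stats.nbinom.sf(8, n=3, p=0.5)

print(p_binom)   # ~0.073 -> not significant at 5%
print(p_nbinom)  # ~0.033 -> significant at 5%
```

Same observed data, proportional likelihoods, different p-values: that's exactly the extra-likelihood information likelihoodists object to.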
Most ML interpretation methods have a common enemy:
Correlated features.
They ruin interpretation both on a technical and a philosophical level.
Why correlation is problematic, how to patch it, and why we have no cure.
A thread 🧵
Correlated features are the rule, not the exception.
• Predicting bike rentals? Season and temperature are correlated.
• Credit scoring? Income correlates with age, job, ...
• Diagnosing patients? Blood values are correlated, like multiple markers of inflammation, ...
We'll use 3 points of view to understand the effect of correlation on interpretability:
• Extrapolation
• Entanglement
• Latent variables
Note: correlation here includes more general dependencies, not only linear correlation.
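To see the extrapolation problem concretely, here is a tiny sketch (my illustration, not from the thread): permuting one of two correlated features, as permutation importance and similar methods do, destroys the dependence and creates data points far outside the training distribution.

```python
# Sketch: permuting one of two correlated features forces extrapolation.
# The "temperature"/"season_index" setup is made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
temperature = rng.normal(20, 5, size=n)
season_index = temperature / 5 + rng.normal(0, 0.3, size=n)  # strongly correlated

X = np.column_stack([temperature, season_index])
print(np.corrcoef(X.T)[0, 1])       # ~0.96: dependence in the real data

# Permutation, as used by permutation importance, PDP-style marginalization, ...
X_perm = X.copy()
X_perm[:, 1] = rng.permutation(X_perm[:, 1])
print(np.corrcoef(X_perm.T)[0, 1])  # ~0.0: dependence destroyed

# Result: rows like (hot-summer temperature, winter-like season index),
# combinations the model never saw during training.
```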
Supervised machine learning models are deployed everywhere.
It's an open secret that all models have a huge problem:
Performative prediction: when predictions change future outcomes.
How to spot and handle this problem: A thread 🧵
Once a machine learning model is deployed in the wild, the predictions will affect its environment. If not, what would be the point of the model? That's why almost every deployed model is affected by performative prediction. Once you get the concept, you see it everywhere.
The model changes the environment ...
So what?
The change often hurts model performance. That endangers the product, the people involved, and your wallet.
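A toy simulation of the feedback loop (my sketch with made-up numbers, not a real credit-scoring setup): applicants react to the deployed decision rule, so the accuracy you measured before deployment no longer holds afterwards.

```python
# Toy simulation of performative prediction in a credit-scoring setting.
# All numbers and the "gaming" mechanism are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def population(n, gaming=0.0):
    income = rng.normal(0, 1, size=n)
    # After deployment, applicants just below the decision boundary
    # inflate their reported income to get approved
    reported = income + gaming * 0.6 * ((income > -0.5) & (income < 0.0))
    default = (rng.normal(0, 1, size=n) > income).astype(int)  # depends on TRUE income
    return reported.reshape(-1, 1), default

# Train and evaluate before deployment: nobody games the feature yet
X_train, y_train = population(5000, gaming=0.0)
model = LogisticRegression().fit(X_train, y_train)
X_test, y_test = population(5000, gaming=0.0)
print("pre-deployment accuracy :", model.score(X_test, y_test))

# After deployment, the prediction rule changes behavior -> distribution shift
X_live, y_live = population(5000, gaming=1.0)
print("post-deployment accuracy:", model.score(X_live, y_live))
```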
It's overwhelming to keep up with research on interpretable machine learning. I say that as the author of the Interpretable Machine Learning book. 😅
I use these 3 questions for a quick assessment of a new ML interpretation method:
• Are the explanations global or local?
• Does the method interpret model components or model behavior?
• Does the method compute feature effects, feature importance, or attributions?
Global versus local is the simplest one.
A global interpretation method describes the overall "behavior" of the model. Examples: Permutation feature importance, linear model coefficients, and SHAP importance.
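For example, permutation feature importance is global: one number per feature for the whole model. A quick sketch with scikit-learn (the dataset and model are just placeholders, any fitted estimator works):

```python
# Sketch of a global method: permutation feature importance with scikit-learn.
# Dataset and model are placeholders; any fitted estimator works.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

# Global importance: how much does shuffling each feature hurt test performance?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>6}: {score:.3f}")
```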