What is a causal model and how is it different from a "common" statistical model?
👇
Thread on a mental picture and intuition for how one may think about (a subclass of) causal models and the causal discovery problem. 1/
A "common" statistical model models one joint distribution over variables X = {A, B, …}; a causal model models a set of joint distributions over X, one for each intervention.
Here, line segments correspond to the sets of distributions the model can express as its parameters vary. 2/
The distributions are indexed by so-called interventions i_k. The intervention set admits a partial ordering reflecting the compositionality of interventions, e.g. do(A=0) ≤ do(A=0, B=0).
↳ A causal model as a poset of joint distributions.
(Ø denotes the null-intervention) 3/
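The partial ordering on interventions can be sketched in a few lines of Python. This is a toy representation of my own (names like `leq` are mine, not from the thread): an intervention is a variable→value assignment, and i ≤ j iff j extends i.

```python
# Toy poset of interventions: each intervention is a dict mapping
# variable -> forced value; the null-intervention Ø is the empty dict.

def leq(i, j):
    """i <= j iff j extends i, i.e. every assignment in i also appears in j."""
    return all(var in j and j[var] == val for var, val in i.items())

null  = {}                      # Ø, the null-intervention
do_A  = {"A": 0}                # do(A=0)
do_AB = {"A": 0, "B": 0}        # do(A=0, B=0)

assert leq(null, do_A)          # Ø <= do(A=0)
assert leq(do_A, do_AB)         # do(A=0) <= do(A=0, B=0)
assert not leq(do_AB, do_A)     # the ordering is only partial
assert not leq(do_A, {"B": 0})  # incomparable interventions exist
```

The ordering is partial, not total: do(A=0) and do(B=0) are incomparable, which is exactly why the intervention set forms a poset rather than a chain.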
Structural Equation Models are one convenient way to describe such a structured set of distributions:
A set of equations and noise variables together with instructions on how to manipulate the equations upon intervention is enough to describe the entire poset.
4/
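To make the "equations + noise + manipulation instructions" idea concrete, here is a minimal sketch of a hypothetical two-variable SEM (the equations and the `sample` helper are my own illustration, not from the thread). An intervention do(V=v) replaces V's equation by the constant v and leaves all other equations untouched:

```python
import random

# Toy SEM (hypothetical):
#   A := N_A            with N_A ~ Normal(0, 1)
#   B := 2*A + N_B      with N_B ~ Normal(0, 1)
# do(V = v) replaces V's equation with the constant v.

def sample(intervention=None, rng=random):
    iv = intervention or {}
    a = iv["A"] if "A" in iv else rng.gauss(0, 1)          # A := N_A
    b = iv["B"] if "B" in iv else 2 * a + rng.gauss(0, 1)  # B := 2*A + N_B
    return {"A": a, "B": b}

# do(A=1): B keeps its equation, so it responds to A's forced value.
# do(B=0): A's equation is untouched, so A is unaffected.
print(sample({"A": 1.0}), sample({"B": 0.0}))
```

One `sample` function indexed by the intervention generates the entire poset of distributions, which is precisely the economy of the SEM description.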
In causal discovery we aim to infer, from data, a (structural) causal model that correctly predicts the effects of interventions, even though we are only given samples from a subset of the distributions, often only the observational one. 5/
It is notoriously difficult not only to model the distributions for which observations are available (common statistical modelling), but to infer a causal model that supports reasoning about the effects of interventions beyond the observed distributions.
The statistical treatment of causal discovery lays out different approaches that clarify under which additional assumptions causal structure can indeed be identified.
@bttyeo @eliasbareinboim @KordingLab @EpiEllie @causalinf Why is fitting the observational distribution and selecting a model based on model fit/complexity insufficient for causal modelling?
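A minimal sketch of why observational fit alone cannot decide: two linear-Gaussian SEMs with hand-picked toy coefficients (my own illustration) that induce the *same* observational joint over (A, B) yet disagree about what happens under do(A=0).

```python
import random
import statistics

# forward:  A := N(0, 1),        B := A + N(0, 1)
# backward: B := N(0, sqrt(2)),  A := B/2 + N(0, sqrt(1/2))
# Both give Var(A)=1, Var(B)=2, Cov(A,B)=1 observationally, so they are
# indistinguishable by fit, yet they answer do(A=0) differently.

def forward_B_given_doA(a, rng):
    # do(A=a) fixes A; B keeps its equation B := A + N(0, 1)
    return a + rng.gauss(0, 1)

def backward_B_given_doA(a, rng):
    # do(A=a) cuts the edge B -> A; B's own equation is untouched
    return rng.gauss(0, 2 ** 0.5)

rng = random.Random(0)
var_fwd = statistics.variance([forward_B_given_doA(0.0, rng) for _ in range(20000)])
var_bwd = statistics.variance([backward_B_given_doA(0.0, rng) for _ in range(20000)])
# Forward predicts Var(B | do(A=0)) ~ 1; backward predicts ~ 2.
```

Since both models fit the observational data equally well (and are equally simple), no score based on fit/complexity can tell them apart; identification needs extra assumptions of the kind tweet 5/ alludes to.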
See a follow-up thread here illustrating some intricacies in causal discovery:
Does the number of boxes loaded _cause_ the risk of a truck rolling over?
In what ways does confounding (or having access only to certain macro-variables) limit causal inference in neuroimaging?
How are these two questions related?
Check out the thread below 👇
1/
Bear with me: in an attempt to strip down the problem and provide a starting point for a constructive discourse, I am deliberately not using neuro lingo to begin with.
I hope the following idealised, simplified toy example turns out to be instructive.
2/
Let's pretend we are gatekeepers at a dispatch warehouse. Our task is to decide which trucks are good to go and safe to hit the road.
For example, we do not let empty trucks pass (the weighbridge shows the truck's tare weight), to avoid unnecessary empty drives.