Causation and time both flow in the direction of the arrows.
That includes known unknowns: if we think some variable we can’t name or measure might matter, we should add an “unknown” node for it.
But why should we make one at all? The secret is that (unlike causation or time) association can flow upstream—against the direction of the arrows!
And association is (often) not causation!
I’ll give another example of that later, but first let’s talk about “confounding.”
The arrows point from summer to ice cream & from summer to crime. Association flows from ice cream up to summer (b/c association can flow upstream), & then from summer down to crime.
Existence of this “path” tells us ice cream & crime will be related in an analysis that doesn’t control for summer.
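Here’s a minimal simulation sketch of that path in Python. Everything numeric is made up for illustration (the sample size, the 50/50 chance of summer, the effect sizes), and the variable names are hypothetical. Summer raises both ice cream sales & crime; ice cream has no effect on crime at all.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
days = []
for _ in range(20_000):
    summer = rng.random() < 0.5                  # assumed: half the days are summer
    ice_cream = 2.0 * summer + rng.gauss(0, 1)   # summer -> ice cream
    crime = 2.0 * summer + rng.gauss(0, 1)       # summer -> crime (no ice cream term)
    days.append((summer, ice_cream, crime))

# Analysis that ignores summer: the fork is open.
crude = pearson([d[1] for d in days], [d[2] for d in days])

# Analysis that controls for summer by looking within each season.
summer_days = [d for d in days if d[0]]
other_days = [d for d in days if not d[0]]
within_summer = pearson([d[1] for d in summer_days], [d[2] for d in summer_days])
within_other = pearson([d[1] for d in other_days], [d[2] for d in other_days])

print(f"crude: {crude:.2f}")
print(f"within summer: {within_summer:.2f}, within other seasons: {within_other:.2f}")
```

The crude correlation should come out clearly positive, while the within-season correlations should sit near zero, b/c stratifying on summer blocks the only path between ice cream & crime in this toy setup.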
1) causation flows in the direction of arrows
2) if two arrow tails or a tail & a head meet, association can flow between them
3) if two arrow heads meet, association is blocked
4) rules 2 & 3 reverse when we restrict or control
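Rules 2 to 4 can be sketched as a small Python function. This is a simplified illustration, not a full d-separation checker: `path_is_open` is a hypothetical helper, it looks at one path at a time, and it ignores the fact that controlling for a *descendant* of a collider can also open a path.

```python
def path_is_open(path, arrows, controlled=frozenset()):
    """Return True if association can flow along `path`.

    path:       list of node names, e.g. ["ice cream", "summer", "crime"]
    arrows:     set of (cause, effect) pairs describing the DAG
    controlled: nodes we restrict on or adjust for
    """
    # Walk over every interior node together with its two neighbours.
    for left, middle, right in zip(path, path[1:], path[2:]):
        heads_meet = (left, middle) in arrows and (right, middle) in arrows
        is_controlled = middle in controlled
        if heads_meet and not is_controlled:
            return False  # rule 3: two arrow heads meet, so the path is blocked
        if not heads_meet and is_controlled:
            return False  # rule 4: controlling blocks a chain or fork (rule 2)
    return True           # every interior node lets association through

# Fork: ice cream <- summer -> crime
fork = {("summer", "ice cream"), ("summer", "crime")}
fork_path = ["ice cream", "summer", "crime"]
print(path_is_open(fork_path, fork))                         # open
print(path_is_open(fork_path, fork, controlled={"summer"}))  # blocked

# Collider: A -> B <- C
collider = {("A", "B"), ("C", "B")}
collider_path = ["A", "B", "C"]
print(path_is_open(collider_path, collider))                    # blocked
print(path_is_open(collider_path, collider, controlled={"B"}))  # open
```

Rule 1 isn’t in the function b/c it’s about causation, not association: a path is causal only when every arrow on it points the same way.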
That bias comes from what happens *after* random assignment.
If we think or know some other variable causes treatment & health, we have an association problem on our DAG.
There’s a path from pills to ??? to health that will muddle up our estimate even if we block exercise.
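A rough simulation sketch of that leftover path, in Python. The structure here is an assumption for illustration only: both exercise & the unknown “???” variable are treated as common causes of pills & health, the pills themselves do nothing, and all probabilities and effect sizes are invented.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
people = []
for _ in range(40_000):
    unknown = rng.random() < 0.5    # the "???" node: never recorded in the data
    exercise = rng.random() < 0.5   # measured, so we can control for it
    # Assumed: both variables make taking pills more likely.
    takes_pills = rng.random() < 0.2 + 0.3 * unknown + 0.3 * exercise
    # Assumed: both variables improve health; pills have NO effect.
    health = 1.0 * unknown + 1.0 * exercise + rng.gauss(0, 1)
    people.append((exercise, takes_pills, health))

def mean(values):
    return sum(values) / len(values)

def pill_gap(rows):
    """Mean health of pill takers minus mean health of everyone else."""
    treated = [health for _, pills, health in rows if pills]
    untreated = [health for _, pills, health in rows if not pills]
    return mean(treated) - mean(untreated)

crude_gap = pill_gap(people)
gap_exercisers = pill_gap([p for p in people if p[0]])
gap_non_exercisers = pill_gap([p for p in people if not p[0]])

print(f"crude gap: {crude_gap:.2f}")
print(f"gap among exercisers: {gap_exercisers:.2f}")
print(f"gap among non-exercisers: {gap_non_exercisers:.2f}")
```

Controlling for exercise shrinks the gap, but it shouldn’t reach zero: the path through the unrecorded variable is still open, so the pills look helpful even though the simulated effect is exactly zero.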
To wrap up, let’s go back to the very first problem of selection bias, and look at one last example that is often a problem in observational studies.
We want to know if a medication causes stomach pain, and we decide to do a case-control study.
We pick people with stomach pain and people with headaches from the local hospital. But being in hospital is a common effect...
There are 3 ways arrows can meet at variables:
Chains indicate causal paths. Association can flow between A and C through forks when we don’t condition on B, and colliders when we do condition on B.
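The collider case is the least intuitive, so here’s a minimal Python sketch of it. The setup is invented for illustration: A & C are generated independently, B is their common effect, and “conditioning on B” is approximated by keeping only rows where B is above an arbitrary cutoff of 1.0.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2)  # fixed seed so the sketch is repeatable
rows = []
for _ in range(20_000):
    a = rng.gauss(0, 1)
    c = rng.gauss(0, 1)          # generated independently of a
    b = a + c + rng.gauss(0, 1)  # collider: A -> B <- C
    rows.append((a, b, c))

# Not conditioning on B: the collider blocks the path between A and C.
overall = pearson([r[0] for r in rows], [r[2] for r in rows])

# Conditioning on B by restriction: keep only rows with a high B.
selected = [r for r in rows if r[1] > 1.0]
among_selected = pearson([r[0] for r in selected], [r[2] for r in selected])

print(f"A vs C, everyone: {overall:.2f}")
print(f"A vs C, only B > 1: {among_selected:.2f}")
```

In the full sample A & C should look unrelated; in the restricted sample a negative association appears even though neither causes the other. That’s the same mechanism as selection bias: picking people based on a common effect.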
We can also use DAGs to help us think through weird or unexpected study results, or conflicting findings from multiple studies.