Do you use diff-in-diff? Then this thread is for you.

You’re no dummy. You already know diverging trends in the pre-period can bias your results.

But I’m here to tell you about a TOTALLY DIFFERENT, SUPER SNEAKY kind of bias.

Friends, let’s talk regression to the mean. (1/N)
Flashback to 2015. @ryan_dydx @JFBurgessJr & @jdimick1 publish a simulation paper
dx.doi.org/10.1111/1475-6… (2/N)
In a scenario that should be FINE for diff-in-diff, they get CRAZY HIGH Type I error rates.

After matching on pre-period variables (via propensity scores), things do indeed look fine. (3/N)
“WHY?” I wondered. This diff-in-diff study should be unbiased.

Diff-in-diff nets out baseline differences…right? (4/N)
Let’s dig into the simulation, shall we?

In their simulation, hospitals with higher-than-average performance in the pre-period are more likely to be in treatment and vice versa. (5/N)
Here’s what it looks like: treatment hospitals (yellow) have higher baseline performance than control (purple) hospitals. (6/N)
However, if we do ordinary diff-in-diff, the two groups regress back to their (common) mean in the post-period and we get a BIASED result.

What the heck? Baseline differences aren’t supposed to be a problem for diff-in-diff! (7/N)
So Ryan et al. turn to matching.

Pairing treatment hospitals with control hospitals that have similar baseline performance makes the pre-period difference disappear, so there's no difference in pre or post. (8/N)
And now our diff-in-diff is unbiased. #whew (9/N)
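The Ryan et al. setup can be sketched in a few lines of NumPy. This is my own minimal toy version, not the authors' code: I assume one common population mean of 0, a logistic selection rule on the noisy pre-period outcome, a zero true treatment effect, and a crude nearest-neighbor match.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Ryan et al.-style scenario (my sketch): every hospital shares one
# population mean of 0; assignment to treatment depends on the NOISY
# pre-period outcome. True treatment effect is zero.
pre = rng.normal(0, 1, n)
post = rng.normal(0, 1, n)  # independent draw: everyone regresses back to 0
treated = rng.random(n) < 1 / (1 + np.exp(-2 * pre))  # higher pre => more likely treated

# Unmatched diff-in-diff: biased, because only the PRE difference between
# the groups is real -- both groups' post-period means regress back to 0.
did = (post[treated].mean() - pre[treated].mean()) - (
       post[~treated].mean() - pre[~treated].mean())

# Match each treated hospital to the control with the nearest pre-period
# value: the pre-period gap disappears, and so does the bias.
pre_c, post_c = pre[~treated], post[~treated]
order = np.argsort(pre_c)
idx = np.clip(np.searchsorted(pre_c[order], pre[treated]), 0, len(order) - 1)
m = order[idx]  # crude 1-nearest-neighbor match on the pre-period outcome
did_matched = (post[treated].mean() - pre[treated].mean()) - (
               post_c[m].mean() - pre_c[m].mean())
print(did, did_matched)  # did is far from 0; did_matched is close to 0
```

The point of the sketch: the unmatched estimate is badly negative even though the true effect is zero, while matching on the pre-period outcome pulls it back to roughly zero.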
Fast-forward to 2017. @jamie_daw discovers that matched diff-in-diff might have some…problems.

In her subtly different simulation, Jamie generates treatment and control data from DIFFERENT populations.

Suppose they’re exactly as far apart as the Ryan et al. case. (10/N)
Now the ordinary, unmatched diff-in-diff is UNBIASED.

These two groups are not regressing back anywhere. Their mean difference is PERMANENT. (11/N)
So what happens if we match to make the pre-period difference go away?

It REAPPEARS in the post-period, as the two groups regress back to their respective means.

Matching INTRODUCES bias into an otherwise totally fine diff-in-diff. (12/N)
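Flipping one assumption in the same toy simulation reproduces Jamie's result. Again, this is my own sketch, not the paper's code: I assume the two groups have permanently different population means (1 vs. 0), a zero true effect, and the same crude nearest-neighbor match on the pre-period outcome.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Daw & Hatfield-style scenario (my sketch): treatment and control hospitals
# come from populations with PERMANENTLY different means. True effect: zero.
pre_t = 1.0 + rng.normal(0, 1, n)
post_t = 1.0 + rng.normal(0, 1, n)  # treated regress to THEIR mean of 1
pre_c = 0.0 + rng.normal(0, 1, n)
post_c = 0.0 + rng.normal(0, 1, n)  # controls regress to THEIR mean of 0

# Unmatched diff-in-diff: the permanent gap cancels, so the estimate is ~0.
did_unmatched = (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())

# Matching on the pre-period outcome picks controls that sit ABOVE their own
# population mean; their post outcomes fall back toward 0, the gap reappears
# in the post-period, and the matched estimate is biased.
order = np.argsort(pre_c)
idx = np.clip(np.searchsorted(pre_c[order], pre_t), 0, n - 1)
m = order[idx]  # crude 1-nearest-neighbor match on the pre-period outcome
did_matched = (post_t.mean() - pre_t.mean()) - (
              post_c[m].mean() - pre_c[m].mean())
print(did_unmatched, did_matched)  # ~0 unmatched; matched is badly biased
```

Same matching code as before, opposite conclusion: here the unmatched estimate is roughly zero and the matched one is off by about the size of the permanent gap.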
To recap:

Matching FIXES bias in the Ryan et al. scenario.

Matching CAUSES bias in the Daw & Hatfield scenario.

And in NEITHER case are there any violations of parallel pre-trends. (13/N)
Side note: In our paper on this, Jamie and I also talk about how parallel trend problems may not be fixed by matching either dx.doi.org/10.1111/1475-6… (14/N)
So where does this leave us? Be very careful with diff-in-diff.

Causal inference is HARD. You have to think about causal MECHANISMS.

What CAUSED the baseline differences between treatment and control? Is it likely to PERSIST into the post-period? (15/N)
Like a good academic, I’ll close with relevant cites.

@ryan_dydx wrote a commentary for @hsr_hret dx.doi.org/10.1111/1475-6…

@jamie_daw and I responded dx.doi.org/10.1111/1475-6… (16/N)
@sylvainCF noticed the problem dx.doi.org/10.1016/j.jeco… and worked out the theory

@Lizstuartdc @Michael_Chernew @colleenlbarry et al. developed symmetric PS weighting for diff-in-diff that avoids the problem dx.doi.org/10.1007/s10742… (17/N)
Diff-in-diff with caution, friends.

(THE END)
(Thread by Laura Hatfield)