Jeffrey Wooldridge
Mar 24, 2021, 13 tweets, 3 min read
Speaking of two-way FE, it's been under fire for the last few years for estimating treatment effects in DID designs -- especially staggered designs. As many on here know. As an older person, I don't let go of my security blankets so easily.

#metricstotheface
Certainly the simple TWFE estimator that estimates a single coefficient can be misleading. We know this thanks to recent work of several talented econometricians (you know who you are). But maybe we're just not being flexible enough with treatment heterogeneity.
Now when I teach panel data interventions, I start with basic TWFE but note that, with multiple treatment periods and different entry times, we can easily include interactions that allow for many different average treatment effects (on the treated).
The ATTs can vary by exposure (cohort) and calendar date. For example, if we have 4 entry times with irreversibility, we estimate 4 + 3 + 2 + 1 = 10 different effects rather than one. These identify the ATTs for the different exposure levels and time periods.
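One way to see where the 10 comes from, as a minimal sketch: enumerate every (cohort, calendar period) cell in which a cohort is already treated. The period labels 5 through 8 and the function name `att_cells` are assumptions for illustration (four entry times in the last four periods, no exits); they are not from the thread.

```python
def att_cells(entry_periods, last_period):
    """List (cohort, period) pairs in which the cohort is treated.

    Assumes staggered entry with irreversibility: a cohort that enters
    treatment in period g stays treated through last_period.
    """
    return [
        (g, s)
        for g in sorted(entry_periods)
        for s in range(g, last_period + 1)
    ]


# Hypothetical layout: 8 periods, with entry in periods 5, 6, 7, and 8.
cells = att_cells([5, 6, 7, 8], last_period=8)
print(len(cells))  # 4 + 3 + 2 + 1 = 10
```

Each pair corresponds to one cohort-by-period interaction, and so to one ATT.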
Not surprisingly, identification requires no anticipation and common trends. I dabbled with this a bit in my 2005 REStat paper, but I didn't do a full analysis of what one can identify with different treatment patterns.
When we introduce covariates -- so that CT holds conditional on covariates as in Callaway and Sant'Anna -- we get further flexibility. With four entry periods and one covariate, there are 14 additional interactions.
When the covariates are centered about exposure-specific means, the ATTs for each exposure/time period are easily gotten. With 4 control periods and 4 treatment periods and just a single X, the TWFE includes 4 + 10 + 10 regressors (not including FE dummies).
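As a hedged sketch of the regression being described, one reading that is consistent with the regressor count is the equation below. The notation is an assumption: $D_{ig}$ is a dummy for cohort $g$, $f_{s,t}$ is a dummy for calendar period $s$, $x_i$ is taken to be a single time-constant covariate, $\bar{x}_g$ is its mean within cohort $g$, and $c_i$ and $\theta_t$ are the unit and time effects.

```latex
y_{it} = \sum_{g} \sum_{s \ge g} \tau_{gs}\, D_{ig} f_{s,t}
       + \sum_{g} \sum_{s \ge g} \rho_{gs}\, D_{ig} f_{s,t}\, (x_i - \bar{x}_g)
       + \sum_{s \,\in\, \text{treated periods}} \pi_s\, f_{s,t}\, x_i
       + c_i + \theta_t + u_{it}
```

Under this reading, with four entry periods there are 10 coefficients $\tau_{gs}$ (the ATTs), 10 coefficients $\rho_{gs}$, and 4 coefficients $\pi_s$, which matches 4 + 10 + 10. Because the covariate is centered about the cohort mean, $\tau_{gs}$ can be read directly as the ATT for cohort $g$ in period $s$.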
Why am I not abandoning the TWFE framework? I'm getting old and I'm lazy. But also I know FE has resiliency to unbalanced panels. It has bias on the order of 1/T when strict exogeneity is violated. Estimating unit-specific trends, as in my 2005 REStat, is a clear extension.
So I know that, with multiple pre-treatment periods, I can remove unit-specific trends to at least partly relax the common trends assumption. Another reason for studying FE: the equivalence with the Mundlak regression suggests strategies for nonlinear models.
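The FE/Mundlak equivalence can be checked numerically. The sketch below is a simplified one-way version (unit effects only, one regressor), not the two-way result: on simulated data it compares the within (FE) slope with the slope on x from pooled OLS of y on a constant, x, and the unit mean of x. The data-generating process, the seed, and the helper names `solve` and `ols` are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def ols(rows, y):
    """Pooled OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Simulated balanced panel: x is correlated with the unit effect c_i.
random.seed(0)
N, T = 6, 4
num = den = 0.0
rows, y = [], []
for _ in range(N):
    c_i = random.gauss(0, 1)
    xs = [c_i + random.gauss(0, 1) for _ in range(T)]
    ys = [1.0 + 2.0 * x + c_i + random.gauss(0, 1) for x in xs]
    xbar, ybar = sum(xs) / T, sum(ys) / T
    for x, yy in zip(xs, ys):
        num += (x - xbar) * (yy - ybar)   # within (FE) numerator
        den += (x - xbar) ** 2            # within (FE) denominator
        rows.append([1.0, x, xbar])       # Mundlak regressors
        y.append(yy)

beta_fe = num / den
beta_mundlak = ols(rows, y)[1]
print(beta_fe, beta_mundlak)  # the two slopes agree up to rounding error
```

The point of the Mundlak form is that it only adds regressors (unit means) to a pooled regression, which is the kind of device that can carry over to nonlinear models where differencing out unit effects is not available.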
I'm trying to finish a draft of what seems like mostly an expository paper, with the thrilling title "Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimation." Oh, and I'm preparing for an interview with @causalinf.
A sample (and simple) Stata command with T = 4, two treated periods (3 and 4), staggered, one x:

xtreg y c.e3#c.d2013 c.e3#c.d2014 c.e4#c.d2014 c.e3#c.d2013#c.x_dm3 c.e3#c.d2014#c.x_dm3 c.e4#c.d2014#c.x_dm4 d2013 d2014 c.d2013#c.x c.d2014#c.x, fe vce(cluster id)
I expect I'm about to be taught some things. One is never too old for that ....
The coefficients on the first three terms are the estimated TEs: the ATT for the cohort first exposed in 2013 during 2013, the effect for that cohort in 2014, and the effect for the cohort first exposed in 2014 during 2014.
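With more cohorts and periods the regressor list gets long, so it can help to generate it. The following is a minimal sketch that reproduces the pattern of the Stata command above from a description of the cohorts; the function name `etwfe_varlist` and its argument layout are hypothetical, and the variable names (`e3`, `d2013`, `x_dm3`, and so on) are simply those used in the command.

```python
def etwfe_varlist(cohorts, treated_periods, x="x"):
    """Build the regressor list following the pattern of the command above.

    cohorts: (cohort_dummy, first_treated_period_dummy, demeaned_x) tuples,
             for example ("e3", "d2013", "x_dm3").
    treated_periods: time-dummy names for the treated periods, in order.
    """
    # Pair each cohort with every treated period at or after its entry.
    cells = []
    for dummy, first, x_dm in cohorts:
        start = treated_periods.index(first)
        for period in treated_periods[start:]:
            cells.append((dummy, period, x_dm))

    att = [f"c.{d}#c.{p}" for d, p, _ in cells]               # ATT terms
    att_x = [f"c.{d}#c.{p}#c.{xd}" for d, p, xd in cells]     # ATT x covariate
    time_x = [f"c.{p}#c.{x}" for p in treated_periods]        # period x covariate
    return " ".join(att + att_x + list(treated_periods) + time_x)


varlist = etwfe_varlist(
    cohorts=[("e3", "d2013", "x_dm3"), ("e4", "d2014", "x_dm4")],
    treated_periods=["d2013", "d2014"],
)
print(f"xtreg y {varlist}, fe vce(cluster id)")
```

The first group of generated terms are the cohort-by-period interactions whose coefficients are the estimated treatment effects.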

More from @jmwooldridge

Jan 23
Thanks for doing this, Jon. I've been thinking about this quite a bit, and teaching my perspective. I should spend less time teaching, more time revising a certain paper. Here's my take, which I think overlaps a lot with yours.
I never thought of BJS as trying to do a typical event study. As I showed in my TWFE-TWMundlak paper, without covariates, BJS is the same as what I called extended TWFE. ETWFE puts in only treatment dummies of the form Dg*fs, s >= g, where Dg is cohort, fs is calendar time.
ETWFE is derivable from POLS using cohort dummies, which derives directly from imposing and using all implications of parallel trends. That's why it's relatively efficient under the traditional assumptions. To me, this is the starting point.
Read 12 tweets
Dec 19, 2023
I sometimes get asked whether, in the context of interventions using DiD methods, an "always treated" (AT) group can be, or should be, included. Typically, there are also many units not treated until t = 2 or later. But some are treated at entry and remain treated.
The short answer is that these units don't help identify true treatment effects except under strong assumptions. Suppose we have only an AT and never treated (NT) group. Units have a string of zeros or string of ones for the treatment indicator.
Any estimated policy effect is comparing avgs between these groups. But there's no way to control for pre-treatment diffs between them. I might as well have one time period and use a diff in means estimator across the two groups.
Read 7 tweets
Nov 22, 2023
Here's a simple result from probability that I'm not sure is widely known. It has important practical implications, particularly for incorporating heterogeneity into models.

Suppose one starts with a "structural" conditional expectation, E(Y|X,U) = g(X,U), where U is unobserved.
Usually g(.,.) is parameterized, but, unless the model is additive in U, the parameters may not mean much. We tend these days to focus on average partial effects. So, for example, E[dg(X,U)/dx] when X is continuous. The expectation is over (X,U).
Here's the result: if U and X are independent, then the APEs from g(X,U) are identical to the APEs from E(Y|X) = f(X). In other words, if the focus is on APEs, introducing U that is independent of X is largely a waste of time. And it can confuse the issue.
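A short sketch of why this holds, for scalar continuous X and assuming enough regularity to differentiate under the integral sign (that regularity condition and the notation $F_U$, $F_X$ for the marginal distributions are assumptions added here):

```latex
\begin{aligned}
f(x) &= E(Y \mid X = x) = E[g(x, U) \mid X = x]
      = \int g(x, u)\, dF_U(u)
      && \text{(independence of } U \text{ and } X\text{)} \\
f'(x) &= \int \frac{\partial g(x, u)}{\partial x}\, dF_U(u) \\
E_X[f'(X)] &= \int\!\!\int \frac{\partial g(x, u)}{\partial x}\, dF_U(u)\, dF_X(x)
      = E_{(X,U)}\!\left[\frac{\partial g(X, U)}{\partial x}\right]
\end{aligned}
```

The last equality uses independence again: the joint distribution of (X, U) is the product of the marginals, so averaging $f'(X)$ over X gives the same number as averaging the structural partial effect over (X, U).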
Read 10 tweets
Oct 28, 2023
How come Stata doesn't report an R-squared with the "newey" command?
In my opinion, the correct answer is (c): no good reason. Supposed "problems" with the R-squared with heterosk or ser correlation seem to be holdovers from old textbooks. There's no unbiased estimator of the pop R^2, so discussing bias really is off base.
The "bias" discussions are in terms of sigma^2_hat, anyway, and the bias in that is 1/T. But the R-squared is consistent for pop R^2 very generally with heterosk and/or ser correlation. Its exclusion from "newey" can confuse the beginner.
Read 4 tweets
Jun 2, 2023
Unfortunately, indiscriminate use of the term "fixed effects" to describe any set of mutually exclusive and exhaustive dummy variables seems to be generating confusion about nonlinear models and the incidental parameters problem.

#metricstotheface
With panel data, the IPP arises when we try to include unit-specific dummies in a nonlinear model with a small number of time periods. We have few observations per "fixed effects." In other cases, IPP arises if we put in group-specific dummies with small group sizes.
But if we include, say, occupation dummies when we have lots of people in each occupation, this clearly causes no problem. Or, including interviewer "fixed effects" when we have lots of subjects per interviewer.
Read 5 tweets
Mar 26, 2023
If Y, D (treatment), and Z (IV) are all binary with controls X, to obtain LATE you can use a linear model and estimate by IV:
Y = a + b*D + X*c + Z*(X - Xbar)*d + U
First stage:
D = f + g*Z + X*h + Z*(X - Xbar)*m + V
Or look at this recent WP by @TymonSloczynski, @sderyauysal, and me to use separate doubly robust estimates of the numerator and denominator. Can use logit outcome models for Y and D.

scholar.google.com/citations?view…
I also like trying separate probits that account for endogeneity of D using Heckman selection. Assumes two-sided noncompliance if Z is binary, but it can be used for general Z. Seeing how LATE estimates with covariates differ from ATE with covariates can be informative, I think.
Read 4 tweets
