Based on questions I get, it seems there's confusion about choosing between RE and FE in panel data applications. I'm afraid I've contributed. The impression seems to be that if RE "passes" a suitable Hausman test then it should be used. This is false.
I'm trying to emphasize in my teaching that using RE (unless CRE = FE) is an act of desperation. If the FE estimates and the clustered standard errors are "good" (intentionally vague), there's no need to consider RE.
RE is considered when the FE estimates are too imprecise to do much with. With good controls -- say, industry dummies in a firm-level equation -- one might get by with RE. And then choosing between RE and FE makes some sense.
Unfortunately, it is still somewhat common to see a nonrobust Hausman test used. This makes no logical sense when every other statistic has been made robust to serial correlation and heteroskedasticity. So either adjust the traditional Hausman test to be robust, or use CRE.
In Stata, the following is common, and correct:
xtreg y i.year x1 ... xK, fe vce(cluster id)
xtreg y i.year x1 ... xK z1 ... zJ, re vce(cluster id)
But often it is followed by this:
xtreg y i.year x1 ... xK, fe
estimates store b_fe
xtreg y i.year x1 ... xK z1 ... zJ, re
estimates store b_re
hausman b_fe b_re
In addition to the test being nonrobust, its df will be wrong: it should be K, not (T - 1) + K. The df problem is easy to fix; the nonrobustness is tricky ....
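A minimal sketch of the CRE (Mundlak) route, written in Python on simulated data rather than in Stata so that each piece is explicit. Everything in it is an assumption made for illustration: the data-generating process, the names, the balanced panel, and the use of pooled OLS for the CRE equation. It adds the unit time averages of the x's, checks that the coefficients on the x's reproduce FE, and computes a cluster-robust Wald statistic on the K time averages, which can serve as a robust Hausman-type test with K df.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 200, 4, 2                      # units, periods, time-varying x's

# Hypothetical balanced panel: x is correlated with the unit effect c,
# so RE without the time averages would be inconsistent.
c = rng.normal(size=N)
x = rng.normal(size=(N, T, K)) + 0.5 * c[:, None, None]
y = x @ np.array([1.0, -0.5]) + c[:, None] + rng.normal(size=(N, T))

# Stack unit-major: rows are (unit 0, t = 1..T), (unit 1, t = 1..T), ...
Y = y.reshape(N * T)
X = x.reshape(N * T, K)
xbar = np.repeat(x.mean(axis=1), T, axis=0)      # unit time averages of x
year = np.tile(np.eye(T)[:, 1:], (N, 1))         # T - 1 year dummies

def demean(a):
    """Subtract unit-specific time means (the within transformation)."""
    a3 = a.reshape(N, T, -1)
    return (a3 - a3.mean(axis=1, keepdims=True)).reshape(N * T, -1)

# FE: OLS on the within-transformed data.
Xw = np.hstack([demean(X), demean(year)])
b_fe = np.linalg.lstsq(Xw, demean(Y)[:, 0], rcond=None)[0][:K]

# CRE: pooled OLS of y on 1, year dummies, x, and the time averages of x.
Z = np.hstack([np.ones((N * T, 1)), year, X, xbar])
A_inv = np.linalg.inv(Z.T @ Z)
g = A_inv @ Z.T @ Y
b_cre = g[T:T + K]                               # coefficients on x
xi = g[-K:]                                      # coefficients on xbar

# Cluster-robust (by unit) variance, without a finite-sample correction.
u = Y - Z @ g
S = (Z * u[:, None]).reshape(N, T, -1).sum(axis=1)   # per-unit score sums
V = A_inv @ (S.T @ S) @ A_inv

# Robust Hausman-type test: Wald statistic for H0: xi = 0, with df = K.
wald = xi @ np.linalg.solve(V[-K:, -K:], xi)
print("FE :", b_fe)
print("CRE:", b_cre)
print("robust Wald =", wald, "with df =", K)
```

The statistic is compared with a chi-square distribution with K df. Software that applies a small-sample adjustment to the clustered variance will report a slightly different number.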
There's a good reason the Frisch-Waugh-Lovell Theorem is taught in intro econometrics, at least at the graduate level. It's used to characterize omitted variable bias as well as the plim of OLS estimators under treatment heterogeneity and also diff-in-diffs. And more.
I also teach the 2SLS version of FWL, where exogenous variables, X, are partialled out of the IVs, Z, with endogenous explanatory variables W. It's important to emphasize that the IVs need to be residualized with respect to X. Let Z" be those residuals. This is the key partialling out.
Then apply 2SLS to any of the equations
Y = W*b + U1
Y" = W*b + U2
Y" = W"*b + U3
Y = W"*b + U4
using IVs Z".
All four deliver the 2SLS estimates of b on the full equation Y = X*a + W*b + U with IVs (X,Z). All " variables have X partialled out from them.
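The equivalence is easy to check numerically. The sketch below uses simulated data and hypothetical helper names (`proj`, `tsls`, `resid`), none of which come from the thread. It compares the coefficient on W from the full 2SLS regression with the four partialled-out versions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: X exogenous (includes the constant), W endogenous,
# Z excluded instruments that are correlated with X.
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 2)) + 0.5 * X[:, [1]]
v = rng.normal(size=n)
W = (Z @ np.array([1.0, 0.5]) + X[:, 1] + v)[:, None]
Y = X @ np.array([1.0, 2.0]) + 1.5 * W[:, 0] + 0.8 * v + rng.normal(size=n)

def proj(A, B):
    """Fitted values from regressing B on the columns of A."""
    return A @ np.linalg.lstsq(A, B, rcond=None)[0]

def tsls(y, R, IV):
    """2SLS of y on regressors R using instruments IV (no constant added)."""
    R_hat = proj(IV, R)
    return np.linalg.solve(R_hat.T @ R, R_hat.T @ y)

def resid(B):
    """Partial X out of B."""
    return B - proj(X, B)

# Full equation: regressors (X, W), instruments (X, Z). Keep the coefficient on W.
b_full = tsls(Y, np.hstack([X, W]), np.hstack([X, Z]))[-1]

Yr, Wr, Zr = resid(Y), resid(W), resid(Z)        # Y", W", Z"
b1 = tsls(Y, W, Zr)[0]                           # Y  on W
b2 = tsls(Yr, W, Zr)[0]                          # Y" on W
b3 = tsls(Yr, Wr, Zr)[0]                         # Y" on W"
b4 = tsls(Y, Wr, Zr)[0]                          # Y  on W"

# For contrast: using the raw Z as IVs in the short equation generally differs.
b_raw = tsls(Y, W, Z)[0]
print(b_full, b1, b2, b3, b4, b_raw)
```

The first five numbers agree to machine precision, while the last one generally does not. That is why residualizing the IVs is the step that matters.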
I think the most commonly used treatment effect estimators when treatment, D, is unconfounded conditional on X are the following: 1. Regression adjustment (RA). 2. Inverse probability (propensity score) weighting (IPW). 3. Augmented IPW (AIPW). 4. IPW regression adjustment (IPWRA). 5. Covariate matching. 6. PS matching.
RA, AIPW, and IPWRA all use conditional mean functions; usually linear but can be logit, multinomial logit, exponential, and others.
I like RA because it is straightforward -- even if using logit or Poisson -- and it is easy to obtain moderating effects.
But, technically, RA requires correct specification of the conditional means E[Y(d)|X] for consistency.
IPW uses only specification of the PS. We now know we should use normalized weights. IPW can be sensitive to overlap problems because p^(X) can be close to one or zero.
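To make RA and normalized-weight IPW concrete, here is a small simulated example for the ATE. The data-generating process, the linear conditional means, and the hand-rolled Newton logit are all assumptions made for illustration, not anything specified in the thread.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical data: one covariate, logit assignment, heterogeneous effect.
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-0.5 * x))
d = (rng.uniform(size=n) < p_true).astype(float)
y0 = 1.0 + x + rng.normal(size=n)
y1 = y0 + 2.0 + 0.5 * x                  # true ATE = 2 because E[x] = 0
y = np.where(d == 1, y1, y0)
X = np.column_stack([np.ones(n), x])

# RA: fit a linear conditional mean in each treatment arm, then average
# the difference in predictions over the whole sample.
b1 = np.linalg.lstsq(X[d == 1], y[d == 1], rcond=None)[0]
b0 = np.linalg.lstsq(X[d == 0], y[d == 0], rcond=None)[0]
ate_ra = np.mean(X @ b1 - X @ b0)

# Propensity score: logit MLE by Newton's method.
g = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ g))
    grad = X.T @ (d - p)
    hess = (X * (p * (1 - p))[:, None]).T @ X
    g = g + np.linalg.solve(hess, grad)
p_hat = 1 / (1 + np.exp(-X @ g))

# Normalized IPW: the weights sum to one within each treatment arm.
w1 = d / p_hat
w1 = w1 / w1.sum()
w0 = (1 - d) / (1 - p_hat)
w0 = w0 / w0.sum()
ate_ipw = w1 @ y - w0 @ y

print("RA :", ate_ra)
print("IPW:", ate_ipw)
print("range of p_hat:", p_hat.min(), p_hat.max())   # a quick overlap check
```

If `p_hat` gets close to zero or one, a few observations receive very large weights. That is the overlap sensitivity mentioned above.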
It's been too long since I've made a substantive tweet, so here goes. At the following Dropbox link you can access the slides and Stata files for my recent talk at the Stata UK meeting:
Perhaps even longer to figure out some tricks to make standard error calculation for aggregated, weighted effects easy. I think I've figured out several useful relationships and shortcuts. Ex post, most are not surprising. I didn't have them all in my WP or my nonlinear DiD.
The talk is only about regression-based methods, but includes logit and Poisson regression (and even other nonlinear models). In the linear case, slide 28 shows a "very long regression." I was tempted to call it something like the "grand unified regression."
Okay, here goes. T = 2 balanced panel data. D defines treated group, f2_t is the second period dummy, W_t = D*f2_t is the treatment. Y_1 and Y_2 are outcomes in the first and second period. ΔY = Y_2 - Y_1. X are time-constant controls. X_dm = X - Xbar_1 (mean of treated units).
2. Pooled OLS of Y_t on 1, W_t, W_t*X_dm, D, X, D*X, f2_t, f2_t*X; ATT is coef on W_t (t = 1,2)
3. Random effects estimation with same variables in (2).
4. FE estimation of (2), where D, X, D*X drop out.
Imputation versions of each:
5. OLS of ΔY on 1, X using the D = 0 units. Get the residuals TE^_FD (ΔY minus its fitted value) for the treated units. Average TE^_FD over treated units.
6. POLS of Y_t on 1, D, X, D*X, f2_t, f2_t*X using W_t = 0 (control obs). The residuals are TE_t^_POLS. ATT is the average of TE_t^_POLS over W_t = 1 (treated observations).
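A quick numerical check that estimators (2), (5), and (6) in the list give the same ATT. The simulated data and all variable names are hypothetical, and the sketch assumes a balanced T = 2 panel with a single time-constant control.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Hypothetical T = 2 balanced panel with one time-constant control x.
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(float)
c = rng.normal(size=n) + d                       # unit effect, differs by group
y1 = c + 0.5 * x + rng.normal(size=n)
y2 = c + 1.0 + 0.8 * x + d * (1.5 + 0.5 * x) + rng.normal(size=n)
dy = y2 - y1
x_dm = x - x[d == 1].mean()                      # X_dm = X - Xbar_1

def ols(y, *cols):
    """OLS coefficients of y on the given columns."""
    return np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0]

# (5) Imputation in first differences: fit on controls, average the
#     treated units' residuals.
ctrl = d == 0
a = ols(dy[ctrl], np.ones(ctrl.sum()), x[ctrl])
att_fd_imp = np.mean(dy[d == 1] - (a[0] + a[1] * x[d == 1]))

# Stack the two periods for the pooled regressions.
Y = np.concatenate([y1, y2])
D, X, Xdm = np.tile(d, 2), np.tile(x, 2), np.tile(x_dm, 2)
f2 = np.repeat([0.0, 1.0], n)
W = D * f2
one = np.ones(2 * n)

# (2) Pooled OLS with the full set of interactions; ATT is the coef on W.
att_pols = ols(Y, one, W, W * Xdm, D, X, D * X, f2, f2 * X)[1]

# (6) Pooled imputation: fit on W = 0 observations, average the
#     residuals of the W = 1 observations.
u = W == 0
g = ols(Y[u], one[u], D[u], X[u], D[u] * X[u], f2[u], f2[u] * X[u])
te = Y - np.column_stack([one, D, X, D * X, f2, f2 * X]) @ g
att_pols_imp = te[W == 1].mean()

print(att_pols, att_fd_imp, att_pols_imp)
```

The three numbers agree to machine precision. This follows from the algebra of the saturated regression, not from anything particular to the simulated data.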
Thanks for doing this, Jon. I've been thinking about this quite a bit, and teaching my perspective. I should spend less time teaching, more time revising a certain paper. Here's my take, which I think overlaps a lot with yours.
I never thought of BJS as trying to do a typical event study. As I showed in my TWFE-TWMundlak paper, without covariates, BJS is the same as what I called extended TWFE. ETWFE puts in only treatment dummies of the form Dg*fs, s >= g, where Dg is cohort, fs is calendar time.
ETWFE is derivable from POLS using cohort dummies, which derives directly from imposing and using all implications of parallel trends. That's why it's relatively efficient under the traditional assumptions. To me, this is the starting point.
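To illustrate which treatment dummies ETWFE includes, here is a sketch for a hypothetical design with T = 4 periods and first-treatment cohorts g = 2, 3, 4. Coding the never-treated group as cohort 0 is my own convention for the example.

```python
# Hypothetical design: T = 4 calendar periods, cohorts first treated in
# periods 2, 3, 4, and a never-treated group coded as cohort 0.
T = 4
cohorts = [2, 3, 4]

# One treatment dummy Dg*fs per (cohort g, period s) pair with s >= g.
pairs = [(g, s) for g in cohorts for s in range(g, T + 1)]

def treatment_dummies(cohort, period):
    """Row of Dg*fs indicators for one unit-period observation."""
    return [int(cohort == g and period == s) for g, s in pairs]

print(pairs)
print(treatment_dummies(2, 3))   # cohort 2 observed in period 3: one dummy is on
print(treatment_dummies(3, 2))   # cohort 3 before treatment: all zeros
print(treatment_dummies(0, 4))   # never treated: all zeros
```

With these cohorts there are six dummies, one for each treated cohort-period cell, and none for pre-treatment periods.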
I sometimes get asked, in the context of interventions using DiD methods, whether an "always treated" (AT) group can be, or should be, included. Typically, there are also many units not treated until t = 2 or later. But some are treated at entry and remain treated.
The short answer is that these units don't help identify true treatment effects except under strong assumptions. Suppose we have only an AT and never treated (NT) group. Units have a string of zeros or string of ones for the treatment indicator.
Any estimated policy effect is comparing avgs between these groups. But there's no way to control for pre-treatment diffs between them. I might as well have one time period and use a diff in means estimator across the two groups.
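A small simulation of the AT-versus-NT case, with a hypothetical data-generating process and an assumed balanced panel. It shows that the pooled regression with time dummies reproduces the raw difference in means, and that the treatment indicator has no within-unit variation for FE to use.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 300, 5

# Hypothetical panel with only always-treated (d = 1) and never-treated (d = 0)
# units. The 2.0 gap mixes any treatment effect with pre-existing group
# differences; the data cannot separate the two.
d = (rng.uniform(size=N) < 0.4).astype(float)
y = 1.0 + 0.3 * np.arange(T) + 2.0 * d[:, None] + rng.normal(size=(N, T))

Y = y.reshape(N * T)
W = np.repeat(d, T)                              # treatment indicator, constant in t
year = np.tile(np.eye(T)[:, 1:], (N, 1))         # T - 1 year dummies

# Pooled OLS of Y on 1, W, year dummies.
R = np.column_stack([np.ones(N * T), W, year])
b_w = np.linalg.lstsq(R, Y, rcond=None)[0][1]

# Simple difference in means between the two groups.
diff_means = y[d == 1].mean() - y[d == 0].mean()

# Within transformation of W: identically zero, so FE cannot identify anything.
W2 = W.reshape(N, T)
W_within = W2 - W2.mean(axis=1, keepdims=True)

print(b_w, diff_means, np.abs(W_within).max())
```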