Jeffrey Wooldridge
University Distinguished Professor of Economics, Michigan State University. Econometrician and textbook author. Two dogs, one rabbit, two fish, and N + 2 cats.
Nov 3 8 tweets 2 min read
There's a good reason the Frisch-Waugh-Lovell Theorem is taught in intro econometrics, at least at the graduate level. It's used to characterize omitted variable bias, as well as the plim of OLS estimators under treatment heterogeneity, and also diff-in-diffs. And more. I also teach the 2SLS version of FWL, where the exogenous variables, X, are partialled out of the IVs, Z, with endogenous explanatory variables W. It's important to emphasize that the IV needs to be residualized with respect to X. Let Z̈ be those residuals. This is the key partialling out.
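A minimal Stata sketch of that key partialling out (my illustration, with hypothetical variable names y, w, z, x1, x2 and a just-identified model):

quietly regress z x1 x2
predict double ztilde, residuals   // residualize the IV with respect to X
ivregress 2sls y x1 x2 (w = z)     // usual 2SLS
ivregress 2sls y (w = ztilde)      // same coefficient on w via FWL; SEs differ because X is omitted here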
Sep 28 17 tweets 3 min read
I think the most commonly used treatment effect estimators when treatment, D, is unconfounded conditional on X, are the following:
1. Regression adjustment.
2. Inverse probability (propensity score) weighting.
3. Augmented IPW.
4. IPWRA.
5. Covariate matching.
6. PS matching.

RA, AIPW, and IPWRA all use conditional mean functions, usually linear, but they can be logit, multinomial logit, exponential, and others.

I like RA because it is straightforward -- even if using logit or Poisson -- and it is easy to obtain moderating effects.
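For concreteness, here is a hedged Stata sketch of estimators 1-6 via teffects, with hypothetical variables y (outcome), d (treatment), and covariates x1, x2; the atet option targets the ATT:

teffects ra (y x1 x2) (d), atet                    // 1. regression adjustment
teffects ipw (y) (d x1 x2, logit), atet            // 2. inverse probability weighting
teffects aipw (y x1 x2) (d x1 x2, logit)           // 3. augmented IPW (reports the ATE; atet is not available here)
teffects ipwra (y x1 x2) (d x1 x2, logit), atet    // 4. IPW + regression adjustment
teffects nnmatch (y x1 x2) (d), atet               // 5. covariate (nearest-neighbor) matching
teffects psmatch (y) (d x1 x2, logit), atet        // 6. propensity score matching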
Sep 28 23 tweets 4 min read
It's been too long since I've made a substantive tweet, so here goes. At the following Dropbox link you can access the slides and Stata files for my recent talk at the Stata UK meeting:
dropbox.com/scl/fo/50imn36…

It's taken me a while to see connections among various estimators, and perhaps even longer to figure out some tricks that make standard error calculations for aggregated, weighted effects easy. I think I've figured out several useful relationships and shortcuts. Ex post, most are not surprising. I didn't have them all in my working paper or my nonlinear DiD paper.
May 25 6 tweets 2 min read
Okay, here goes. T = 2 balanced panel data. D defines the treated group, f2_t is the second-period dummy, and W_t = D*f2_t is the treatment. Y_1 and Y_2 are the outcomes in the first and second periods. ΔY = Y_2 - Y_1. X are time-constant controls, and X_dm = X - Xbar_1, where Xbar_1 is the mean of X over treated units. Eight equivalent methods:

1. OLS of ΔY on 1, D, X, D*X_dm (cross section)

2. Pooled OLS of Y_t on 1, W_t, W_t*X_dm, D, X, D*X, f2_t, f2_t*X; ATT is coef on W_t (t = 1,2)

3. Random effects estimation with same variables in (2).

4. FE estimation of (2), where D, X, D*X drop out.
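A hedged Stata sketch of methods 1 and 4, assuming a long-format panel with hypothetical variables id, t, y, d, f2, x:

quietly summarize x if d == 1
gen x_dm = x - r(mean)               // demean X at the treated-unit mean, Xbar_1
gen w = d*f2                         // treatment W_t = D*f2_t
bysort id (t): gen dy = y - y[_n-1]  // ΔY, defined in period 2

* Method 1: cross-sectional OLS; ATT is the coefficient on d
regress dy d x c.d#c.x_dm if f2 == 1

* Method 4: FE estimation; D, X, and D*X drop out
xtset id t
xtreg y w c.w#c.x_dm i.f2 c.x#i.f2, fe vce(cluster id)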
Jan 23 12 tweets 3 min read
Thanks for doing this, Jon. I've been thinking about this quite a bit, and teaching my perspective. (I should spend less time teaching and more time revising a certain paper.) Here's my take, which I think overlaps a lot with yours. I never thought of BJS as trying to do a typical event study. As I showed in my TWFE-TWMundlak paper, without covariates, BJS is the same as what I called extended TWFE. ETWFE puts in only treatment dummies of the form Dg*fs, s >= g, where Dg is a cohort dummy and fs is a calendar-time dummy.
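Written out, a hedged sketch of the ETWFE equation in the notation used elsewhere in these threads, where tau(g,s) are the cohort-by-period ATTs:

y(i,t) = sum over cohorts g and periods s >= g of tau(g,s)*Dg(i)*fs(t) + c(i) + f(t) + u(i,t)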
Dec 19, 2023 7 tweets 2 min read
I sometimes get asked whether, in the context of interventions using DiD methods, an "always treated" (AT) group can be, or should be, included. Typically, there are also many units not treated until t = 2 or later. But some are treated at entry and remain treated. The short answer is that these units don't help identify true treatment effects except under strong assumptions. Suppose we have only an AT and a never treated (NT) group. Units have a string of zeros or a string of ones for the treatment indicator.
Nov 22, 2023 10 tweets 2 min read
Here's a simple result from probability that I'm not sure is widely known. It has important practical implications, particularly for incorporating heterogeneity into models.

Suppose one starts with a "structural" conditional expectation, E(Y|X,U) = g(X,U), where U is unobserved. Usually g(.,.) is parameterized, but, unless the model is additive in U, the parameters may not mean much. These days we tend to focus on average partial effects: for example, E[dg(X,U)/dx] when X is continuous, where the expectation is over (X,U).
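As a concrete illustration (my example, not from the thread): if g(X,U) = exp(a + b*X + U), the partial effect is b*exp(a + b*X + U), so the APE is b*E[exp(a + b*X + U)], which depends on the entire distribution of (X,U) and not just the parameters a and b.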
Oct 28, 2023 4 tweets 1 min read
How come Stata doesn't report an R-squared with the "newey" command? In my opinion, the correct answer is (c): no good reason. Supposed "problems" with the R-squared under heteroskedasticity or serial correlation seem to be holdovers from old textbooks. There's no unbiased estimator of the population R^2 anyway, so discussing bias really is off base.
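A hedged Stata sketch of the point, with hypothetical variables y, x1, x2 and time variable t: newey reproduces the OLS point estimates, so the OLS R-squared applies equally well.

tsset t
newey y x1 x2, lag(4)   // HAC (Newey-West) standard errors; no R-squared reported
regress y x1 x2         // identical coefficients; R-squared shown here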
Jun 2, 2023 5 tweets 1 min read
Unfortunately, indiscriminate use of the term "fixed effects" to describe any set of mutually exclusive and exhaustive dummy variables seems to be generating confusion about nonlinear models and the incidental parameters problem.

#metricstotheface With panel data, the IPP arises when we try to include unit-specific dummies in a nonlinear model with a small number of time periods. We have few observations per "fixed effect." In other cases, the IPP arises if we put in group-specific dummies with small group sizes.
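A hedged Stata sketch of the contrast, with hypothetical panel variables id, t, y, x and small T:

xtset id t
logit y x i.id     // unit dummies: incidental parameters problem with fixed T
xtlogit y x, fe    // conditional (fixed effects) logit sidesteps estimating the dummies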
Mar 26, 2023 4 tweets 2 min read
If Y, D (treatment), and Z (IV) are all binary with controls X, to obtain LATE you can use a linear model and estimate by IV:

Y = a + b*D + X*c + D*(X - Xbar)*d + U

First stage:

D = f + g*Z + X*h + Z*(X - Xbar)*m + V

Or look at this recent WP by @TymonSloczynski, @sderyauysal, and me to use separate doubly robust estimates of the numerator and denominator. Can use logit outcome models for Y and D.

scholar.google.com/citations?view…
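A hedged Stata sketch of the IV route, with hypothetical binary variables y, d, z and a single control x:

quietly summarize x
gen x_dm = x - r(mean)   // X - Xbar
gen dxdm = d * x_dm
gen zxdm = z * x_dm
ivregress 2sls y x (d dxdm = z zxdm), vce(robust)   // LATE is the coefficient on d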
Dec 26, 2022 9 tweets 2 min read
Much focus on Poisson regression (whether for cross sections or FE Poisson for panel data) is on its consistency when the conditional mean (almost always assumed to be exponential) is correctly specified. This is its most important feature. A less well known but very important feature is its relative efficiency in the class of robust estimators -- that is, estimators consistent when only the mean is correctly specified. (This requirement rules out the MLEs of lots of models, such as NegBin I and NegBin II.)
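A hedged Stata sketch with hypothetical variables; vce(robust) makes the inference rely only on correct specification of the exponential mean:

poisson y x1 x2, vce(robust)        // cross section: Poisson quasi-MLE
xtset id t
xtpoisson y x1 x2, fe vce(robust)   // panel: FE Poisson quasi-MLE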
Dec 26, 2022 4 tweets 1 min read
I've said this often to my students, both at MSU and in short courses:

There are good reasons and bad reasons not to use an estimator. You'll be more convincing as an empirical researcher if you know the difference.

Maybe this suggests a good way to write an exam ....

Good reason not to use standard random effects: It assumes heterogeneity is uncorrelated with X.

Bad reason not to use RE (linear model): It requires homoskedasticity and no serial correlation of idiosyncratic errors. (False)
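A hedged Stata sketch of why the second reason is false (hypothetical variables): cluster-robust inference after RE requires neither homoskedasticity nor serially uncorrelated idiosyncratic errors.

xtset id t
xtreg y x1 x2, re vce(cluster id)   // RE point estimates with fully robust inference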
Nov 11, 2022 6 tweets 1 min read
To people who badger empirical researchers using micro-type panel data -- where N is pretty large and T is not -- into computing tests for cross-sectional dependence in the errors: Please stop!

These tests give lots of false positives due to unobserved heterogeneity. This is essentially like testing for cluster correlation using residuals after OLS. Even under random sampling and random assignment -- where we know clustering is not needed -- tests for cluster correlation will often reject if there is neglected heterogeneity.
Aug 6, 2022 9 tweets 2 min read
I've been so discombobulated lately that I don't keep track of what's in which version of my papers and what I include in lectures/teaching. So here's an update on what I've learned about DiD in 2022.

#jwdid (borrowing from @friosavila).

1. The pooled OLS method I proposed, which is the same as TWFE and random effects, is also equivalent to a version of imputation I proposed. That means it is consistent for various ATTs under weak assumptions (but those include no anticipation and parallel trends).
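A hedged sketch of the community command the hashtag refers to (Fernando Rios-Avila's jwdid); the option names below are from memory and may differ, so check the command's help file. Here gvar would hold each unit's first treatment period, 0 for never treated:

ssc install jwdid
jwdid y, ivar(id) tvar(year) gvar(gvar)   // pooled OLS / ETWFE implementation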
Aug 6, 2022 7 tweets 2 min read
A DiD update. I've been editing my nonlinear DiD paper and I have posted a working paper here:

researchgate.net/publication/36…

It's actually more up to date than the latest version of the linear paper. I've been trying to clean up the Stata do files for both the linear and nonlinear cases. I've learned a lot since last updating -- tricks that make things simpler (in linear and nonlinear cases). I'll pin a new tweet with the Dropbox location.
Jul 6, 2022 10 tweets 2 min read
A problem with specification testing is that it can lead those who are inexperienced to think that empirical work is mostly about applying a slew of specification tests to a particular model and then trying to sort out the findings. This is apparent with linear panel data models, where one sees the Breusch-Pagan test used to choose between POLS and RE; the F test of the unit-specific dummies to choose between POLS and FE; and the Hausman test to choose between RE and FE.
Jun 10, 2022 10 tweets 3 min read
Not sure about that! But here's a first attempt. Suppose I have a control group and G treatment levels. The treatment, W, takes values in {0,1,2,...,G} and is unconfounded conditional on X. Assume the overlap condition 0 < p0(x) = P(W=0|X=x) for all x in Support(X). This isn't a trivial assumption because it requires that, for any subset of the population as determined by values of x, there are some control units. If this isn't true, one can trim the sample -- as in the Crump et al. "Moving the Goalposts" work.
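A hedged Stata sketch of checking this overlap condition, using a hypothetical multinomial logit for the generalized propensity score:

mlogit w x1 x2
predict double p0, outcome(0)   // estimated P(W=0|X)
summarize p0, detail            // overlap requires p0 bounded away from zero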
Apr 20, 2022 6 tweets 2 min read
If in a staggered DiD setting I write an equation with a full set of treatment indicators by treated cohort and calendar time, and include c(i) + f(t) (unit and time "fixed effects"), would you still call that a "fixed effects" model? If you answer "yes" then you should stop saying things like "there's a problem with the TWFE 'model'." The modeling is our choice; we choose what to put in x(i,t) when we write

y(i,t) = x(i,t)*b + c(i) + f(t) + u(i,t)

The phrase "TWFE model" refers to c(i) + f(t), right?
Feb 18, 2022 11 tweets 3 min read
Not exactly. I like Bruce's approach in this paper and it yields nice insights. But from Twitter and private exchanges last week, and what I've learned since, it seems that the class of estimators in play in Theorem 5 includes only estimators that are linear in Y.

#metricstotheface Theorem 5 is correct and neat, but leaves open the question of which estimators are in the class that is being compared with OLS. Remember, we cannot simply use phrases such as "OLS is BUE" without clearly defining the competing class of estimators. This is critical.
Feb 13, 2022 13 tweets 3 min read
Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.

#metricstotheface A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.
Feb 13, 2022 4 tweets 1 min read
This is neat and makes sense to me. After all, third moments need not even exist under GM. And using third moments would make it very tough to achieve unbiasedness across all cases with only GM. Clearly, the result says it's impossible. It still blows my mind that OLS is best unbiased in that class: across all multivariate distributions with weird third and fourth conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.