Jeffrey Wooldridge
University Distinguished Professor of Economics, Michigan State University. Econometrician and textbook author. Two dogs, one rabbit, two fish, and N + 2 cats.
Jan 23
Thanks for doing this, Jon. I've been thinking about this quite a bit, and teaching my perspective -- though I should spend less time teaching and more time revising a certain paper. Here's my take, which I think overlaps a lot with yours. I never thought of BJS (Borusyak, Jaravel, and Spiess) as trying to do a typical event study. As I showed in my TWFE-TWMundlak paper, without covariates, BJS is the same as what I called extended TWFE (ETWFE). ETWFE puts in only treatment dummies of the form Dg*fs, s >= g, where Dg is a cohort dummy and fs is a calendar-time dummy.
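Here's a minimal sketch of that regression in Stata. These are my hypothetical names, not BJS's code: id (unit), year (calendar time), and cohort (first treated year, 0 for never treated). Fernando's jwdid command automates this.

* ETWFE without covariates: cohort dummies, year dummies, and the
* Dg*fs treatment cells for s >= g only
gen byte w = (cohort > 0) & (year >= cohort)
* cells where w is identically zero drop out as collinear
regress y i.cohort i.year ibn.cohort#ibn.year#c.w, vce(cluster id)

The coefficients on the interaction terms estimate the cohort-by-period ATTs.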
Dec 19, 2023
I sometimes get asked whether, in the context of interventions using DiD methods, an "always treated" (AT) group can be, or should be, included. Typically, there are also many units not treated until t = 2 or later. But some are treated at entry and remain treated. The short answer is that these units don't help identify true treatment effects except under strong assumptions. Suppose we have only an AT and a never treated (NT) group. Units have a string of zeros or a string of ones for the treatment indicator.
Nov 22, 2023
Here's a simple result from probability that I'm not sure is widely known. It has important practical implications, particularly for incorporating heterogeneity into models.

Suppose one starts with a "structural" conditional expectation, E(Y|X,U) = g(X,U), where U is unobserved. Usually g(.,.) is parameterized, but, unless the model is additive in U, the parameters may not mean much. We tend these days to focus on average partial effects. So, for example, E[dg(X,U)/dx] when X is continuous. The expectation is over (X,U).
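A quick simulation makes the point. This is just a sketch with a made-up structural function, g(X,U) = exp(0.5*X + U), not any particular model:

clear
set seed 1
set obs 100000
gen double x = rnormal()
gen double u = rnormal()
* dg(X,U)/dx for g(x,u) = exp(0.5*x + u) is 0.5*exp(0.5*x + u)
gen double dgdx = 0.5*exp(0.5*x + u)
summarize dgdx    // the sample mean approximates the APE, E[dg(X,U)/dx]

The coefficient 0.5 by itself tells you little; the APE depends on the whole distribution of (X,U).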
Oct 28, 2023
How come Stata doesn't report an R-squared with the "newey" command? In my opinion, the correct answer is (c): no good reason. Supposed "problems" with the R-squared under heteroskedasticity or serial correlation seem to be holdovers from old textbooks. There's no unbiased estimator of the population R^2, so discussing bias really is off base.
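If you want the R-squared anyway, it's easy to get: newey uses the same point estimates as OLS, so just compute it from the residuals. A sketch with hypothetical time series y and x, already tsset:

newey y x, lag(4)
predict double yhat, xb
gen double uhat = y - yhat
quietly summarize y
scalar sst = r(Var)*(r(N) - 1)
quietly summarize uhat
scalar ssr = r(Var)*(r(N) - 1)   // residuals have zero mean with a constant
display "R-squared = " 1 - ssr/sst

Or simply run regress on the same specification and read off its R-squared; the fitted values are identical.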
Jun 2, 2023
Unfortunately, indiscriminate use of the term "fixed effects" to describe any set of mutually exclusive and exhaustive dummy variables seems to be generating confusion about nonlinear models and the incidental parameters problem.

#metricstotheface With panel data, the IPP arises when we try to include unit-specific dummies in a nonlinear model with a small number of time periods: we have few observations per "fixed effect." In other cases, the IPP arises if we put in group-specific dummies with small group sizes.
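A small simulation shows the problem; everything here is made up for illustration:

clear
set seed 123
set obs 500                       // N = 500 units
gen id = _n
gen double c = rnormal()          // unit heterogeneity
expand 2                          // T = 2 periods
bysort id: gen t = _n
gen double x = rnormal()
gen byte y = runiform() < invlogit(x + c)   // true coefficient on x is 1
* dummy-variable logit: drops units with no variation in y and is
* badly biased away from zero (near 2 when T = 2)
logit y x i.id
xtset id t
xtlogit y x, fe    // conditional (fixed effects) logit: consistent

With T = 2, the dummy-variable MLE converges to twice the true coefficient -- the classic incidental parameters result.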
Mar 26, 2023
If Y, D (treatment), and Z (IV) are all binary with controls X, to obtain LATE you can use a linear model and estimate by IV:
Y = a + b*D + X*c + D*(X - Xbar)*d + U
First stage:
D = f + g*Z + X*h + Z*(X - Xbar)*m + V

Or look at this recent WP by @TymonSloczynski, @sderyauysal, and me, which uses separate doubly robust estimates of the numerator and denominator. One can use logit outcome models for Y and D.

scholar.google.com/citations?view…
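A sketch of the IV estimator in Stata, with hypothetical names y, d, z and controls x1, x2; generate the demeaned interactions first:

foreach v of varlist x1 x2 {
    quietly summarize `v'
    gen double `v'_dm = `v' - r(mean)        // X - Xbar
    gen double d_`v' = d * `v'_dm
    gen double z_`v' = z * `v'_dm
}
* instrument d and d*(X - Xbar) with z and z*(X - Xbar)
ivregress 2sls y x1 x2 (d d_x1 d_x2 = z z_x1 z_x2), vce(robust)

The coefficient on d estimates the LATE.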
Dec 26, 2022
Much focus on Poisson regression (whether for cross section or FE Poisson for panel data) is on its consistency when the conditional mean (almost always assumed to be exponential) is correctly specified. This is its most important feature. A less well known but very important feature is its relative efficiency in the class of robust estimators -- that is, estimators consistent when only the mean is correct. (This requirement rules out MLEs of lots of models, such as NegBin I and NegBin II.)
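In practice this just means using the Poisson QMLE with robust standard errors. A sketch, with hypothetical y, x1, x2, and panel identifiers id and t:

poisson y x1 x2, vce(robust)
* FE Poisson for panel data, with fully robust inference:
xtset id t
xtpoisson y x1 x2, fe vce(robust)

Nothing about y needs to be Poisson -- or even a count -- for these to consistently estimate an exponential mean.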
Dec 26, 2022
I've said this often to my students, both at MSU and in short courses:

There are good reasons and bad reasons not to use an estimator. You'll be more convincing as an empirical researcher if you know the difference.

Maybe this suggests a good way to write an exam ....

Good reason not to use standard random effects: It assumes heterogeneity is uncorrelated with X.

Bad reason not to use RE (linear model): It requires homoskedasticity and no serial correlation of idiosyncratic errors. (False)
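On the bad reason: RE doesn't require those assumptions for consistency, and inference is easily made robust. A sketch, assuming a panel with hypothetical identifiers id and t:

xtset id t
xtreg y x1 x2, re vce(cluster id)   // robust to heteroskedasticity and serial correlation

The good reason still stands: nothing here fixes correlation between the heterogeneity and X.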
Nov 11, 2022
To people who badger empirical researchers using micro-type panel data -- where N is pretty large and T is not -- into computing tests for cross-sectional dependence in the errors: Please stop!

These tests give lots of false positives due to unobserved heterogeneity. This is essentially like testing for cluster correlation using residuals after OLS. Even under random sampling and random assignment -- where we know clustering is not needed -- tests for cluster correlation will often reject if there is neglected heterogeneity.
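A sketch of the cluster analogy -- all simulated, with x randomly assigned at the individual level:

clear
set seed 42
set obs 100                 // 100 groups
gen g = _n
gen double c = rnormal()    // group heterogeneity, not modeled
expand 20
gen double x = rnormal()    // randomly assigned: clustering not needed for b
gen double y = 1 + 0.5*x + c + rnormal()
regress y x
predict double uhat, resid
bysort g (x): gen double u_other = uhat[_n-1]
corr uhat u_other           // strong within-group correlation from omitted c

The residual correlation is real, but it comes from the neglected heterogeneity, not from anything that invalidates the usual inference under random assignment.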
Aug 6, 2022
I've been so discombobulated lately that I don't keep track of what's in which version of my papers and what I include in lectures/teaching. So here's an update on what I've learned about DiD in 2022.

#jwdid (borrowing from @friosavila).

1. The pooled OLS method I proposed, which is the same as TWFE and random effects, is also equivalent to a version of imputation I proposed. That means it is consistent for various ATTs under weak assumptions (but those include no anticipation and parallel trends).
Aug 6, 2022
A DiD update. I've been editing my nonlinear DiD paper and I have posted a working paper here:

researchgate.net/publication/36…

It's actually more up to date than the latest version of the linear paper. I've been trying to clean up the Stata do files for both the linear and nonlinear cases. I've learned a lot since last updating -- tricks that make things simpler (in linear and nonlinear cases). I'll pin a new tweet with the Dropbox location.
Jul 6, 2022
A problem with specification testing is that it can lead those who are inexperienced to think that empirical work is mostly about applying a slew of specification tests to a particular model and then trying to sort out the findings. This is apparent with linear panel data models, where one sees the Breusch-Pagan test used to choose between POLS and RE; the F test of the unit-specific dummies to choose between POLS and FE; and the Hausman test to choose between RE and FE.
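For concreteness, the ritual looks like this in Stata (hypothetical panel with y, x1, x2, xtset on id and t):

xtreg y x1 x2, re
estimates store re_est
xttest0                    // Breusch-Pagan LM test: POLS vs RE
xtreg y x1 x2, fe          // the output's F test of the unit effects: POLS vs FE
estimates store fe_est
hausman fe_est re_est      // traditional Hausman test: FE vs RE

Running all three mechanically, rather than thinking about the source of the heterogeneity and the target parameter, is exactly the problem.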
Jun 10, 2022
Not sure about that! But here's a first attempt. Suppose I have a control group and G treatment levels. The treatment, W, takes values in {0,1,2,...,G} and is unconfounded conditional on X. Assume the overlap condition 0 < p0(x) = P(W=0|X=x) for all x in Support(X). This isn't a trivial assumption b/c it requires that, for any subset of the population as determined by values of x, there are some control units. However, if this isn't true, one can trim the sample -- as in the Crump et al. "Moving the Goalposts" work.
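A sketch of the trimming step, with hypothetical names (w takes values 0,...,G; controls x1, x2):

mlogit w x1 x2
predict double p0, outcome(0)     // estimated P(W=0|X=x)
drop if p0 < 0.01                 // the cutoff is a judgment call; see Crump et al.

After trimming, overlap holds by construction on the retained subsample.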
Apr 20, 2022
If in a staggered DiD setting I write an equation with a full set of treatment indicators by treated cohort and calendar time, and include c(i) + f(t) (unit and time "fixed effects"), would you still call that a "fixed effects" model? If you answer "yes" then you should stop saying things like "there's a problem with the TWFE 'model'." The modeling is our choice; we choose what to put in x(i,t) when we write

y(i,t) = x(i,t)*b + c(i) + f(t) + u(i,t)

The phrase "TWFE model" refers to c(i) + f(t), right?
Feb 18, 2022
Not exactly. I like Bruce's approach in this paper and it yields nice insights. But from Twitter and private exchanges last week, and what I've learned since, it seems that the class of estimators in play in Theorem 5 includes only estimators that are linear in Y.

#metricstotheface Theorem 5 is correct and neat, but it leaves open the question of which estimators are in the class being compared with OLS. Remember, we cannot simply use phrases such as "OLS is BUE" (best unbiased estimator) without clearly defining the competing class of estimators. This is critical.
Feb 13, 2022
Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.

#metricstotheface A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.
Feb 13, 2022
This is neat and makes sense to me. After all, third moments need not even exist under GM. And using 3rd moments would make it very tough to achieve unbiasedness across all cases with only GM. Clearly, the result says it's impossible. It still blows my mind that OLS is best unbiased in that class: across all multivariate distributions with weird 3rd and 4th conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.
Feb 12, 2022
One of the remarkable features of Bruce's result, and why I never could have discovered it, is that the "asymptotic" analog doesn't seem to hold. Suppose we assume random sampling and in the population specify

A1. E(y|x) = x*b0
A2. Var(y|x) = (s0)^2

#metricstotheface Also assume rank E(x'x) = k so no perfect collinearity in the population. Then OLS is asymptotically efficient among estimators that only use A1 for consistency. But OLS is not asymp effic among estimators that use A1 and A2 for consistency.
Jan 18, 2022
Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one for regular apples, the other for "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).

#metricstotheface Individuals were then asked to choose a basket of regular and eco-friendly applies. A linear regression for QE (quantity of eco-labeled) gives very good results: strong downward sloping demand curve, an increase in the competing price shifts out the demand curve.
Jan 18, 2022
If you know people who teach students it's important to "test" for multicollinearity, please ask them why.

I imagine a world where the phrase "I tested for multicollinearity" no longer appears in published work. I know John Lennon would be on my side.

#metricstotheface What I'm getting at is that it's still common to see "tests" for multicollinearity without even looking at the regression output, or asking which variables are collinear. Often it's the control variables. So what? If you have many control variables you might have to select among them.
Nov 6, 2021
A Twitter primer on the canonical link and the linear exponential family. I've used this combination in a few of my papers: the doubly robust estimators for estimating average treatment effects, improving efficiency in RCTs, and, most recently, nonlinear DiD.

#metricstotheface The useful CL/LEF combinations are:
1. linear mean/normal
2. logistic mean/Bernoulli (binary fractional)
3. logistic mean/binomial (0 <= Y <= M)
4. exponential mean/Poisson (Y >= 0)
5. logistic means/multinomial

The last isn't used very much -- yet.
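In Stata, these pairs map onto glm with vce(robust). A sketch with hypothetical y, x1, x2 (and a variable M holding the number of trials for case 3):

glm y x1 x2, family(gaussian) link(identity) vce(robust)   // 1. linear/normal
glm y x1 x2, family(binomial) link(logit) vce(robust)      // 2. binary or fractional y
glm y x1 x2, family(binomial M) link(logit) vce(robust)    // 3. 0 <= y <= M
glm y x1 x2, family(poisson) link(log) vce(robust)         // 4. y >= 0

Case 5 needs mlogit (or user-written commands like fmlogit for fractional shares) rather than glm.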