On my shared Dropbox folder, pinned at the top, I posted the latest version of my TWFE/TWMundlak paper. It's essentially complete (and too long ...). I've included the "truly marvelous" proof of equivalence between pooled OLS and imputation.

#metricstotheface
I also fixed some of the material on testing/correcting for heterogeneous trends. A nice result is that the POLS approach with cohort-specific trends is the same as the obvious imputation approach.
This means that using the full regression to correct for nonparallel trends suffers no contamination when testing. It's identical to using only untreated obs to test for pre-trends. But one must allow full heterogeneity in cohort/time ATTs for the equivalence to hold.
I'm very pleased with that result. In a nutshell, add interactions of everything that makes sense and use all of the data. That's identical to imputation. Add heterogen time trends. That's identical to imputation. The test for violation of PT is clean.
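A minimal simulation sketch of the equivalence described above, in Python/NumPy rather than the Stata do files. Everything here is an illustrative assumption: the simulated staggered design (cohorts first treated in periods 4 and 5, plus a never-treated group), the sample sizes, and the helper names `design`, `pols_att`, and `imputation_att`. It has no covariates and uses cohort dummies, not unit fixed effects.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 300, 6                          # units and periods (hypothetical sizes)
first = rng.choice([0, 4, 5], size=N)  # first treated period; 0 = never treated
G = np.repeat(first, T)                # cohort of each unit-period row
t = np.tile(np.arange(1, T + 1), N)    # period of each row
w = (G > 0) & (t >= G)                 # treatment indicator

# Simulated outcome: unit effect, common time effect, a cohort-specific
# linear trend (a PT violation), and ATTs that vary by cohort and period.
c = np.repeat(rng.normal(size=N), T)
y = (c + 0.3 * t + 0.2 * (G == 4) * t + w * (1.0 + 0.5 * (t - G))
     + rng.normal(size=N * T))

cells = [(g, s) for g in (4, 5) for s in range(g, T + 1)]  # treated cells


def design(trends):
    """Constant, cohort dummies, period dummies, optional cohort trends."""
    cols = [np.ones(N * T)]
    cols += [(G == g).astype(float) for g in (4, 5)]
    cols += [(t == s).astype(float) for s in range(2, T + 1)]
    if trends:
        cols += [(G == g) * t.astype(float) for g in (4, 5)]
    return np.column_stack(cols)


def pols_att(trends):
    """One regression on all data, with a dummy for every treated cell."""
    X = design(trends)
    D = np.column_stack([((G == g) & (t == s)).astype(float) for g, s in cells])
    b = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0]
    k = X.shape[1]
    return b[k:], b[:k]                # cell ATTs, remaining coefficients


def imputation_att(trends):
    """Fit on untreated obs only, impute Y(0), average gaps by cell."""
    X = design(trends)
    b0 = np.linalg.lstsq(X[~w], y[~w], rcond=None)[0]
    gap = y - X @ b0
    att = np.array([gap[(G == g) & (t == s)].mean() for g, s in cells])
    return att, b0


for trends in (False, True):
    att_p, b_p = pols_att(trends)
    att_i, b_i = imputation_att(trends)
    print(trends, np.allclose(att_p, att_i), np.allclose(b_p, b_i))
```

Under this setup the cell-level ATTs from the two approaches should agree up to floating-point error, with or without cohort trends. The second comparison checks that the non-treatment coefficients, including the cohort trend coefficients used to test PT, match the untreated-only fit.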
In the Stata do files (caution: use only the most recent ones!) I show the equivalence of POLS/TWFE with different imputation methods. The POLS approach actually gives more TEs than did_imputation. Or maybe it's just my limited knowledge.
I should mention there is a remaining gap in the paper, but I'm not going to pursue it fully. I showed that POLS and imputation are the same when using cohort dummies -- not unit fixed effects as in BJS (2021). With unit fixed effects they are also the same, but the proof eludes me.
I don't want to do hard algebra. The equivalence proof I do have uses a simple observation about the FOCs that requires only trivial linear algebra. I suspect the proof might follow from the observation that the POLS and FE residuals differ by a time-constant factor.
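A sketch of how an FOC argument of this type can go in the simplest case: no covariates, and a dummy for every treated cohort/period cell. The notation is assumed here for illustration, and this is not necessarily the argument in the paper. Let x_{it} collect the constant, cohort dummies, and period dummies (and cohort trends, if included), all of which are constant within a cell c, and let D_{c,it} be the dummy for treated cell c with N_c observations.

```latex
\begin{align*}
&\text{POLS: } \min_{\beta,\tau} \sum_{i,t}
  \Big( y_{it} - x_{it}'\beta - \sum_{c} \tau_c D_{c,it} \Big)^2 ,
  \qquad \hat u_{it} \text{ the resulting residuals.} \\[4pt]
&\text{FOC for } \tau_c: \quad \sum_{(i,t) \in c} \hat u_{it} = 0
  \;\Longrightarrow\; \hat\tau_c = \bar y_c - x_c'\hat\beta . \\[4pt]
&\text{FOC for } \beta: \quad
  0 = \sum_{i,t} x_{it} \hat u_{it}
    = \sum_{\text{untreated}} x_{it}\,(y_{it} - x_{it}'\hat\beta)
      + \sum_{c} x_c \underbrace{\sum_{(i,t) \in c} \hat u_{it}}_{=\,0} . \\[4pt]
&\text{So } \hat\beta = \hat\beta_0 \text{ (the untreated-only OLS estimate), and }
  \hat\tau_c = \frac{1}{N_c} \sum_{(i,t) \in c} \big( y_{it} - x_{it}'\hat\beta_0 \big) .
\end{align*}
```

The last expression is the cell average of the imputed treatment effects. The step that pulls x_c out of the inner sum relies on x_{it} being constant within each treated cell, which holds for cohort dummies but not for unit fixed effects, so this sketch does not cover the unit FE case.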
I bet someone can show it pretty quickly, but my brain isn't working that way anymore.

Plus, I extended the result I do have to the nonlinear case, where putting in individual FEs should not be done. So I don't have the drive to pursue that result.


More from @jmwooldridge

28 Sep
Fortunately, the speculations I made in my linear DiD paper about extension to the nonlinear case turn out to be true -- with a small caveat. One should use the canonical link function for the chosen quasi-log-likelihood (QLL) function.

#metricstotheface
So, exponential mean/Poisson QLL if y >= 0.
Logistic mean/Bernoulli QLL if 0 <= y <= 1 (binary or fractional). (We call this logit and fractional logit.)
Linear mean, normal (OLS, of course).

These choices ensure that pooled estimation and imputation are numerically identical.
It's not a coincidence that these same combos show up in my work on doubly robust estimation of treatment effects and improving efficiency without sacrificing consistency in RCTs. Latest on the latter is here:

scholar.google.com/citations?view…
Read 4 tweets
17 Aug
I finally got my TWFE/Mundlak/DID paper in good enough shape to make it an official working paper. I'll put it in other places but it's currently here:

researchgate.net/publication/35…

Also, the Stata stuff is still with the Dropbox link:

dropbox.com/sh/zj91darudf2…

#metricstotheface
I changed the title a bit to better reflect its contents. I'm really happy with the results, less happy that the paper got a bit unwieldy. It's intended to be a "low hanging fruit" DID paper.
Now I've more formally shown that the estimator I was proposing -- either pooled OLS or TWFE or RE (they're all the same, properly done) -- identifies every dynamic treatment effect one is interested in (on means) in a staggered design.
Read 16 tweets
22 Jun
For my German friends: What is the German equivalent of "Ms." when addressing a woman (not yet a Dr.)? I noticed on a course application form in English -- I assume translated from German -- only two choices, "Mr." and "Mrs." Is "Frau" used for both Mrs. and Ms.?
As a follow-up: If I use English, I assume "Ms." is acceptable. I never address anyone as "Mrs." in English. It's interesting that "Frau" was translated as "Mrs." rather than "Ms." I would've expected the latter, especially in an academic setting.
My formal German courses were in the 1970s, and I learned that "Frau" is for married women only. I think I can make the adjustment, though. 🤓

I'm still intrigued that there is no "Ms." equivalent in German ....
Read 4 tweets
22 Jun
Here's a panel DID question. Common intervention at t=T0. Multiple pre-treatment and post-treatment periods. Dummy d(i) is one if a unit is eventually treated. p(t) is one for t >= T0. Treatment indicator is w(i,t) = d(i)*p(t). Time constant controls are x(i).

#metricstotheface
Consider several estimators of the avg TE [coef on w(i,t)]. Period dummies are f2(t), ... fT(t).

1. Pooled OLS: y(i,t) on w(i,t), 1, d(i), p(t)
2. TWFE including w(i,t).
3. POLS: y(i,t) on w(i,t), 1, d(i), p(t), x(i)
4. POLS: y(i,t) on w(i,t), 1, d(i), f2(t), ... fT(t), x(i)
For a balanced panel without degeneracies, which is the correct statement?
Read 5 tweets
10 Jun
I should admit that my tweets and poll about missing data were partly self-serving, as I'm interested in what people do. But it was a mistake to leave the poll initially vague. I haven't said much useful on Twitter in some time, so I'll try here.

#metricstotheface
I want to start with the very simple case where there is one x and I'm interested in E(y|x); assume it's linear (for now). Data are missing on x but not on y. Here are some observations.
1. If the data are missing as a function of x -- formally, E(y|x,m) = E(y|x) -- the CC estimator is consistent (even conditionally unbiased).
2. Imputing on the basis of y is not consistent and can be badly biased.
3. Inverse probability weighting using 1/P(m=0|y) also is inconsistent.
Read 7 tweets
25 May
Several comments on this paper. First, it's nice to see someone taking the units of measurement issue seriously. But I still see many issues, especially when y >= 0 and we have better alternatives.

1. A search is required over units of measurement.

#metricstotheface
How do I compute a legitimate standard error of, say, an elasticity? I've estimated theta but then I ignore the fact that I estimated it? That's not allowed.

2. As with many transformation models, the premise is there exists a transformation g(.) such that g(y) = xb + u.
u is assumed to be indep of x, at a minimum. Often the distrib is restricted. In 1989 in an IER paper I argued this was a real problem with Box-Cox approaches b/c u >= -xb. If I model E(y|x) directly I need none of that. It's what Poisson regression does.
Read 6 tweets
