Jeffrey Wooldridge (@jmwooldridge)
Jun 2
Unfortunately, indiscriminate use of the term "fixed effects" to describe any set of mutually exclusive and exhaustive dummy variables seems to be generating confusion about nonlinear models and the incidental parameters problem.

#metricstotheface
With panel data, the IPP arises when we try to include unit-specific dummies in a nonlinear model with a small number of time periods: we have few observations per "fixed effect." In other cases, the IPP arises if we put in group-specific dummies with small group sizes.
But if we include, say, occupation dummies when we have lots of people in each occupation, this clearly causes no problem. Or, including interviewer "fixed effects" when we have lots of subjects per interviewer.
It's easy to find phrases such as "religion fixed effects" and "race fixed effects" and so on in published papers. Clearly the inclusion of such variables cannot cause an IPP in any nonlinear model. A clear majority knows this, but it can be confusing for the beginner.
The convention of using "fixed effects" to describe dummy variables is so ingrained that it's impossible to change now. So we probably just have to do a better job teaching what was originally meant by "fixed effects estimation" versus how "fixed effects" is now used.
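The T = 2 panel logit case makes the IPP concrete. Below is a small simulation sketch of my own (not from the thread), relying on two classic results: conditional logit, which conditions the unit dummies away, is consistent, while the unconditional dummy-variable MLE converges to 2*beta (Andersen); for T = 2 the unconditional MLE equals exactly twice the conditional-logit estimate (Abrevaya), so one fit gives both.

```python
import numpy as np

# Sketch: the incidental parameters problem in fixed-effects logit with T = 2.
# The unconditional MLE (one dummy per unit) converges to 2*beta, while
# conditional logit is consistent; for T = 2 the unconditional MLE equals
# exactly twice the conditional-logit estimate, so both come from one fit.

rng = np.random.default_rng(0)
n, beta = 20_000, 1.0

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
alpha = 0.5 * (x1 + x2) + rng.normal(size=n)   # heterogeneity correlated with x

p1 = 1 / (1 + np.exp(-(alpha + beta * x1)))
p2 = 1 / (1 + np.exp(-(alpha + beta * x2)))
y1 = (rng.random(n) < p1).astype(float)
y2 = (rng.random(n) < p2).astype(float)

# Conditional logit with T = 2: among "switchers" (y1 + y2 == 1),
# P(y2 = 1 | switcher) = Lambda(beta * (x2 - x1)).  Fit by 1-D Newton.
sw = (y1 + y2) == 1
dx, d = x2[sw] - x1[sw], y2[sw]
b = 0.0
for _ in range(50):
    p = 1 / (1 + np.exp(-b * dx))
    b += np.sum((d - p) * dx) / np.sum(p * (1 - p) * dx ** 2)

print(f"conditional logit (consistent): {b:.3f}")      # close to 1.0
print(f"unconditional FE-logit MLE:     {2 * b:.3f}")  # close to 2.0, not 1.0
```

With lots of observations per dummy (occupation or interviewer "fixed effects"), the analogous bias disappears, which is the thread's point.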

More from @jmwooldridge

Mar 26
If Y, D (treatment), and Z (IV) are all binary with controls X, to obtain LATE you can use a linear model and estimate by IV:
Y = a + b*D + X*c + Z*(X - Xbar)*d + U
First stage:
D = f + g*Z + X*h + Z*(X - Xbar)*m + V
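Taken at face value, the system above is exactly identified (D is the only endogenous regressor and Z the only excluded instrument, with Z*(X - Xbar) included as exogenous), so 2SLS reduces to the simple IV formula. A numpy sketch under an illustrative DGP of my own with a homogeneous effect, so the target is known and LATE = ATE:

```python
import numpy as np

# Sketch of the IV regression above: Y = a + b*D + X*c + Z*(X - Xbar)*d + U,
# first stage D = f + g*Z + X*h + Z*(X - Xbar)*m + V.  Exactly identified,
# so 2SLS is (Z'W)^{-1} Z'Y.  The DGP is illustrative, not from the thread.

rng = np.random.default_rng(1)
n, tau = 50_000, 2.0                      # tau: homogeneous treatment effect

x = rng.normal(size=n)
z = (rng.random(n) < 0.5).astype(float)   # binary instrument
u = rng.normal(size=n)
d = ((0.3 + 1.5 * z + 0.2 * x + u + rng.normal(size=n)) > 0).astype(float)
y = 1.0 + tau * d + 0.5 * x + u           # u in both equations -> D endogenous

zx = z * (x - x.mean())
W = np.column_stack([np.ones(n), d, x, zx])   # regressors (D endogenous)
Z = np.column_stack([np.ones(n), z, x, zx])   # instruments (Z excluded)

coef = np.linalg.solve(Z.T @ W, Z.T @ y)
b_hat = coef[1]
print(f"IV estimate of treatment effect: {b_hat:.3f}")
```

With heterogeneous effects, b estimates a LATE rather than the ATE, which is where the doubly robust approach mentioned next comes in.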
Or look at this recent WP by @TymonSloczynski, @sderyauysal, and me to use separate doubly robust estimates of the numerator and denominator. Can use logit outcome models for Y and D.

scholar.google.com/citations?view…
I also like trying separate probits that account for endogeneity of D using Heckman selection. This assumes two-sided noncompliance when Z is binary, but it can be used for general Z. Seeing how LATE estimates with covariates differ from ATE estimates with covariates can be informative, I think.
Dec 26, 2022
Much focus on Poisson regression (whether for cross section or FE Poisson for panel data) is on its consistency when the conditional mean (almost always assumed to be exponential) is correctly specified. This is its most important feature.
A less well known but very important feature is its relative efficiency in the class of robust estimators -- that is, estimators consistent when only the mean is correct. (This requirement rules out MLEs of lots of models, such as NegBin I and NegBin II.)
The efficiency claims are satisfying. In the random sampling case, Poisson QMLE is efficient in the class of robust estimators if the variance is proportional to the mean. The constant can be less than one or greater than one; it doesn't matter. It doesn't have to equal one!
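A quick way to see the consistency point (the conditional mean, not the distribution, is what matters): generate overdispersed negative binomial counts with an exponential mean and fit them by Poisson QMLE. This is my own numpy sketch, not from the thread; the Newton/IRLS loop solves the Poisson score equations.

```python
import numpy as np

# Sketch: Poisson QMLE is consistent whenever E[y|x] = exp(b0 + b1*x) is
# correct, even if y is not Poisson.  Here y is negative binomial
# (Var = mu + mu^2/r, overdispersed), yet the Poisson score equations
# still recover (b0, b1).  Fit by Newton; the log likelihood is concave.

rng = np.random.default_rng(2)
n, b0, b1 = 50_000, 0.5, 0.8

x = rng.normal(size=n)
mu = np.exp(b0 + b1 * x)
r = 2.0                                      # NegBin dispersion parameter
y = rng.negative_binomial(r, r / (r + mu))   # overdispersed counts, mean mu

X = np.column_stack([np.ones(n), x])
beta = np.array([np.log(y.mean()), 0.0])     # stable intercept-only start
for _ in range(50):
    m = np.exp(X @ beta)
    beta += np.linalg.solve(X.T @ (m[:, None] * X), X.T @ (y - m))

print(f"Poisson QMLE: b0_hat={beta[0]:.3f}, b1_hat={beta[1]:.3f}")
```

The NegBin MLEs, by contrast, would need the full distribution to be right; here the Poisson QMLE needs only the mean.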
Dec 26, 2022
I've said this often to my students, both at MSU and in short courses:

There are good reasons and bad reasons not to use an estimator. You'll be more convincing as an empirical researcher if you know the difference.

Maybe this suggests a good way to write an exam ....
Good reason not to use standard random effects: It assumes heterogeneity is uncorrelated with X.

Bad reason not to use RE (linear model): It requires homoskedasticity and no serial correlation of idiosyncratic errors. (False)
Bad reason not to use a linear probability model: "The estimates are biased and inconsistent." (For what?)

(Possibly) good reason not to use LPM: It might not produce good approximations to average partial effects.
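The LPM point can be checked by simulation. In a benign design (Gaussian x, logit DGP), the OLS slope actually has the same plim as the average partial effect by Stein's lemma, so the LPM does fine; the caution is about less benign designs (skewed x, fitted probabilities near 0 or 1). A numpy sketch of my own illustrating the benign case:

```python
import numpy as np

# Sketch: comparing the LPM (OLS) slope to the logit average partial effect.
# With x ~ N(0,1) and a logit DGP, Stein's lemma implies the OLS slope and
# the APE share the same plim, so here the LPM approximates the APE well.

rng = np.random.default_rng(3)
n, b0, b1 = 200_000, 0.2, 1.0

x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(b0 + b1 * x)))
y = (rng.random(n) < p).astype(float)

# True APE by Monte Carlo over the x distribution: E[b1 * p * (1 - p)]
ape = np.mean(b1 * p * (1 - p))

# LPM slope: OLS of y on (1, x)
X = np.column_stack([np.ones(n), x])
ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"true APE: {ape:.3f}, LPM slope: {ols[1]:.3f}")
```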
Nov 11, 2022
To people who badger empirical researchers using micro-type panel data -- where N is pretty large and T is not -- into computing tests for cross-sectional dependence in the errors: Please stop!

These tests give lots of false positives due to unobserved heterogeneity.
This is essentially like testing for cluster correlation using residuals after OLS. Even under random sampling and random assignment -- where we know clustering is not needed -- tests for cluster correlation will often reject if there is neglected heterogeneity.
I've gotten a few emails asking me what should be done when, say, N = 3,000, T = 10, and one finds cross-sectional correlation in the residuals. Nothing reliable can be done, and usually nothing needs to be done.
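The false-positive mechanism is easy to simulate (my own sketch, not from the thread): the idiosyncratic errors are iid, but pooled-OLS residuals still contain the neglected unit effect, so within-unit residual correlation shows up even though nothing in the errors is dependent.

```python
import numpy as np

# Sketch: neglected heterogeneity makes residual-based dependence tests reject.
# u_it is iid, but pooled-OLS residuals keep the unit effect c_i, so residuals
# from the same unit are strongly correlated across periods.

rng = np.random.default_rng(4)
N, T, b = 3_000, 10, 1.0

x = rng.normal(size=(N, T))
c = rng.normal(size=(N, 1))                    # unobserved heterogeneity
u = rng.normal(size=(N, T))                    # iid idiosyncratic errors
y = b * x + c + u

# Pooled OLS of y on (1, x), ignoring c
X = np.column_stack([np.ones(N * T), x.ravel()])
coef = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
res = (y.ravel() - X @ coef).reshape(N, T)

# Correlation of residuals at t=0 and t=1 across units: u is independent
# over time, yet this is far from zero because c_i remains in the residuals.
corr = np.corrcoef(res[:, 0], res[:, 1])[0, 1]
print(f"within-unit residual correlation: {corr:.3f}")   # near 0.5, not 0
```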
Aug 6, 2022
I've been so discombobulated lately that I don't keep track of what's in which version of my papers and what I include in lectures/teaching. So here's an update on what I've learned about DiD in 2022.

#jwdid (borrowing from @friosavila).
1. The pooled OLS method I proposed, which is the same as TWFE and random effects, is also equivalent to a version of imputation I proposed. That means it is consistent for various ATTs under weak assumptions (but those include no anticipation and parallel trends).
2. Because of this equivalence, POLS uses all possible control units in each time period for each cohort. Under standard assumptions, it is efficient.

3. I've only allowed time-constant covariates. But can see the "moderating effects" directly in the regression that gives ATTs.
Aug 6, 2022
A DiD update. I've been editing my nonlinear DiD paper and I have posted a working paper here:

researchgate.net/publication/36…

It's actually more up to date than the latest version of the linear paper.
I've been trying to clean up the Stata do files for both the linear and nonlinear cases. I've learned a lot since last updating -- tricks that make things simpler (in linear and nonlinear cases). I'll pin a new tweet with the Dropbox location.
I'm probably a bit too happy with this paper. It works out elegantly and I think it's also useful. It's also very easy.

The simulations in the paper show how the nonlinear model can matter. The empirical example (common timing) shows it might not.
