Unfortunately, indiscriminate use of the term "fixed effects" to describe any set of mutually exclusive and exhaustive dummy variables seems to be generating confusion about nonlinear models and the incidental parameters problem (IPP).
With panel data, the IPP arises when we try to include unit-specific dummies in a nonlinear model with a small number of time periods: we have few observations per "fixed effect." In other cases, the IPP arises if we put in group-specific dummies with small group sizes.
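A minimal simulation sketch of the small-T case (the numbers and names are illustrative, not from the thread): with T = 2, the logit MLE with a full set of unit dummies is famously inconsistent, converging to twice the true coefficient.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
N, T, beta = 500, 2, 1.0
a = rng.normal(size=N)                          # unit-specific heterogeneity
x = rng.normal(size=(N, T))
p = 1.0 / (1.0 + np.exp(-(a[:, None] + beta * x)))
y = (rng.uniform(size=(N, T)) < p).astype(float)

# keep "movers" (outcome varies within unit); non-movers have no finite
# MLE for their dummy coefficient
keep = y.min(axis=1) != y.max(axis=1)
x, y = x[keep], y[keep]
n = int(keep.sum())

# unconditional logit with one dummy per unit
D = np.kron(np.eye(n), np.ones((T, 1)))
X = np.column_stack([x.reshape(-1), D])
fit = sm.Logit(y.reshape(-1), X).fit(disp=0)
print(fit.params[0])                            # tends to 2*beta, not beta
```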
But if we include, say, occupation dummies when we have lots of people in each occupation, that clearly causes no problem. The same goes for interviewer "fixed effects" when we have lots of subjects per interviewer.
It's easy to find phrases such as "religion fixed effects" and "race fixed effects" and so on in published papers. Clearly the inclusion of such variables cannot cause an IPP in any nonlinear model. Most researchers know this, but it can be confusing for beginners.
The convention of using "fixed effects" to describe dummy variables is so ingrained that it's impossible to change now. So we probably just have to do a better job teaching what was originally meant by "fixed effects estimation" versus how "fixed effects" is now used.
• • •
If Y, D (treatment), and Z (IV) are all binary with controls X, to obtain the LATE you can use a linear model and estimate it by IV, with Z and Z*(X - Xbar) as instruments:
Y = a + b*D + X*c + D*(X - Xbar)*d + U
First stage:
D = f + g*Z + X*h + Z*(X - Xbar)*m + V
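A minimal sketch of this IV setup in Python using the linearmodels package; the file and column names (late_example.csv, y, d, z, x1, x2) are hypothetical:

```python
import pandas as pd
from linearmodels.iv import IV2SLS

# Hypothetical data: binary y, d, z and controls x1, x2
df = pd.read_csv("late_example.csv")             # placeholder file name
Xc = df[["x1", "x2"]] - df[["x1", "x2"]].mean()  # X - Xbar
exog = pd.concat([pd.Series(1.0, index=df.index, name="const"),
                  df[["x1", "x2"]]], axis=1)
endog = pd.concat([df["d"], Xc.mul(df["d"], axis=0).add_prefix("d_")], axis=1)
instr = pd.concat([df["z"], Xc.mul(df["z"], axis=0).add_prefix("z_")], axis=1)

# 2SLS; the coefficient on d estimates the LATE
res = IV2SLS(df["y"], exog, endog, instr).fit(cov_type="robust")
print(res.params["d"], res.std_errors["d"])
```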
Or see the recent WP by @TymonSloczynski, @sderyauysal, and me, which uses separate doubly robust estimates of the LATE numerator and denominator. Logit outcome models can be used for both Y and D.
I also like trying separate probits that account for the endogeneity of D using Heckman selection. This assumes two-sided noncompliance when Z is binary, but it can be used for general Z. Seeing how LATE estimates with covariates differ from ATE estimates with covariates can be informative, I think.
Much of the focus on Poisson regression (whether for cross sections or FE Poisson for panel data) is on its consistency when the conditional mean (almost always assumed to be exponential) is correctly specified. This is its most important feature.
A less well known but very important feature is its relative efficiency in the class of robust estimators -- that is, estimators consistent when only the mean is correct. (This requirement rules out MLEs of lots of models, such as NegBin I and NegBin II.)
The efficiency claims are satisfying. In the random sampling case, the Poisson QMLE is efficient in the class of robust estimators if the variance is proportional to the mean. The proportionality constant can be less than one or greater than one; it doesn't matter, and it doesn't have to equal one!
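In practice, then, the Poisson QMLE can be paired with fully robust (sandwich) standard errors even when the outcome is not a count and the variance is not proportional to the mean. A minimal sketch with statsmodels, using simulated (purely illustrative) data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                # exponential conditional mean
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # continuous, not Poisson-distributed

# Poisson QMLE: consistent for the mean parameters even though y is not Poisson
X = sm.add_constant(x)
res = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
print(res.params, res.bse)                # robust (sandwich) standard errors
```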
To people who badger empirical researchers using micro-type panel data -- where N is pretty large and T is not -- into computing tests for cross-sectional dependence in the errors: Please stop!
These tests give lots of false positives due to unobserved heterogeneity.
This is essentially like testing for cluster correlation using residuals after OLS. Even under random sampling and random assignment -- where we know clustering is not needed -- tests for cluster correlation will often reject if there is neglected heterogeneity.
I've gotten a few emails asking me what should be done when, say, N = 3,000, T = 10, and one finds cross-sectional correlation in the residuals. Nothing reliable can be done, and usually nothing needs to be done.
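A minimal simulation sketch of the clustering analogy above (numbers are illustrative): the within-group residual correlation is strong, so a correlation test would "reject," yet under individual-level random assignment the usual and cluster-robust standard errors for the treatment effect essentially coincide.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
G, n = 200, 15                                    # number of groups, group size
g = np.repeat(np.arange(G), n)
d = rng.integers(0, 2, size=G * n).astype(float)  # randomized at the unit level
y = 1.0 * d + rng.normal(size=G)[g] + rng.normal(size=G * n)

res = sm.OLS(y, sm.add_constant(d)).fit()
u = res.resid

# within-group residual correlation is strong (a test would reject) ...
off = ~np.eye(n, dtype=bool)
icc = np.mean([np.outer(u[g == j], u[g == j])[off].mean() for j in range(G)]) / u.var()
print(round(icc, 2))                              # roughly 0.5

# ... yet the usual and cluster-robust SEs for the treatment effect agree
print(res.bse[1])
print(res.get_robustcov_results(cov_type="cluster", groups=g).bse[1])
```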
I've been so discombobulated lately that I don't keep track of what's in which version of my papers and what I include in lectures/teaching. So here's an update on what I've learned about DiD in 2022.
1. The pooled OLS method I proposed, which is the same as TWFE and random effects, is also equivalent to a version of imputation I proposed (a code sketch follows below). That means it is consistent for various ATTs under weak assumptions (but those include no anticipation and parallel trends).
2. Because of this equivalence, POLS uses all possible control units in each time period for each cohort. Under standard assumptions, it is efficient.
3. So far I've only allowed time-constant covariates, but one can see the "moderating effects" directly in the regression that delivers the ATTs.
It's actually more up to date than the latest version of the linear paper.
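Here is a minimal sketch of that pooled OLS ("extended TWFE") regression for staggered adoption. The long-format column names (id, year, cohort as first treatment year with 0 for never treated, and y) are assumptions for illustration, not names from the papers.

```python
import pandas as pd
import statsmodels.api as sm

def extended_twfe(df):
    """Pooled OLS with cohort dummies, time dummies, and one dummy per
    treated (cohort, post-treatment-year) cell; under no anticipation and
    parallel trends, the att_g*_t* coefficients estimate ATT(g, t)."""
    X = pd.get_dummies(df["cohort"], prefix="g", drop_first=True, dtype=float)
    X = X.join(pd.get_dummies(df["year"], prefix="t", drop_first=True, dtype=float))
    years = sorted(df["year"].unique())
    for g in sorted(df.loc[df["cohort"] > 0, "cohort"].unique()):
        for t in (t for t in years if t >= g):   # post-treatment cells only
            X[f"att_g{g}_t{t}"] = ((df["cohort"] == g) & (df["year"] == t)).astype(float)
    return sm.OLS(df["y"], sm.add_constant(X)).fit(
        cov_type="cluster", cov_kwds={"groups": df["id"]})

# usage: res = extended_twfe(df); res.params.filter(like="att_")
```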
I've been trying to clean up the Stata do files for both the linear and nonlinear cases. I've learned a lot since last updating -- tricks that make things simpler (in linear and nonlinear cases). I'll pin a new tweet with the Dropbox location.
I'm probably a bit too happy with this paper. It works out elegantly, and I think it's useful. It's also very easy to apply.
The simulations in the paper show how the nonlinear model can matter. The empirical example (common timing) shows it might not.