Much of the focus on Poisson regression (whether for a cross section, or FE Poisson for panel data) is on its consistency when the conditional mean (almost always assumed to be exponential) is correctly specified. This is its most important feature.
A less well known but very important feature is its relative efficiency in the class of robust estimators -- that is, estimators consistent when only the mean is correct. (This requirement rules out MLEs of lots of models, such as NegBin I and NegBin II.)
The efficiency claims are satisfying. In the random sampling case, Poisson QMLE is efficient in the class of robust estimators if the variance is proportional to the mean. The constant of proportionality can be less than one or greater than one; it doesn't matter. It doesn't have to equal one!
In that scenario, Poisson QMLE is more efficient than nonlinear LS, geometric QMLE, and gamma QMLE. Note that this variance assumption includes the NegBin I variance (but Poisson QMLE doesn't estimate the scale parameter along with beta!). If the variance is proportional to the square of the mean, gamma QMLE is more efficient.
I discuss the above in my MIT Press book.
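For concreteness, here is a minimal cross-sectional simulation sketch in Python (statsmodels). The data-generating process, variable names, and the NegBin I-style overdispersion are my own illustrative assumptions; the point is only that the Poisson QMLE with robust (sandwich) standard errors is consistent for beta even though the variance is a multiple of the mean rather than equal to it.

```python
# Sketch: Poisson QMLE with an exponential mean and overdispersion,
# Var(y|x) = (1 + delta)*mu, i.e. proportional to the mean (NegBin I-type).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = sm.add_constant(x)
beta = np.array([0.5, 0.3])
mu = np.exp(X @ beta)

delta = 2.0                                     # overdispersion factor (illustrative)
lam = rng.gamma(shape=mu / delta, scale=delta)  # E[lam|x] = mu, Var[lam|x] = delta*mu
y = rng.poisson(lam)                            # so Var(y|x) = (1 + delta)*mu

# Poisson QMLE; the sandwich ("HC1") variance is valid under the mean assumption alone
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC1")
print(pois.params)   # close to (0.5, 0.3) despite Var(y|x) != E(y|x)
print(pois.bse)
```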
In a recent paper with my former student Nick Brown, we proved the extension for the panel data case under "fixed effects" assumptions. That is, we allow heterogeneity arbitrarily correlated with the covariates.
If Var(y(it)|x(i),c(i)) is proportional to E(y(it)|x(i),c(i)) and the y(it) are uncorrelated conditional on (x(i),c(i)), the FE Poisson estimator achieves the efficiency bound. So far, reviewers haven't recognized the practical and historical significance of this result. 😬
Previously, efficiency was known only when the variance equals the mean. Our result allows for under- or over-dispersion (any amount) as long as it's proportional to the mean.
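To make the panel case concrete, here is a minimal Python sketch (the DGP and names are my own illustrative assumptions, not from the paper). It uses the well-known computational fact that, for Poisson, including unit dummies reproduces the FE Poisson slope estimates; the cluster-robust standard errors rely only on the conditional mean being correct.

```python
# Sketch: FE Poisson via Poisson regression with unit dummies, with
# overdispersion proportional to the mean and heterogeneity correlated with x.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
N, T = 200, 6
ids = np.repeat(np.arange(N), T)
c = np.repeat(rng.normal(size=N), T)      # heterogeneity c(i)
x = 0.5 * c + rng.normal(size=N * T)      # covariate correlated with c(i)
mu = np.exp(0.3 * x + c)
delta = 1.5                               # Var = (1 + delta) * mean (illustrative)
y = rng.poisson(rng.gamma(shape=mu / delta, scale=delta))
df = pd.DataFrame({"y": y, "x": x, "id": ids})

fe_pois = smf.poisson("y ~ x + C(id)", data=df).fit(
    disp=0, cov_type="cluster", cov_kwds={"groups": df["id"]})
print(fe_pois.params["x"])                # consistent for 0.3
```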
You may ask: Can one improve over the FE Poisson estimator if there's serial correlation or more complicated variances?
The answer is yes, and Nick and I have a nifty way of doing that, while also relaxing the variance assumptions -- all while maintaining the same level of robustness as the Poisson QMLE. The new GMM estimator, which stacks moment conditions, does well in simulations, too.
On the one hand, our paper provides more reasons to use FE Poisson. On the other, we show how it's easy to improve on it when there's serial correlation.
Comments are most welcome. Hopefully, Nick and I will be submitting to journal #N just into the new year.
• • •
To people who badger empirical researchers using micro-type panel data -- where N is pretty large and T is not -- into computing tests for cross-sectional dependence in the errors: Please stop!
These tests give lots of false positives due to unobserved heterogeneity.
This is essentially like testing for cluster correlation using residuals after OLS. Even under random sampling and random assignment -- where we know clustering is not needed -- tests for cluster correlation will often reject if there is neglected heterogeneity.
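A quick illustration of the analogy above (my own illustrative simulation, not from the thread): with random sampling and x independent of everything, pooled OLS is perfectly fine, yet a simple residual-correlation check "rejects" loudly purely because of the neglected heterogeneity.

```python
# Sketch: pooled OLS residuals look "correlated" within units solely because of
# unobserved heterogeneity c(i), even though nothing is wrong with OLS here.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
N, T = 3000, 10
ids = np.repeat(np.arange(N), T)
c = np.repeat(rng.normal(size=N), T)      # heterogeneity, independent of x
x = rng.normal(size=N * T)
y = 1.0 + 0.5 * x + c + rng.normal(size=N * T)
df = pd.DataFrame({"y": y, "x": x, "id": ids})

df["e"] = smf.ols("y ~ x", data=df).fit().resid
df["e_lag"] = df.groupby("id")["e"].shift(1)

# Within-unit correlation of residuals: slope near Var(c)/(Var(c)+Var(u)) = 0.5,
# with a huge t statistic -- a "rejection" driven entirely by c(i).
check = smf.ols("e ~ e_lag", data=df.dropna()).fit()
print(check.params["e_lag"], check.tvalues["e_lag"])
```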
I've gotten a few emails asking me what should be done when, say, N = 3,000, T = 10, and one finds cross-sectional correlation in the residuals. Nothing reliable can be done, and usually nothing needs to be done.
I've been so discombobulated lately that I don't keep track of what's in which version of my papers and what I include in lectures/teaching. So here's an update on what I've learned about DiD in 2022.
1. The pooled OLS method I proposed, which is the same as TWFE and random effects, is also equivalent to a version of imputation I proposed. That means it is consistent for various ATTs under weak assumptions (but those include no anticipation and parallel trends).
2. Because of this equivalence, POLS uses all possible control units in each time period for each cohort. Under standard assumptions, it is efficient.
3. I've only allowed time-constant covariates, but one can see the "moderating effects" directly in the regression that gives the ATTs. (A stripped-down sketch follows this list.)
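Here is a stripped-down sketch of point 1 for the common-timing case with no covariates (so the moderating-effect terms from point 3 are omitted). The DGP, names, and true ATTs are illustrative assumptions of mine; the point is that, under no anticipation and parallel trends, the coefficients on the treated-by-post-period interactions recover the period-specific ATTs.

```python
# Sketch: pooled OLS / TWFE with common timing and no covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
N, T, first_treat = 400, 6, 4                 # treatment turns on in period 4
ids = np.repeat(np.arange(N), T)
t = np.tile(np.arange(1, T + 1), N)
d_i = (rng.random(N) < 0.5).astype(int)       # treated-group indicator
c_i = 0.8 * d_i + rng.normal(size=N)          # heterogeneity correlated with treatment
d = np.repeat(d_i, T)
att = {4: 1.0, 5: 1.5, 6: 2.0}                # true ATTs by post period
effect = np.array([att.get(s, 0.0) for s in t]) * d
y = np.repeat(c_i, T) + 0.2 * t + effect + rng.normal(size=N * T)
df = pd.DataFrame({"y": y, "id": ids, "t": t, "d": d})

# Treated-by-post-period interactions; their coefficients are the ATTs
for s in range(first_treat, T + 1):
    df[f"d_t{s}"] = df["d"] * (df["t"] == s).astype(int)

pols = smf.ols("y ~ d + C(t) + d_t4 + d_t5 + d_t6", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["id"]})
print(pols.params[["d_t4", "d_t5", "d_t6"]])  # roughly 1.0, 1.5, 2.0
```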
It's actually more up to date than the latest version of the linear paper.
I've been trying to clean up the Stata do files for both the linear and nonlinear cases. I've learned a lot since last updating -- tricks that make things simpler (in linear and nonlinear cases). I'll pin a new tweet with the Dropbox location.
I'm probably a bit too happy with this paper. It works out elegantly and I think it's also useful. It's also very easy.
The simulations in the paper show how the nonlinear model can matter. The empirical example (common timing) shows it might not.
A problem with specification testing is that it can lead those who are inexperienced to think that empirical work is mostly about applying a slew of specification tests to a particular model and then trying to sort out the findings.
This is apparent with linear panel data models, where one sees the Breusch-Pagan test used to choose between POLS and RE; the F test of the unit-specific dummies to choose between POLS and FE; and the Hausman test to choose between RE and FE.
One generic problem is that the default for each test is a nonrobust version. The first two actually maintain normality (although that can be relaxed). While a robust BP test exists -- I derive it in my MIT Press book -- the test doesn't tell us much.
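On the RE-vs-FE comparison, one robust alternative (in the spirit of the correlated random effects / Mundlak device; the DGP and names below are my own illustrative assumptions) is a variable-addition version of the Hausman test: add the unit averages of the time-varying covariates to a pooled regression and do a cluster-robust test that their coefficients are zero.

```python
# Sketch: cluster-robust Hausman-type comparison via the Mundlak device.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
N, T = 1000, 5
ids = np.repeat(np.arange(N), T)
c = rng.normal(size=N)
x = 0.6 * np.repeat(c, T) + rng.normal(size=N * T)   # x correlated with c -> RE inconsistent
y = 1.0 + 0.5 * x + np.repeat(c, T) + rng.normal(size=N * T)
df = pd.DataFrame({"y": y, "x": x, "id": ids})
df["xbar"] = df.groupby("id")["x"].transform("mean")

cre = smf.ols("y ~ x + xbar", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["id"]})
print(cre.params["x"])            # equals the FE (within) estimate of the x coefficient
print(cre.t_test("xbar = 0"))     # robust test; rejection favors FE over RE
```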
Not sure about that! But here's a first attempt. Suppose I have a control group and G treatment levels. The treatment, W, takes values in {0,1,2,...,G} and is unconfounded conditional on X. Assume the overlap condition 0 < p0(x) = P(W=0|X=x) for all x in Support(X).
This isn't a trivial assumption b/c it requires that, for any subset of the population as determined by values of x, there are some control units. However, if this isn't true, one can trim the sample -- as in the Crump et al. "Moving the Goalposts" work.
If overlap holds and conditional means are linear, the following regression recovers the ATTs of each group g relative to control:
Y on 1, W1, W2, ..., WG, X, W1*(X - Xbar1), W2*(X - Xbar2), ..., WG*(X - XbarG), where Xbarg is the sample average of X over treatment group g. The coefficients on W1, ..., WG are the estimated ATTs.
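A minimal sketch of that regression with G = 2 treatment levels and a single covariate (the DGP and names are illustrative assumptions of mine):

```python
# Sketch: regression adjustment for ATTs with a multi-valued treatment.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)

# Treatment in {0,1,2} with probabilities depending on x (unconfounded given x)
logits = np.column_stack([np.zeros(n), 0.5 * x, -0.5 * x])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
w = (rng.random(n)[:, None] < probs.cumsum(axis=1)).argmax(axis=1)

# Linear conditional means with a heterogeneous effect for group 1
y = 1 + x + 1.0 * (w == 1) + 2.0 * (w == 2) + 0.5 * (w == 1) * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x": x, "w": w})

# Group dummies and the covariate demeaned at each treatment group's own average
for g in (1, 2):
    xbar_g = df.loc[df["w"] == g, "x"].mean()
    df[f"w{g}"] = (df["w"] == g).astype(int)
    df[f"w{g}_xdm"] = df[f"w{g}"] * (df["x"] - xbar_g)

ra = smf.ols("y ~ w1 + w2 + x + w1_xdm + w2_xdm", data=df).fit(cov_type="HC1")
print(ra.params[["w1", "w2"]])  # w1 ~ ATT(1) = 1 + 0.5*E[x|W=1]; w2 ~ ATT(2) = 2
```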