Not sure about that! But here's a first attempt. Suppose I have a control group and G treatment levels. The treatment, W, takes values in {0,1,2,...,G} and is unconfounded conditional on X. Assume the overlap condition 0 < p0(x) = P(W=0|X=x) for all x in Support(X).
This isn't a trivial assumption because it requires that, for any subset of the population as determined by values of x, there are some control units. If this fails, one can trim the sample -- as in the Crump et al. "Moving the Goalposts" work.
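For concreteness, here's a minimal Stata sketch of that kind of trimming (the names w and x and the 0.05 cutoff are hypothetical; Crump et al. derive a data-based rule):

    * Estimate p0(x) = P(W=0|X=x) with multinomial logit, then trim.
    mlogit w x
    predict double p0hat, outcome(0)    // fitted P(W=0|X=x)
    drop if p0hat < 0.05                // hypothetical cutoff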
If overlap holds and conditional means are linear, the following regression recovers the ATTs of each group g relative to control:
Y on 1, W1, W2, ..., WG, X, W1*(X - Xbar1), W2*(X - Xbar2), ..., WG*(X - XbarG), where Xbarg is the sample average of X over treatment group g.
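In Stata, with hypothetical names y, x, and w taking values in {0,1,2} (so G = 2), the regression can be run "by hand" like this:

    * Build the Wg dummies and the Wg*(X - Xbarg) interactions, then OLS.
    forvalues g = 1/2 {
        summarize x if w == `g', meanonly
        generate wg`g' = (w == `g')
        generate wxdm`g' = wg`g'*(x - r(mean))
    }
    regress y wg1 wg2 x wxdm1 wxdm2, vce(robust)
    * The coefficients on wg1 and wg2 estimate ATT(1) and ATT(2) vs control.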
If we don't like linear regression, replace it with logit if Y is binary or fractional; multinomial logit if Y is multinomial, or fractional with more than two outcomes (recent paper with Akanksha Negi). If Y is nonnegative (count, corner), use Poisson.
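As a sketch of the nonnegative case, here is the separate (imputation) version of Poisson RA for ATT(1), with hypothetical names; the same template works with logit for binary Y:

    * Fit the control-group mean on W = 0, impute Y(0) for group 1.
    poisson y x if w == 0, vce(robust)
    predict double m0hat                 // exp(x*bhat) for all obs
    summarize y if w == 1, meanonly
    scalar ybar1 = r(mean)
    summarize m0hat if w == 1, meanonly
    display "ATT(1) estimate: " ybar1 - r(mean)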
My view is that we can estimate these ATTs very generally, and these parameters are interesting. This is very similar to what underlies my extended TWFE DiD work. We have to worry about overlap, but having to trim away parts of the population with no controls is not surprising.
The previous regression adjustment is, of course, different from simply regressing Y on 1, W1, W2, ..., WG, X with no interactions, which imposes a common slope on X across all groups.
Plus, the full (separate) RA methods extend to doubly robust estimators that combine separate linear/logit/MNL/Poisson conditional means with inverse probability weighting (propensity scores obtained via MNL).
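Here's a hand-rolled sketch of one such DR estimator for ATT(1) in the multivalued case, combining a linear control-group mean with MNL propensity weights (names hypothetical, w in {0,1,2}):

    mlogit w x                          // propensity model
    predict double p0 p1 p2, pr         // fitted P(W=g|X=x), g = 0,1,2
    regress y x if w == 0               // control-group mean (or logit/Poisson)
    predict double m0                   // imputed Y(0)
    generate double psi1 = (w==1)*(y - m0) - (w==0)*(p1/p0)*(y - m0)
    summarize psi1, meanonly
    scalar sum1 = r(sum)
    count if w == 1
    display "DR ATT(1) estimate: " sum1/r(N)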
I certainly hope labor economists don't abandon these methods.
BTW, as a practical matter, I don't think Stata's teffects supports estimation of the ATTs. It provides estimates of the ATEs E[Y(g) - Y(0)], and this requires both stronger unconfoundedness and overlap assumptions. But the regressions are easy to do "by hand."
Standard errors that condition on the covariates follow immediately, and it might be possible to use the vce(uncond) option to account for sampling error in the Xbarg.
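If so, it might look something like this (a sketch, not guaranteed verbatim syntax):

    * Pooled interacted regression, then margins; vce(robust) at the
    * estimation stage is required for margins' vce(unconditional).
    regress y i.w##c.x, vce(robust)
    margins, dydx(w) subpop(if w == 1) vce(unconditional)
    * Repeat with subpop(if w == g) to target ATT(g) for each g.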
BTW, I can recommend the paper by @TymonSloczynski and me in Econometric Theory (2018) for the doubly robust stuff. 😬
If in a staggered DiD setting I write an equation with a full set of treatment indicators by treated cohort and calendar time, and include c(i) + f(t) (unit and time "fixed effects"), would you still call that a "fixed effects" model?
If you answer "yes" then you should stop saying things like "there's a problem with the TWFE 'model'." The modeling is our choice; we choose what to put in x(i,t) when we write
y(i,t) = x(i,t)*b + c(i) + f(t) + u(i,t)
The phrase "TWFE model" refers to c(i) + f(t), right?
If x(i,t) = w(i,t) -- a single treatment indicator -- then the model might be too restrictive. But as I've shown in my DiD work, it's easy to put more in x(i,t) and estimate a full set of heterogeneous TEs. And I can (and should) still use the TWFE estimator.
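One way to implement that, as a rough sketch using the community-contributed reghdfe (the unit id, time t, cohort g, and treatment dummy d are hypothetical names):

    * Full set of cohort-by-time treatment effects, plus unit and time FE.
    * Cells where d is identically zero drop out automatically.
    reghdfe y ibn.g#ibn.t#c.d, absorb(id t) vce(cluster id)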
Not exactly. I like Bruce's approach in this paper and it yields nice insights. But from Twitter and private exchanges last week, and what I've learned since, it seems that the class of estimators in play in Theorem 5 includes only estimators that are linear in Y.
Theorem 5 is correct and neat, but leaves open the question of which estimators are in the class that is being compared with OLS. Remember, we cannot simply use phrases such as "OLS is BUE" without clearly defining the competing class of estimators. This is critical.
The class of distributions F2 is so large -- only restricting the mean to be linear in X and assuming finite second moments -- that it's not surprising the class of unbiased estimators is "small." So small, in fact, that it contains only estimators linear in Y.
Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.
A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.
If one looks at Theorem 5 concerning OLS, one sees a distinction between F2 and F2^0. All estimators in the comparison group must be unbiased under the very large class of distributions F2. This includes all distributions with finite second moments -- so unrestricted SIGMA.
This is neat and makes sense to me. After all, third moments need not even exist under GM. And using third moments would make it very tough to achieve unbiasedness across all distributions allowed under GM. Clearly, the result says it's impossible.
It still blows my mind that OLS is best unbiased in that class. Across all multivariate distributions with weird third and fourth conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.
The Koopmann result prompts a question that I've wondered about off and on. If you use the first 3 GM assumptions, which I write as
A1. Y = X*b + U
A2. rank(X) = k
A3. E(U|X) = 0
then, for A an n x k matrix, a linear estimator A'Y is unbiased if and only if A'X = I (k x k).
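A one-line check of both directions, written in LaTeX:

\[
E(A'Y \mid X) \;=\; A'Xb \;=\; b \ \ \text{for all } b \in \mathbb{R}^k
\quad\Longleftrightarrow\quad A'X = I_k .
\]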
One of the remarkable features of Bruce's result, and why I never could have discovered it, is that the "asymptotic" analog doesn't seem to hold. Suppose we assume random sampling and in the population specify

y = x*b0 + u
A1. E(x'u) = 0
A2. E(u|x) = 0
Also assume rank E(x'x) = k, so no perfect collinearity in the population. Then OLS is asymptotically efficient among estimators that only use A1 for consistency. But OLS is not asymptotically efficient among estimators that use A1 and A2 for consistency.
A2 adds many extra moment conditions that, generally, are useful for estimating b0 -- for example, when D(y|x) is asymmetric with its third central moment depending on x. So there are GMM estimators more asymptotically efficient than OLS under A1 and A2.
Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one for regular apples, the other for "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).
Individuals were then asked to choose a basket of regular and eco-friendly apples. A linear regression for QE (the quantity of eco-labeled apples) gives very good results: a strongly downward-sloping demand curve, and an increase in the competing price shifts out the demand curve.
Now, the prices were generated to be highly correlated, with corr = 0.83. Not VIF > 10 territory, but a pretty high correlation. If PR is dropped from the equation for QE, the estimated price effect for PE falls dramatically -- because there's an important omitted variable, PR.
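The comparison is easy to replicate as a sketch (hypothetical variable names qe, pe, pr):

    * Demand for eco-labeled apples: both randomized prices, then omit PR.
    regress qe pe pr, vce(robust)   // clean: negative own-price, positive cross-price
    regress qe pe, vce(robust)      // omitting pr: the pe coefficient absorbs the PR effect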