Not sure about that! But here's a first attempt. Suppose I have a control group and G treatment levels. The treatment, W, takes values in {0,1,2,...,G} and is unconfounded conditional on X. Assume the overlap condition 0 < p0(x) = P(W=0|X=x) for all x in Support(X).
This isn't a trivial assumption because it requires that, for any subset of the population as determined by values of x, there are some control units. However, if this isn't true, one can trim the sample -- as in the Crump et al. "Moving the Goalposts" work.
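To make the overlap point concrete, here's a minimal sketch of estimating p0(x) with a multinomial logit and trimming where the estimated control probability is tiny. The data, variable names, and the 0.01 cutoff are my own illustrations -- the cutoff is ad hoc, not the Crump et al. optimal rule.

```python
# Minimal sketch: estimate p0(x) = P(W=0 | X=x) and trim thin-support observations.
# Data, names, and the 0.01 cutoff are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, G = 2000, 2
X = rng.normal(size=(n, 2))
w = rng.choice(G + 1, size=n, p=[0.4, 0.3, 0.3])   # treatment levels 0, 1, ..., G

mnl = sm.MNLogit(w, sm.add_constant(X)).fit(disp=0)
pscores = mnl.predict(sm.add_constant(X))          # n x (G+1) generalized propensity scores
p0_hat = pscores[:, 0]

keep = p0_hat > 0.01                               # drop observations with little control support
print(f"trimmed {int(np.sum(~keep))} of {n} observations")
```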
If overlap holds and conditional means are linear, the following regression recovers the ATTs of each group g relative to control:

Y on 1, W1, W2, ..., WG, X, W1*(X - Xbar1), W2*(X - Xbar2), ..., WG*(X - XbarG), where Xbarg is the sample average of X over treatment group g.
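Here's a minimal sketch of that regression with simulated data (the DGP and names are invented). The coefficients on W1 and W2 are the estimated ATTs.

```python
# Sketch of the interacted regression: coefficients on w1 and w2 estimate ATT(1) and ATT(2).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
w = rng.choice([0, 1, 2], size=n, p=[0.4, 0.3, 0.3])
y = 1 + 0.5 * x + 1.0 * (w == 1) + 2.0 * (w == 2) + 0.3 * (w == 2) * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x": x, "w": w})

# Treatment dummies and interactions with x demeaned at each group's own average, Xbar_g
for g in (1, 2):
    df[f"w{g}"] = (df["w"] == g).astype(float)
    xbar_g = df.loc[df["w"] == g, "x"].mean()
    df[f"w{g}_xc"] = df[f"w{g}"] * (df["x"] - xbar_g)

res = smf.ols("y ~ w1 + w2 + x + w1_xc + w2_xc", data=df).fit(cov_type="HC1")
print(res.params[["w1", "w2"]])   # ATT estimates for groups 1 and 2
```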
If we don't like linear regression, replace it with logit if Y is binary or fractional; multinomial logit if Y is multinomial, or fractional with more than two outcomes (recent paper with Akanksha Negi). If Y is nonnegative (a count or corner solution), use Poisson regression.
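For instance, with binary Y the same separate-RA idea can be implemented by fitting the control-group conditional mean with a logit and averaging Y minus the predicted Y(0) within each treated group. This is my own illustrative sketch with invented data, not code from any paper.

```python
# Rough sketch of the logit version of RA for binary Y:
# ATT(g) = average of [Y - m0_hat(X)] over treatment group g, with m0 fit on controls only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
w = rng.choice([0, 1, 2], size=n, p=[0.4, 0.3, 0.3])
p = 1 / (1 + np.exp(-(-0.5 + 0.8 * x + 0.7 * (w == 1) + 1.2 * (w == 2))))
y = rng.binomial(1, p)

Xc = sm.add_constant(x)
m0_hat = sm.Logit(y[w == 0], Xc[w == 0]).fit(disp=0).predict(Xc)   # E[Y(0) | X] for everyone

for g in (1, 2):
    att_g = np.mean(y[w == g] - m0_hat[w == g])
    print(f"ATT({g}) estimate: {att_g:.3f}")
```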
My view is that we can estimate these ATTs very generally, and these parameters are interesting. This is very similar to what underlies my extended TWFE DiD work. We have to worry about overlap, but having to trim out parts of the population with no controls is not surprising.
The previous regression adjustment is, of course, different from

Y on 1, W1, W2, ..., WG, X

and we know from @TymonSloczynski, the DiD literature, and now @paulgp, @instrumenthull, and Kolesar that this can be very misleading.
Plus, the full (separate) RA methods extend to doubly robust estimators that combine separate linear/logit/MNL/Poisson means with inverse probability weighting (obtained via MNL).
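As a rough illustration of that combination, here's one AIPW-style doubly robust expression for ATT(g) pairing a control-group outcome model with MNL propensity scores. The data and names are invented, and this particular formula is a textbook-style sketch, not necessarily the exact IPWRA estimator in the paper.

```python
# Sketch of a doubly robust (AIPW-style) ATT(g): control-group outcome model + MNL weights.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(size=n)
pw = np.column_stack([np.ones(n), np.exp(0.5 * x), np.exp(0.8 * x)])
pw /= pw.sum(axis=1, keepdims=True)                 # treatment depends on x, so adjustment matters
w = np.array([rng.choice(3, p=pi) for pi in pw])
y = 1 + 0.5 * x + 1.0 * (w == 1) + 2.0 * (w == 2) + rng.normal(size=n)

Xc = sm.add_constant(x)
m0_hat = sm.OLS(y[w == 0], Xc[w == 0]).fit().predict(Xc)   # outcome model fit on controls
ps = sm.MNLogit(w, Xc).fit(disp=0).predict(Xc)             # n x 3 generalized propensity scores

resid = y - m0_hat
for g in (1, 2):
    n_g = np.sum(w == g)
    att_g = (resid[w == g].sum()
             - ((ps[w == 0, g] / ps[w == 0, 0]) * resid[w == 0]).sum()) / n_g
    print(f"DR ATT({g}) estimate: {att_g:.3f}")
```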

I certainly hope labor economists don't abandon these methods.

#metricstotheface
BTW, as a practical matter, I don't think Stata's teffects supports estimation of the ATTs. It provides estimates of the ATEs E[Y(g) - Y(0)], and this requires both stronger unconfoundedness and overlap assumptions. But the regressions are easy to do "by hand."
Standard errors that condition on the covariates follow immediately, and it might be possible to use the vce(uncond) option to account for sampling error in the Xbarg.
BTW, I can recommend the paper by @TymonSloczynski and me in Econometric Theory (2018) for the doubly robust stuff. 😬

More from @jmwooldridge

Apr 20
If in a staggered DiD setting I write an equation with a full set of treatment indicators by treated cohort and calendar time, and include c(i) + f(t) (unit and time "fixed effects"), would you still call that a "fixed effects" model?
If you answer "yes" then you should stop saying things like "there's a problem with the TWFE 'model'." The modeling is our choice; we choose what to put in x(i,t) when we write

y(i,t) = x(i,t)*b + c(i) + f(t) + u(i,t)

The phrase "TWFE model" refers to c(i) + f(t), right?
If x(i,t) = w(i,t) -- a single treatment indicator -- then the model might be too restrictive. But as I've shown in my DiD work, it's easy to put more in x(i,t) and estimate a full set of heterogeneous TEs. But I can (and should) still use the TWFE estimator.
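Here's a small simulated sketch of that point (a toy panel; the names and DGP are my own invention): the estimator is still OLS with unit and time dummies, but x(i,t) contains a separate treatment indicator for each treated cohort in each post-treatment period.

```python
# Toy staggered-adoption panel: TWFE estimator (unit + time dummies) with a full set of
# cohort-by-period treatment indicators in x(i,t).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
units, periods = 200, 6
df = pd.DataFrame([(i, t) for i in range(units) for t in range(periods)], columns=["i", "t"])

cohort = rng.choice([0, 3, 4], size=units, p=[0.4, 0.3, 0.3])   # 0 = never treated
df["cohort"] = cohort[df["i"].to_numpy()]
df["treated"] = ((df["cohort"] > 0) & (df["t"] >= df["cohort"])).astype(int)

# Heterogeneous effects by cohort and calendar time
effect = {(3, 3): 1.0, (3, 4): 1.5, (3, 5): 2.0, (4, 4): 0.5, (4, 5): 1.0}
df["te"] = [effect.get((c, t), 0.0) if d else 0.0
            for c, t, d in zip(df["cohort"], df["t"], df["treated"])]
ci = rng.normal(size=units)
ft = rng.normal(size=periods)
df["y"] = ci[df["i"].to_numpy()] + ft[df["t"].to_numpy()] + df["te"] + rng.normal(scale=0.5, size=len(df))

# One dummy per treated cohort-period cell, plus unit and time fixed effects
for c in (3, 4):
    for t in range(c, periods):
        df[f"att_c{c}_t{t}"] = ((df["cohort"] == c) & (df["t"] == t)).astype(float)
terms = " + ".join(f"att_c{c}_t{t}" for c in (3, 4) for t in range(c, periods))
res = smf.ols(f"y ~ C(i) + C(t) + {terms}", data=df).fit()
print(res.params.filter(like="att_"))   # cohort-by-period treatment effect estimates
```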
Feb 18
Not exactly. I like Bruce's approach in this paper and it yields nice insights. But from Twitter and private exchanges last week, and what I've learned since, it seems that the class of estimators in play in Theorem 5 includes only estimators that are linear in Y.

#metricstotheface
Theorem 5 is correct and neat, but leaves open the question of which estimators are in the class that is being compared with OLS. Remember, we cannot simply use phrases such as "OLS is BUE" without clearly defining the competing class of estimators. This is critical.
The class of distributions in F2 is so large -- only restricting the mean to be linear in X and assuming finite second moments -- that it's not surprising the class of unbiased estimators is "small." So small, in fact, that it contains only estimators linear in Y.
Feb 13
Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.

#metricstotheface
A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.
If one looks at Theorem 5 concerning OLS, you'll see a distinction between F2 and F2^0. All estimators in the comparison group must be unbiased under the very large class of distributions, F2. This includes all distributions with finite second moments -- so unrestricted SIGMA.
Feb 13
This is neat and makes sense to me. After all, third moments need not even exist under GM. And using 3rd moments would make it very tough to achieve unbiasedness across all cases with only GM. Clearly, the result says it's impossible.
It still blows my mind that OLS is best unbiased in that class. Across all multivariate distributions with weird 3rd and 4th conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.
The Koopmann result prompts a question that I've wondered about off and on. If you use the first 3 GM assumptions, which I write as
A1. Y = X*b + U
A2. rank(X) = k
A3. E(U|X) = 0
then, for A n x k, a linear estimator A'Y is unbiased if and only if A'X = I (the k x k identity).
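For clarity, here's the one-line verification of that claim, written out in LaTeX (my own restatement of the standard argument):

```latex
% For A an n x k matrix that may depend on X, the linear estimator \hat{b} = A'Y satisfies
\[
  \mathrm{E}\!\left[\,A'Y \mid X\,\right] \;=\; A'Xb \;+\; A'\,\mathrm{E}[\,U \mid X\,] \;=\; A'Xb ,
\]
% which equals b for every b in R^k if and only if A'X = I_k, the k x k identity.
```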
Feb 12
One of the remarkable features of Bruce's result, and why I never could have discovered it, is that the "asymptotic" analog doesn't seem to hold. Suppose we assume random sampling and in the population specify

A1. E(y|x) = x*b0
A2. Var(y|x) = (s0)^2

#metricstotheface
Also assume rank E(x'x) = k, so there's no perfect collinearity in the population. Then OLS is asymptotically efficient among estimators that only use A1 for consistency. But OLS is not asymptotically efficient among estimators that use A1 and A2 for consistency.
A2 adds many extra moment conditions that, generally, are useful for estimating b0 -- for example, if D(y|x) is asymmetric with third central moment depending on x. So there are GMM estimators more asymp efficient than OLS under A1 and A2.
Jan 18
Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one for regular apples and the other for "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).

#metricstotheface
Individuals were then asked to choose a basket of regular and eco-friendly apples. A linear regression for QE (quantity of eco-labeled apples) gives very good results: a strong downward-sloping demand curve, and an increase in the competing price shifts out the demand curve.
Now, the prices were generated to be highly correlated, with corr = 0.83. Not VIF > 10 territory, but a pretty high correlation. If PR is dropped from the equation for QE, the estimated price effect for PE falls dramatically -- because there's an important omitted variable, PR.
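A tiny simulation of that mechanism (my own illustration; only the 0.83 correlation is taken from the thread, the other numbers and names are invented):

```python
# Two highly correlated prices; dropping PR badly distorts the PE coefficient in the QE equation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
pr = rng.normal(size=n)                                       # price of regular apples
pe = 0.83 * pr + np.sqrt(1 - 0.83**2) * rng.normal(size=n)    # corr(PR, PE) ~ 0.83
qe = 5 - 2.0 * pe + 1.5 * pr + rng.normal(size=n)             # demand for eco-labeled apples

both = sm.OLS(qe, sm.add_constant(np.column_stack([pe, pr]))).fit()
only_pe = sm.OLS(qe, sm.add_constant(pe)).fit()
print("PE effect with PR included:", round(both.params[1], 2))     # close to -2.0
print("PE effect with PR omitted: ", round(only_pe.params[1], 2))  # much smaller in magnitude
```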