One of the remarkable features of Bruce's result, and why I never could have discovered it, is that the "asymptotic" analog doesn't seem to hold. Suppose we assume random sampling and in the population specify

A1. E(y|x) = x*b0
A2. Var(y|x) = (s0)^2

#metricstotheface
Also assume rank E(x'x) = k, so there is no perfect collinearity in the population. Then OLS is asymptotically efficient among estimators that use only A1 for consistency. But OLS is not asymptotically efficient among estimators that use both A1 and A2 for consistency.
A2 adds many extra moment conditions that, generally, are useful for estimating b0 -- for example, if D(y|x) is asymmetric with third central moment depending on x. So there are GMM estimators that are asymptotically more efficient than OLS under A1 and A2.
With asymptotic analysis, we see a clear tradeoff between robustness and efficiency.
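
To make the "extra moment conditions" concrete, here is a restatement of A1 and A2 in conditional-moment form. The symbols u and g(x) are my notation, not part of the original thread, and this is only a sketch of the setup, not a derivation of the efficient estimator.

```latex
\begin{align}
\text{A1:}\quad & E(u \mid x) = 0, \qquad u \equiv y - x b_0, \\
\text{A2:}\quad & E(u^2 - \sigma_0^2 \mid x) = 0 .
\end{align}
% OLS uses only the unconditional moments implied by A1:
%   E(x'u) = 0.
% A2 additionally implies, for any (suitably integrable) function g(x),
%   E[ g(x)' (u^2 - \sigma_0^2) ] = 0,
% and these conditions also depend on b_0 through u.
```

A GMM estimator can stack both sets of conditions. Per the thread, the added conditions are generally informative about b0, for instance when D(y|x) is asymmetric.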

Perhaps my choice of asymptotic analogy isn't a good one, or the optimal IVs for b0 collapse to OLS (harder for me to believe). In any case, I still have more to learn ....
BTW, one would never have used the expanded class of nonlinear-in-Y estimators because they're generally biased (and inconsistent) if we only impose E(e|X) = 0. But it's very neat to know that they're inefficient relative to OLS even when stronger assumptions are used to ensure unbiasedness.

More from @jmwooldridge

Feb 13
Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.

#metricstotheface
A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.
If one looks at Theorem 5 concerning OLS, you'll see a distinction between F2 and F2^0. All estimators in the comparison group must be unbiased under the very large class of distributions, F2. This includes all distributions with finite second moments -- so unrestricted SIGMA.
Feb 13
This is neat and makes sense to me. After all, third moments need not even exist under GM. And using 3rd moments would make it very tough to achieve unbiasedness across all cases with only GM. Clearly, the result says it's impossible.
It still blows my mind that OLS is best unbiased in that class. Across all multivariate distributions with weird 3rd and 4th conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.
The Koopmann result prompts a question that I've wondered about off and on. If you use the first 3 GM assumptions, which I write as
A1. Y = X*b + U
A2. rank(X) = k
A3. E(U|X) = 0
then, for A n x k, a linear estimator A'Y is unbiased if and only if A'X = I (k x k).
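
The unbiasedness condition A'X = I can be checked numerically. The sketch below uses a simulated design matrix (hypothetical data) and three illustrative weight matrices: OLS, a weighted least squares estimator with arbitrary fixed weights, and a rescaled OLS that deliberately violates the condition. The names `A_ols`, `A_wls`, and `A_bad` are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))  # simulated full-column-rank design (hypothetical data)

# OLS weights: A = X (X'X)^{-1}, so A'Y = (X'X)^{-1} X'Y
A_ols = X @ np.linalg.inv(X.T @ X)

# Another linear estimator: weighted least squares with arbitrary positive weights
w = rng.uniform(0.5, 2.0, size=n)
WX = w[:, None] * X
A_wls = WX @ np.linalg.inv(X.T @ WX)

I_k = np.eye(k)
ols_ok = np.allclose(A_ols.T @ X, I_k)
wls_ok = np.allclose(A_wls.T @ X, I_k)

# A linear estimator that violates the condition: rescaled OLS weights
A_bad = 1.1 * A_ols
bad_ok = np.allclose(A_bad.T @ X, I_k)

print(ols_ok, wls_ok, bad_ok)
```

Both OLS and the weighted estimator satisfy A'X = I, so both are linear unbiased under A1-A3; the rescaled version does not, and A'X is k x k in every case.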
Jan 18
Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one of regular apples, the other of "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).

#metricstotheface
Individuals were then asked to choose a basket of regular and eco-friendly apples. A linear regression for QE (quantity of eco-labeled) gives very good results: strong downward sloping demand curve, an increase in the competing price shifts out the demand curve.
Now, the prices were generated to be highly correlated, with corr = 0.83. Not VIF > 10 territory but a pretty high correlation. If PR is dropped from the equation for QE, the estimated price effect for PE falls dramatically -- because there's an important omitted variable, PR.
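
A small simulation illustrates the omitted-variable mechanism. This is not the survey data: the prices are standardized draws with correlation 0.83, and the demand coefficients (-1.0 on PE, +0.8 on PR) are made-up values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
rho = 0.83  # correlation between the two prices, as in the thread

# Draw correlated prices (hypothetical standardized prices, not the survey data)
cov = np.array([[1.0, rho], [rho, 1.0]])
PE, PR = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# Hypothetical demand: own-price effect -1.0, cross-price effect +0.8
QE = 2.0 - 1.0 * PE + 0.8 * PR + rng.normal(size=n)

def ols(y, *cols):
    """Return OLS coefficients with an intercept as the first element."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_long = ols(QE, PE, PR)   # includes both prices
b_short = ols(QE, PE)      # drops PR

print("PE coefficient, both prices:", b_long[1])
print("PE coefficient, PR dropped: ", b_short[1])
```

With these assumed values, the short-regression coefficient on PE should be close to -1.0 + 0.8 * 0.83, far smaller in magnitude than the true own-price effect.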
Jan 18
If you know people who teach students it's important to "test" for multicollinearity, please ask them why.

I imagine a world where the phrase "I tested for multicollinearity" no longer appears in published work. I know John Lennon would be on my side.

#metricstotheface
What I'm getting at is that it's still common to see "tests" for multicollinearity without even looking at the regression output. Or asking which variables are collinear. Often it's control variables. So what? If you have many control variables you might have to select.
And a VIF of 9.99 is okay but 10.01 is a disaster? We can do better than this across all fields.

I just saw a post where X1 and X2 have a correlation of .7, and the researcher wonders which variable to drop.
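
For a sense of scale, here is the VIF arithmetic in the simplest case. The sketch assumes a regression with exactly two regressors plus an intercept, so the R-squared from regressing one on the other is just the squared correlation; the helper name `vif_two_regressors` is mine.

```python
def vif_two_regressors(corr: float) -> float:
    """VIF = 1 / (1 - R^2), where R^2 = corr^2 with only one other regressor."""
    return 1.0 / (1.0 - corr ** 2)

print(round(vif_two_regressors(0.7), 2))   # 1.96
print(round(vif_two_regressors(0.83), 2))  # 3.21
```

Under that assumption a correlation of .7 corresponds to a VIF of about 2, nowhere near the usual cutoff of 10.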
Nov 6, 2021
A Twitter primer on the canonical link and the linear exponential family. I've used this combination in a few of my papers: the doubly robust estimators for estimating average treatment effects, improving efficiency in RCTs, and, most recently, nonlinear DiD.

#metricstotheface
The useful CL/LEF combinations are:
1. linear mean/normal
2. logistic mean/Bernoulli (binary fractional)
3. logistic mean/binomial (0 <= Y <= M)
4. exponential mean/Poisson (Y >= 0)
5. logistic means/multinomial

The last isn't used very much -- yet.
The key statistical feature of the CL/LEF combinations is that the first order conditions look like those for OLS (combination 1). The residuals add to zero and each covariate is uncorrelated with the residuals in sample. Residuals are uhat(i) = y(i) - mhat(x(i)).
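
The first order conditions can be verified for combination 4 (exponential mean/Poisson). The sketch below fits the Poisson quasi-MLE by a hand-written Newton-Raphson loop on simulated counts; the data-generating values (0.5, 0.3) and the starting value are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # intercept plus one covariate
y = rng.poisson(np.exp(0.5 + 0.3 * x))        # hypothetical simulated counts

# Poisson quasi-MLE with exponential mean, via Newton-Raphson
b = np.array([np.log(y.mean()), 0.0])         # start at the intercept-only fit
for _ in range(50):
    mu = np.exp(X @ b)
    score = X.T @ (y - mu)                    # first-order conditions: X'(y - mu)
    hessian = X.T @ (mu[:, None] * X)         # negative Hessian of the log-likelihood
    step = np.linalg.solve(hessian, score)
    b = b + step
    if np.max(np.abs(step)) < 1e-12:
        break

resid = y - np.exp(X @ b)                     # uhat(i) = y(i) - mhat(x(i))
print("sum of residuals:", resid.sum())
print("sum of x * residual:", (x * resid).sum())
```

Both sums should be zero up to floating-point error, mirroring the OLS normal equations.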
Oct 17, 2021
Because of a recent post at Data Colada, I've been asked about my take on the various heterosk-robust standard errors. In the taxonomy of MacKinnon-White and Davidson-MacKinnon, there's HC0, HC1, HC2, HC3.

#metricstotheface

datacolada.org/99
HC0 was the original variance matrix estimator proposed in White (1980, Econometrica). HC1 = [n/(n-k)]*HC0 makes a simple df adjustment. Clearly, HC1 - HC0 is positive semi-definite (even PD).
HC2 divides the squared residuals, uhat(i)^2, by 1 - h(i,i), where the h(i,i) are the diagonal elements of the "hat" or projection matrix. It can be shown that this produces n different unbiased estimators of sigma^2 under homoskedasticity.
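
The three estimators described so far can be computed directly from their definitions. This is a minimal NumPy sketch on simulated heteroskedastic data (all data-generating values are hypothetical); the helper `sandwich` is my own name, not a library function.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Hypothetical heteroskedastic errors: the scale depends on the first covariate
u = rng.normal(size=n) * (1.0 + 0.5 * np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0, -1.0]) + u

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
uhat = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h(i,i)

def sandwich(weights):
    """(X'X)^{-1} [sum_i weights_i x_i' x_i] (X'X)^{-1}."""
    meat = X.T @ (weights[:, None] * X)
    return XtX_inv @ meat @ XtX_inv

HC0 = sandwich(uhat**2)                       # White (1980)
HC1 = (n / (n - k)) * HC0                     # degrees-of-freedom adjustment
HC2 = sandwich(uhat**2 / (1.0 - h))           # leverage adjustment

for name, V in [("HC0", HC0), ("HC1", HC1), ("HC2", HC2)]:
    print(name, np.sqrt(np.diag(V)))
```

Since HC1 is a scalar multiple of HC0 with factor above one, HC1 - HC0 is positive definite whenever HC0 is, and the leverage weights 1/(1 - h(i,i)) are at least one, so the HC2 standard errors are never below the HC0 ones.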
