Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.

#metricstotheface
A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.
If you look at Theorem 5 concerning OLS, you'll see a distinction between F2 and F2^0. All estimators in the comparison group must be unbiased under the very large class of distributions, F2. This includes all distributions with finite second moments -- so unrestricted SIGMA.
Because the LPQ estimators use Var(Y|X) = (s^2)*I (which holds under GM), these estimators are biased for many distributions in F2. In fact, I'm pretty sure that the unbiasedness condition for LPQ when SIGMA can be anything is that H(j) = 0 in GKTZ (1992)'s notation.
This closely relates to my discussion of asymptotics yesterday, and now I see there is no conflict. It's somewhat subtle. If we only use the conditional mean assumption in estimation and homoskedasticity happens to hold, OLS is efficient -- in a restricted class of estimators.
OLS uses the optimal IVs based on E(y|x) = x*b if Var(y|x) happens to be constant. Using only the conditional mean restrictions means we allow for distributions that include heteroskedasticity. Estimators that use homoskedasticity to estimate beta are generally inconsistent in that larger class.
If we are allowed to use the homoskedasticity assumption to estimate beta, then we can do better than OLS (at least under asymmetry).

The same thing is happening in Bruce's exact result. One can only use E(Y|X) = X*b in obtaining unbiased estimators because the class F2 is large.
Then, if it happens that Var(Y|X) is scalar, OLS is best unbiased in that class of estimators. But OLS is not best unbiased if one is allowed to use Var(Y|X) = (s^2)*I in estimating beta. The Koopmann and GKTZ papers show that.
There remains an interesting question: Does the class of estimators unbiased under F2 include any nonlinear estimators? It seems not to include any LPQ estimators; if so, that makes me suspect the class contains only linear estimators.
Another way to understand this is that if F2 is replaced with F2^0 in Theorem 5, the theorem is no longer true; there are LPQ estimators that outperform OLS. F2^0 is smaller than F2, so using it expands the class of unbiased estimators in competition with OLS.
I'm relieved because it would be very odd for the exact result to be so at odds with the asymptotic result that I teach to first-year PhD students. Now I know they're entirely analogous.
For the asymptotic result, I don't know of any non-silly estimators that wouldn't have the form (linear in the y(i)),

Sum(i=1,...,n) g(x(i))'*y(i)

for some functions g().
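To make the form concrete, here's a quick numpy check (a hypothetical simulated design of my own) that OLS itself is exactly of this type, with g(x(i)) = (X'X)^{-1}*x(i) -- noting that in finite samples g depends on the whole sample through X'X:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

# OLS the usual way
bhat_ols = np.linalg.solve(X.T @ X, X.T @ y)

# OLS written as Sum_i g(x(i)) * y(i) with g(x) = (X'X)^{-1} x
XtX_inv = np.linalg.inv(X.T @ X)
bhat_sum = sum(XtX_inv @ X[i] * y[i] for i in range(n))

print(np.allclose(bhat_ols, bhat_sum))  # True
```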

• • •

More from @jmwooldridge

Feb 13
This is neat and makes sense to me. After all, third moments need not even exist under GM. And using 3rd moments would make it very tough to achieve unbiasedness across all cases with only GM. Clearly, the result says it's impossible.
It still blows my mind that OLS is best unbiased in that class. Across all multivariate distributions with weird 3rd and 4th conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.
The Koopmann result prompts a question that I've wondered about off and on. If you use the first 3 GM assumptions, which I write as
A1. Y = X*b + U
A2. rank(X) = k
A3. E(U|X) = 0
then, for A n x k, a linear estimator A'Y is unbiased if and only if A'X = I (k x k).
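A quick numerical sanity check of that condition (simulated X, purely illustrative): the OLS weight matrix A = X(X'X)^{-1} satisfies A'X = I_k, so A'Y is unbiased under A1-A3:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 4
X = rng.normal(size=(n, k))

# OLS as a linear estimator A'Y with A = X (X'X)^{-1}, an n x k matrix
A = X @ np.linalg.inv(X.T @ X)

# Unbiasedness condition: A'X = I_k, so E[A'Y|X] = A'X b = b under A1-A3
print(np.allclose(A.T @ X, np.eye(k)))  # True
```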
Feb 12
One of the remarkable features of Bruce's result, and why I never could have discovered it, is that the "asymptotic" analog doesn't seem to hold. Suppose we assume random sampling and in the population specify

A1. E(y|x) = x*b0
A2. Var(y|x) = (s0)^2

#metricstotheface
Also assume rank E(x'x) = k so there is no perfect collinearity in the population. Then OLS is asymptotically efficient among estimators that only use A1 for consistency. But OLS is not asymptotically efficient among estimators that use A1 and A2 for consistency.
A2 adds many extra moment conditions that, generally, are useful for estimating b0 -- for example, if D(y|x) is asymmetric with third central moment depending on x. So there are GMM estimators more asymp efficient than OLS under A1 and A2.
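Here's a rough numerical sketch of that claim; every design choice below (the standardized chi-square error, the uniform regressor, the seed) is my own hypothetical example, not from the thread. It approximates asymptotic variances by plugging sample moments evaluated at the true parameters into the standard GMM formula (G' Omega^{-1} G)^{-1}, and compares efficient GMM using the A1 and A2 moment conditions against GMM using the A1 moments only, which reproduces OLS:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical design: homoskedastic but skewed errors, so E[u^3] != 0
x = 1.0 + rng.uniform(size=n)
u = (rng.chisquare(1, size=n) - 1.0) / np.sqrt(2.0)  # mean 0, variance 1, skewed
b0, b1, s2 = 0.5, 2.0, 1.0
y = b0 + b1 * x + u  # y itself is unused: the avar formulas use moments at the truth

# Moment functions at the true parameters:
#   from A1: E[u] = 0, E[x*u] = 0
#   from A2: E[u^2 - s2] = 0, E[(u^2 - s2)*x] = 0
g = np.column_stack([u, x * u, u**2 - s2, (u**2 - s2) * x])
Omega = g.T @ g / n  # approximates E[g g']

Ex, Ex2 = x.mean(), (x**2).mean()
# Jacobian G = E[dg/d(b0, b1, s2)]
G = np.array([[-1.0, -Ex, 0.0],
              [-Ex, -Ex2, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 0.0, -Ex]])

def avar(Gm, Om):
    # asymptotic variance of efficient GMM: (G' Omega^{-1} G)^{-1}
    return np.linalg.inv(Gm.T @ np.linalg.solve(Om, Gm))

V_all = avar(G, Omega)                  # uses the A1 and A2 moments
V_ols = avar(G[:2, :2], Omega[:2, :2])  # A1 moments only = OLS

print(V_all[1, 1], V_ols[1, 1])  # avar of the slope: GMM strictly below OLS
```

In this design the efficiency gain comes entirely through the nonzero third moment: with symmetric errors the A2 moments would be uncorrelated with the A1 moments and these particular extra conditions would not help the slope.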
Jan 18
Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one for regular apples and the other for "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).

#metricstotheface
Individuals were then asked to choose a basket of regular and eco-friendly apples. A linear regression for QE (quantity of eco-labeled) gives very good results: a strongly downward-sloping demand curve, and an increase in the competing price shifts out the demand curve.
Now, the prices were generated to be highly correlated, with corr = 0.83. Not VIF > 10 territory, but a pretty high correlation. If PR is dropped from the equation for QE, the estimated price effect for PE falls dramatically -- because there's an important omitted variable, PR.
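A small simulation in the spirit of this example (the numbers below are made up, not the survey's): correlated prices, a true negative own-price effect and positive cross-price effect, then the short regression that omits PR:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000

# Hypothetical demand design mimicking the survey: correlated prices
pr = rng.normal(size=n)
pe = 0.83 * pr + np.sqrt(1 - 0.83**2) * rng.normal(size=n)  # corr(pe, pr) ~ 0.83
qe = 10.0 - 2.0 * pe + 1.5 * pr + rng.normal(size=n)

def ols(xcols, y):
    # OLS with an intercept on the given list of regressor columns
    X = np.column_stack([np.ones(len(y))] + xcols)
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_long = ols([pe, pr], qe)   # own-price effect ~ -2, cross-price ~ +1.5
b_short = ols([pe], qe)      # PR omitted: PE coefficient shrinks toward 0

print(b_long[1], b_short[1])
```

By the omitted variable bias formula, the short-regression coefficient on PE is roughly -2 + 1.5*0.83, so the estimated own-price effect collapses toward zero even though PE was randomized jointly with PR.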
Jan 18
If you know people who teach students it's important to "test" for multicollinearity, please ask them why.

I imagine a world where the phrase "I tested for multicollinearity" no longer appears in published work. I know John Lennon would be on my side.

#metricstotheface
What I'm getting at is that it's still common to see "tests" for multicollinearity without even looking at the regression output. Or asking which variables are collinear. Often it's control variables. So what? If you have many control variables you might have to select.
And a VIF of 9.99 is okay but 10.01 is a disaster? We can do better than this across all fields.

I just saw a post where X1 and X2 have a correlation of .7, and the researcher wonders which variable to drop.
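For perspective (a two-regressor toy example of my own), a correlation of 0.7 translates into a VIF of only about 1/(1 - 0.49) ~ 1.96:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 10_000, 0.7
x1 = rng.normal(size=n)
x2 = r * x1 + np.sqrt(1 - r**2) * rng.normal(size=n)  # corr(x1, x2) ~ 0.7

# VIF for x1: regress x1 on x2 (plus intercept), then 1 / (1 - R^2)
X = np.column_stack([np.ones(n), x2])
resid = x1 - X @ np.linalg.lstsq(X, x1, rcond=None)[0]
r2 = 1 - resid.var() / x1.var()
vif = 1.0 / (1.0 - r2)

print(vif)  # ~ 1 / (1 - 0.49) ~ 1.96, nowhere near the "VIF > 10" alarm
```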
Nov 6, 2021
A Twitter primer on the canonical link and the linear exponential family. I've used this combination in a few of my papers: the doubly robust estimators for estimating average treatment effects, improving efficiency in RCTs, and, most recently, nonlinear DiD.

#metricstotheface
The useful CL/LEF combinations are:
1. linear mean/normal
2. logistic mean/Bernoulli (binary or fractional)
3. logistic mean/binomial (0 <= Y <= M)
4. exponential mean/Poisson (Y >= 0)
5. logistic means/multinomial

The last isn't used very much -- yet.
The key statistical feature of the CL/LEF combinations is that the first order conditions look like those for OLS (combination 1). The residuals add to zero and each covariate is uncorrelated with the residuals in sample. Residuals are uhat(i) = y(i) - mhat(x(i)).
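This is easy to verify directly. A minimal sketch for combination 2, fitting a logit by Newton's method in numpy on simulated data (the design is hypothetical; the fit is the Bernoulli MLE with the canonical logistic link):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
b_true = np.array([-0.5, 1.0, -1.0])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ b_true))).astype(float)

# Newton's method for the Bernoulli MLE with logistic (canonical) link
b = np.zeros(3)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ b))
    W = p * (1.0 - p)
    b = b + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

p = 1.0 / (1.0 + np.exp(-X @ b))
resid = y - p  # uhat(i) = y(i) - mhat(x(i))

# OLS-like first order conditions: residuals sum to zero,
# and each covariate is uncorrelated with the residuals in sample
print(resid.sum(), np.abs(X.T @ resid).max())
```

Because X contains an intercept, the first FOC forces the residuals to sum to zero exactly at the MLE, just as in OLS.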
Oct 17, 2021
Because of a recent post at Data Colada, I've been asked about my take on the various heterosk-robust standard errors. In the taxonomy of MacKinnon-White and Davidson-MacKinnon, there's HC0, HC1, HC2, HC3.

#metricstotheface

datacolada.org/99
HC0 was the original variance matrix estimator proposed in White (1980, Econometrica). HC1 = [n/(n-k)]*HC0 makes a simple df adjustment. Clearly, HC1 - HC0 is positive semi-definite (even PD).
HC2 divides the squared resids, uhat(i)^2, by 1 - h(i,i), where the h(i,i) are the diagonal elements of the "hat" or projection matrix. It can be shown that this produces n different unbiased estimators of sigma^2 under homoskedasticity.
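A compact numpy sketch of all four estimators on a simulated heteroskedastic design (the design and the `sandwich` helper name are mine, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
u = rng.normal(size=n) * np.exp(0.5 * X[:, 1])  # heteroskedastic errors
y = X @ np.array([1.0, 2.0, -1.0]) + u

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
uhat = y - X @ bhat
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)  # leverages: diag of X(X'X)^{-1}X'

def sandwich(w):
    # (X'X)^{-1} X' diag(w) X (X'X)^{-1}
    return XtX_inv @ (X.T * w) @ X @ XtX_inv

V_hc0 = sandwich(uhat**2)                 # White (1980)
V_hc1 = (n / (n - k)) * V_hc0             # df adjustment
V_hc2 = sandwich(uhat**2 / (1 - h))       # leverage adjustment
V_hc3 = sandwich(uhat**2 / (1 - h)**2)    # stronger leverage adjustment

print(np.sqrt(np.diag(V_hc0)), np.sqrt(np.diag(V_hc1)))
```

Since n/(n - k) > 1, the HC1 standard errors always exceed HC0's; HC2 and HC3 inflate the squared residuals of high-leverage observations, with HC3 inflating more.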