
Jeffrey Wooldridge

Feb 18 • 11 tweets • 3 min read

https://twitter.com/BorelliLuan/status/1494719603821449224

Not exactly. I like Bruce's approach in this paper, and it yields nice insights. But from Twitter and private exchanges last week, and from what I've learned since, it seems that the class of estimators in play in Theorem 5 includes only estimators that are linear in Y.

#metricstotheface


Theorem 5 is correct and neat, but leaves open the question of which estimators are in the class that is being compared with OLS. Remember, we cannot simply use phrases such as "OLS is BUE" without clearly defining the competing class of estimators. This is critical.

The class of distributions in F2 is so large -- it only restricts the mean to be linear in X and assumes finite second moments -- that it's not surprising the class of unbiased estimators is "small." So small, in fact, that it consists only of estimators linear in Y.


Stephen Portnoy, @lihua_lei_stat, and I have independently come to this conclusion (I've had exchanges with both, and with Bruce). So the class of estimators in Theorem 5 is, in effect, the class of estimators linear in Y.

If we entertain estimators that are unbiased when the full GM assumptions are used -- so that Var(Y|X) = (s^2)*I -- then OLS is not best unbiased. Interestingly, when we add normality, the Cramér-Rao lower bound shows that OLS is best unbiased in a very large class of estimators.

This goes to show that efficiency is not "monotonic" as we relax restrictions on the class of estimators.

To show my summary is wrong, you need to find a nonlinear estimator that is unbiased for b for every distribution in F2. Portnoy has shown that this is not possible.


@lihua_lei_stat and I have argued the same thing using results mentioned here:

https://twitter.com/lihua_lei_stat/status/1493291015129550849

https://twitter.com/jmwooldridge/status/1492990971218440197

Here's a more succinct way to state my conclusion:
1. OLS is BUE in the class of unbiased estimators E2, defined by the distributions F2 in Hansen.
2. E2 includes only linear estimators.

But this is what we mean when we say OLS is BLUE.

In the original GM Theorem, the class of estimators is explicitly stated to be linear and unbiased. In Hansen Theorem 5, the class of estimators is implicitly linear and unbiased. But the conclusions about OLS are the same.

Here's another observation that may help. We must separate the class of estimators we're willing to entertain from the assumptions under which we claim efficiency. It's easy to confound them. It's desirable to have estimators unbiased under A1 only.

Such estimators are necessarily linear. Then OLS is best in this class under Assumptions A1 and A2. If we want to consider estimators unbiased under A1, A2, and A3, then there is no ambiguity: OLS is best in this class under A1, A2, and A3.


More from @jmwooldridge

Jeffrey Wooldridge

@jmwooldridge

Feb 13


Concerning the recent exchange many of us had about @BruceEHansen's new Gauss-Markov Theorem, I now understand a lot more and can correct/clarify several things I wrote yesterday. I had a helpful email exchange with Bruce that confirmed my thinking.

#metricstotheface

A lot was written about the "linear plus quadratic" class of estimators as possible competitors to OLS. Here's something important to know: Bruce's result does not allow these estimators in the comparison group with OLS unless they are actually linear; no quadratic terms allowed.

If you look at Theorem 5 concerning OLS, you'll see a distinction between F2 and F2^0. All estimators in the comparison group must be unbiased under the very large class of distributions F2. This includes all distributions with a mean linear in X and finite second moments -- so SIGMA is unrestricted.


Jeffrey Wooldridge

@jmwooldridge

Feb 13

https://twitter.com/lihua_lei_stat/status/1492629147759218690

This is neat and makes sense to me. After all, third moments need not even exist under GM, and using third moments would make it very tough to achieve unbiasedness across all cases with only GM. Clearly, the result says it's impossible.


It still blows my mind that OLS is best unbiased in that class, across all multivariate distributions with weird third and fourth conditional moments, and beyond. As I said in a previous tweet, this would not be true in an asymptotic setting.

The Koopmann result prompts a question that I've wondered about off and on. If you use the first 3 GM assumptions, which I write as:
A1. Y = X*b + U
A2. rank(X) = k
A3. E(U|X) = 0
then, for an n x k matrix A, a linear estimator A'Y is unbiased if and only if A'X = I (k x k), because E(A'Y|X) = A'X*b must equal b for every b.
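The unbiasedness condition can be checked numerically. Since A is n x k, A'X is k x k and must equal the k x k identity. The sketch below is illustrative only and assumes a simulated design matrix, plus two comparison estimators chosen for the example: weighted least squares with arbitrary weights, and ridge with penalty 1.0. OLS and the reweighted estimator both pass. Ridge is linear in Y but fails.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 50, 3
# Hypothetical design matrix: intercept plus two standard normal regressors
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])

def is_unbiased(A, X):
    """A'Y is unbiased for b under A1-A3 iff A'X equals the k x k identity."""
    return np.allclose(A.T @ X, np.eye(X.shape[1]))

# OLS: A = X (X'X)^{-1}, so A'Y = (X'X)^{-1} X'Y
A_ols = X @ np.linalg.inv(X.T @ X)

# Weighted least squares with arbitrary positive weights (hypothetical):
# A = W X (X'WX)^{-1} also satisfies A'X = I
w = rng.uniform(0.5, 2.0, n)
WX = X * w[:, None]
A_wls = WX @ np.linalg.inv(X.T @ WX)

# Ridge with penalty 1.0: linear in Y but biased, since A'X = (X'X + I)^{-1} X'X
A_ridge = X @ np.linalg.inv(X.T @ X + np.eye(k))

print(is_unbiased(A_ols, X), is_unbiased(A_wls, X), is_unbiased(A_ridge, X))
# True True False
```

Many linear estimators satisfy A'X = I. A Gauss-Markov comparison is about which of them has the smallest variance.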


Jeffrey Wooldridge

@jmwooldridge

Feb 12

https://twitter.com/CavaliereGiu/status/1492454733948362753

One of the remarkable features of Bruce's result, and why I never could have discovered it, is that the "asymptotic" analog doesn't seem to hold. Suppose we assume random sampling and, in the population, specify:

A1. E(y|x) = x*b0
A2. Var(y|x) = (s0)^2

#metricstotheface


Also assume rank E(x'x) = k, so there is no perfect collinearity in the population. Then OLS is asymptotically efficient among estimators that use only A1 for consistency. But OLS is not asymptotically efficient among estimators that use A1 and A2 for consistency.

A2 adds many extra moment conditions that, generally, are useful for estimating b0 -- for example, if D(y|x) is asymmetric with third central moment depending on x. So there are GMM estimators more asymp efficient than OLS under A1 and A2.
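The sketch below shows how A2 adds moment conditions. It stacks the A1 moments x'(y - x b) with the A2 moments h(x)((y - x b)^2 - s^2) and runs two-step GMM. OLS serves as the first-step estimate, and Gauss-Newton iterations handle the second step. This is not code from Wooldridge or Hansen. Several choices are hypothetical and made only for illustration:
- one regressor plus an intercept;
- h(x) = (1, z, z^2);
- skewed, homoskedastic errors from a centered exponential, whose third moment is constant rather than depending on x as in the tweet.

The sketch shows the mechanics only. It does not demonstrate the efficiency comparison.

```python
import numpy as np

def moments(theta, y, X, H):
    """Per-observation moment functions (n x m) implied by A1 and A2."""
    k = X.shape[1]
    b, s2 = theta[:k], theta[k]
    u = y - X @ b
    g1 = X * u[:, None]              # A1: E[x'(y - x b0)] = 0
    g2 = H * (u**2 - s2)[:, None]    # A2: E[h(x)((y - x b0)^2 - s0^2)] = 0
    return np.hstack([g1, g2])

def jacobian(theta, y, X, H):
    """Derivative of the averaged moments with respect to (b, s2)."""
    n, k = X.shape
    u = y - X @ theta[:k]
    top = np.hstack([-X.T @ X / n, np.zeros((k, 1))])
    bottom = np.hstack([-2.0 * (H * u[:, None]).T @ X / n,
                        -H.mean(axis=0)[:, None]])
    return np.vstack([top, bottom])

def two_step_gmm(y, X, H, iters=50, tol=1e-10):
    """OLS first step, then efficient-weight GMM via Gauss-Newton steps."""
    n, k = X.shape
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    theta_ols = np.append(b_ols, np.mean((y - X @ b_ols) ** 2))
    # Weight matrix: inverse covariance of the moments, evaluated at the
    # consistent first-step estimate
    g = moments(theta_ols, y, X, H)
    W = np.linalg.inv(g.T @ g / n)
    theta = theta_ols.copy()
    for _ in range(iters):
        gbar = moments(theta, y, X, H).mean(axis=0)
        G = jacobian(theta, y, X, H)
        step = np.linalg.solve(G.T @ W @ G, G.T @ W @ gbar)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta_ols, theta

rng = np.random.default_rng(0)
n = 20_000
z = rng.standard_normal(n)
X = np.column_stack([np.ones(n), z])
H = np.column_stack([np.ones(n), z, z**2])   # hypothetical choice of h(x)
u = rng.exponential(1.0, n) - 1.0            # mean 0, variance 1, skewed
y = X @ np.array([1.0, 2.0]) + u
theta_ols, theta_gmm = two_step_gmm(y, X, H)
print("OLS (b, s2):", theta_ols)
print("GMM (b, s2):", theta_gmm)
```

With 5 moments and 3 parameters, the system is overidentified. The efficient weight matrix lets the A2 moments sharpen the estimate of b0 whenever they are correlated with the A1 moments, as they are here through the skewness of u.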


Jeffrey Wooldridge

@jmwooldridge

Jan 18

https://twitter.com/jmwooldridge/status/1483493723233259527

Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one for regular apples and the other for "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).

#metricstotheface

https://twitter.com/jmwooldridge/status/1483493723233259527

Individuals were then asked to choose a basket of regular and eco-friendly apples. A linear regression for QE (quantity of eco-labeled apples) gives very good results: a strongly downward-sloping demand curve, and an increase in the competing price shifts out the demand curve.

Now, the prices were generated to be highly correlated, with corr = 0.83. That's not VIF > 10 territory, but it's a pretty high correlation. If PR is dropped from the equation for QE, the estimated price effect for PE falls dramatically -- because there's an important omitted variable, PR.
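The omitted-variable effect can be simulated. The survey data aren't available here, so everything in this sketch is hypothetical except the price correlation of 0.83. The prices are jointly normal with unit variances, and the demand coefficients are invented. With unit variances, the short regression's slope on PE should be close to b_PE + b_PR * 0.83.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
rho = 0.83

# Hypothetical price process: (PE, PR) jointly normal, unit variances, corr 0.83
cov = np.array([[1.0, rho], [rho, 1.0]])
PE, PR = rng.multivariate_normal([5.0, 4.0], cov, size=n).T

# Hypothetical demand for eco-labeled apples: negative own-price effect,
# positive cross-price effect (the goods are substitutes)
b_PE, b_PR = -1.5, 1.0
QE = 10.0 + b_PE * PE + b_PR * PR + rng.standard_normal(n)

def ols(y, *regressors):
    """OLS coefficients with an intercept in the first position."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

long_coef = ols(QE, PE, PR)   # [intercept, PE, PR]
short_coef = ols(QE, PE)      # PR omitted

# Omitted-variable formula: short slope -> b_PE + b_PR * Cov(PE, PR) / Var(PE)
implied = b_PE + b_PR * rho
print(long_coef[1], short_coef[1], implied)
```

Under these made-up numbers, the long regression recovers an own-price effect near -1.5. The short regression's slope shrinks toward about -0.67, because PE partly proxies for the omitted competing price.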


Jeffrey Wooldridge

@jmwooldridge

Jan 18

If you know people who teach students it's important to "test" for multicollinearity, please ask them why.

I imagine a world where the phrase "I tested for multicollinearity" no longer appears in published work. I know John Lennon would be on my side.

#metricstotheface

What I'm getting at is that it's still common to see "tests" for multicollinearity without even looking at the regression output. Or asking which variables are collinear. Often it's control variables. So what? If you have many control variables you might have to select.

And a VIF of 9.99 is okay but 10.01 is a disaster? We can do better than this across all fields.

I just saw a post where X1 and X2 have a correlation of .7, and the researcher wonders which variable to drop.
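The arithmetic behind these VIF numbers is short. VIF_j is the j-th diagonal element of the inverse of the regressors' correlation matrix. With two regressors, that gives VIF = 1/(1 - r^2). The sketch below applies both formulas; the helper names are illustrative.

```python
import numpy as np

def vifs_from_corr(R):
    """VIF_j is the j-th diagonal element of the inverse regressor correlation matrix."""
    return np.diag(np.linalg.inv(np.asarray(R, dtype=float)))

def corr_for_vif(v):
    """With two regressors VIF = 1 / (1 - r^2); invert to get |r| for a given VIF."""
    return np.sqrt(1.0 - 1.0 / v)

print(np.round(vifs_from_corr([[1.0, 0.7], [0.7, 1.0]]), 2))    # [1.96 1.96]
print(np.round(vifs_from_corr([[1.0, 0.83], [0.83, 1.0]]), 2))  # [3.21 3.21]
print(round(corr_for_vif(10.0), 4))                              # 0.9487
```

A correlation of .7 gives a VIF under 2. In the two-regressor case, VIFs of 9.99 and 10.01 correspond to correlations of roughly 0.9486 and 0.9487.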


Jeffrey Wooldridge

@jmwooldridge

Nov 6, 2021

A Twitter primer on the canonical link and the linear exponential family (CL/LEF). I've used this combination in a few of my papers: doubly robust estimators of average treatment effects, improving efficiency in RCTs, and, most recently, nonlinear DiD.

#metricstotheface

The useful CL/LEF combinations are:
1. linear mean/normal
2. logistic mean/Bernoulli (binary or fractional)
3. logistic mean/binomial (0 <= Y <= M)
4. exponential mean/Poisson (Y >= 0)
5. logistic means/multinomial

The last isn't used very much -- yet.

The key statistical feature of the CL/LEF combinations is that the first-order conditions look like those for OLS (combination 1): the residuals add to zero, and each covariate is uncorrelated with the residuals in sample. The residuals are uhat(i) = y(i) - mhat(x(i)).
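The sketch below illustrates combination 4, an exponential mean with the Poisson quasi-log-likelihood. It fits the model by Newton-Raphson in NumPy on simulated data; the data-generating values are hypothetical. Because the exponential mean is the canonical link for the Poisson, the score is X'(y - mhat). The OLS-like conditions in the tweet can therefore be checked directly: the residuals sum to zero (through the intercept) and are orthogonal to each covariate.

```python
import numpy as np

def poisson_qmle(y, X, iters=100, tol=1e-10):
    """Poisson QMLE with exponential mean m(x) = exp(x b), by Newton-Raphson.

    Assumes the first column of X is a constant; it is used for the starting value.
    """
    b = np.zeros(X.shape[1])
    b[0] = np.log(y.mean())
    for _ in range(iters):
        m = np.exp(X @ b)
        score = X.T @ (y - m)                  # FOCs: sum_i x_i'(y_i - m_i) = 0
        neg_hessian = X.T @ (X * m[:, None])   # X' diag(m) X
        step = np.linalg.solve(neg_hessian, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

rng = np.random.default_rng(3)
n = 2_000
z = rng.standard_normal(n)
X = np.column_stack([np.ones(n), z])
y = rng.poisson(np.exp(0.5 + 0.3 * z)).astype(float)   # hypothetical count outcome

bhat = poisson_qmle(y, X)
uhat = y - np.exp(X @ bhat)        # uhat(i) = y(i) - mhat(x(i))
print(uhat.sum(), X.T @ uhat)      # should be zero up to rounding error
```

The same check works for any nonnegative y, not only counts. The first-order conditions do not depend on y being Poisson distributed.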
