I think frequentists and Bayesians are not yet on the same page, and it has little to do with philosophy. It seems some Bayesians think the proper response to the need for clustered standard errors is to specify an HLM (hierarchical linear model). But in the linear case, HLM leads to GLS, not OLS.

#metricstotheface
Moreover, a Bayesian would take the HLM structure seriously in all respects: variance and correlation structure and distribution. I'm happy to use an HLM to improve efficiency over pooled estimation, but I would cluster my standard errors, anyway. A Bayesian would not.
There still seems to be a general confusion that fully specifying everything and using a GLS or joint MLE is a costless alternative to pooled methods that use few assumptions. And the Bayesian approach is particularly unfair to pooled methods.
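A minimal Stata sketch of the contrast, with hypothetical names (outcome y, covariate x, group identifier gid):

* Random-intercept HLM, taking the variance structure seriously:
mixed y x || gid:

* Pooled OLS with cluster-robust standard errors:
regress y x, vce(cluster gid)

* Keep the HLM/GLS point estimates for efficiency but still report robust
* standard errors; for mixed, vce(robust) clusters at the highest level
* of nesting (here, gid):
mixed y x || gid:, vce(robust)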
One only needs to think of something like a simple time series regression with serial correlation. I think there are four common things one might do.

1. OLS with usual (nonrobust) SEs
2. OLS with Newey-West SEs
3. Prais-Winsten with usual SEs
4. P-W with N-W SEs
In my view, choice (3) is almost as bad as (1). Choices (2) and (4) make sense, with (4) requiring strict exogeneity. But at least we're then comparing apples with apples.
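In Stata, the four options look roughly like this (hypothetical variables y, x1, x2 and time index t; the lag length of 4 is only illustrative):

tsset t

* (1) OLS with usual (nonrobust) SEs
regress y x1 x2

* (2) OLS with Newey-West SEs
newey y x1 x2, lag(4)

* (3) Prais-Winsten with usual SEs
prais y x1 x2

* (4) Prais-Winsten with Newey-West SEs is not directly available through
*     prais (see the quasi-GLS thread below); vce(robust) handles
*     heteroskedasticity only.
prais y x1 x2, vce(robust)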

Again, what is the Bayesian version of (4) after priors and distributional assumptions are imposed?


More from @jmwooldridge

6 Mar
What about the control function approach to estimation? It's a powerful approach for both cross section and panel applications. I'm a fan for sure.

However, the CF approach can impose more assumptions than approaches that use generated IVs.

#metricstotheface
In such cases, we have a clear tradeoff between consistency and efficiency.

In models additive in endogenous explanatory variables with constant coefficients, CF reduces to 2SLS or FE2SLS -- which is neat. Of course, the proof uses Frisch-Waugh.
The equivalence between CF and 2SLS implies a simple, robust specification test of the null that the EEVs are actually exogenous. One can use "robust" or Newey-West or "cluster robust" very easily. The usual Hausman test is not robust, and suffers from degeneracies.
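A minimal sketch of the CF version of that test, assuming a single EEV w with instruments z1 and z2 (all names hypothetical):

regress w x1 z1 z2                  // first stage for the EEV
predict vhat, resid                 // control function = first-stage residuals
regress y w x1 vhat, vce(robust)    // or vce(cluster id) as appropriate
test vhat                           // robust test of H0: w is exogenous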
6 Mar
If you teach prob/stats to first-year PhD students, and you want to prepare them to really understand regression, go light on measure theory, counting, combinatorics, distributions. Emphasize conditional expectations, linear projections, convergence results.

#metricstotheface
This means, of course, law of iterated expectations, law of total variance, best MSE properties of CEs and LPs. How to manipulate Op(1) and op(1). Slutsky's theorem. Convergence in distribution. Asymptotic equivalence lemma. And as much matrix algebra as I know.
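For reference, the two identities named above, in the thread's notation:

E(y) = E[E(y|x)]                        (law of iterated expectations)
Var(y) = E[Var(y|x)] + Var[E(y|x)]      (law of total variance)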
If you're like me -- and barely understand basic combinatorics -- you'll also be happier. I get the birthday problem and examples of the law of very large numbers -- and that's about it.
6 Mar
When I teach regression with time series I emphasize that even if we use GLS (say, Prais-Winsten), we should make standard errors robust to serial correlation (and heteroskedasticity). Just like with weighted least squares.

#metricstotheface
I like the phrase "quasi-GLS" to emphasize that, in all contexts, we shouldn't take our imposed structure literally. In Stata, it would be nice to allow this:

prais y x1 x2 ... xK, vce(hac nw 4)

vce(robust) is allowed, but it's not enough. The above would be easy to add.
To its credit, Stata does allow

reg y x1 ... xK [aweight = 1/hhat], vce(robust)

to allow our model of heteroskedasticity, as captured by hhat, to be wrong. I've pushed this view in my introductory econometrics book.
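One standard way to construct hhat, sketched with hypothetical names: exponentiate the fitted values from a regression of the log squared OLS residuals on the regressors.

regress y x1 x2
predict uhat, resid
gen logu2 = ln(uhat^2)
regress logu2 x1 x2
predict ghat, xb
gen hhat = exp(ghat)

* WLS point estimates, with robust SEs in case the variance model is wrong:
regress y x1 x2 [aweight = 1/hhat], vce(robust)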
3 Mar
I've often wondered why many econometricians seem to have an aversion to row vectors, even when using a row vector simplifies notation.

#metricstotheface
Probably the most common way to write the linear model for a single observation is

y(i) = x(i)'b + u(i)

for a column vector x(i). To me, the prime muddies the waters. For several reasons, I prefer

y(i) = x(i)b + u(i)

for a 1 x k row vector x(i).
It's natural to define x(i) to be the ith row of the data matrix X, especially when visualizing how data are stored.

Plus, insisting x(i) is a column leads to this inelegant formula, where the primes are in different locations:

X'X = Sum(x(i)x(i)')
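For comparison, defining x(i) as the 1 x k ith row of X gives

X'X = Sum(x(i)'x(i))

with the prime on the left factor, matching its position in X'X.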

I feel bad for row vectors.
2 Mar
In a model with binary endogenous w with IVs z,

y = a + b*w + x*g + u
E(u|x,z) = 0

what are the relative merits of usual 2SLS, with a linear first stage for w, versus using a probit for w and then using probit fitted values as IVs in the second stage?

#metricstotheface
Both are consistent under standard identification assumptions. Using a probit first stage could be more efficient. Those are the optimal IVs if (1) Var(u|x,z) is constant and (2) P(w = 1|x,z) = probit. It's consistent without either assumption, just like 2SLS.
As shown by my former student Ruonan Xu, the probit first stage can help with a weak IV problem:

xuruonan.weebly.com/uploads/1/2/7/…

The probit fitted values should be used as IVs, not as regressors. And robust standard errors should be used, as always.
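A sketch of the two estimators in Stata, with hypothetical names (outcome y, binary EEV w, exogenous covariate x1, instrument z1):

* (a) Usual 2SLS with a linear first stage:
ivregress 2sls y x1 (w = z1), vce(robust)

* (b) Probit first stage; use the fitted probabilities as an instrument,
*     not as a regressor in the second stage:
probit w x1 z1
predict phat, pr
ivregress 2sls y x1 (w = phat), vce(robust)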
1 Mar
A bit more on clustering. If you observe the entire population and assignment is at the unit level, there is no need to cluster. If assignment is at the group level -- applied to all units in a group -- cluster at the group level. (Hopefully there are many groups.)

#metricstotheface
I've used the term "ex-post clustering" to describe the obsession with clustering just to do it. You don't cluster individual data at the county, state, or regional level just for the heck of it. One must take a stand on the sampling and assignment schemes.
It's easy to see with the formulas for estimating a mean from a population. The clustered standard error is too large because heterogeneity in the means across groups gets counted as cluster correlation.
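A quick way to see it in Stata, estimating a mean by regressing on a constant only (y and group id gid hypothetical):

regress y                      // usual SE of the sample mean
regress y, vce(cluster gid)    // typically larger: heterogeneity in group
                               // means shows up as if it were cluster correlation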
