I've become a believer in always reporting "robust" standard errors. This may seem obvious, but there are nuances. And I'm not talking about indiscriminate clustering -- I'll comment on that at some point. Let's start with random sampling from a cross section.

#metricstotheface
For linear regression, we know to use heteroskedasticity-robust standard errors by default, for two reasons:

1. Var(y|x) might depend on x.
2. The model for E(y|x) is wrong and we are only estimating the linear projection (LP).

Both cause heteroskedasticity.
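
As a rough illustration of the first reason, the Stata sketch below simulates a cross section in which Var(y|x) depends on x and then runs OLS with the usual and the robust standard errors. The data-generating process, seed, sample size, and variable names are all assumptions made for this example; none of them come from the thread.

```stata
* Hypothetical simulation: E(y|x) is linear, Var(y|x) depends on x
clear
set seed 12345
set obs 5000
generate x = rnormal()
* Error standard deviation increases with x
generate u = exp(0.5*x)*rnormal()
generate y = 1 + 2*x + u

* Usual (nonrobust) standard errors
regress y x

* Heteroskedasticity-robust standard errors
regress y x, vce(robust)
```

The coefficient estimates are identical across the two commands; only the standard errors and the inference built on them change.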

We've discussed quasi-MLE, such as fractional logit and Poisson regression. If E(y|x) is correct, we want standard errors robust to general variance misspecification.

Interesting fact: Because fractional logit (flogit) and Poisson use the canonical link function, the same robust variance matrix falls out whether the mean is assumed correct or not. (Just like the linear case.) This isn't the case for fractional probit (fprobit), for example.
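
A sketch of why the canonical link matters, using Poisson with an exponential mean as the example. The notation is assumed for this illustration and is not from the thread: x_i is a row vector, b is a generic parameter value, and b* is the probability limit of the quasi-MLE. Up to a term that does not depend on b, the quasi-log-likelihood, score, and Hessian for observation i are

$$
\ell_i(b) = y_i x_i b - \exp(x_i b), \qquad
s_i(b) = x_i' \left[ y_i - \exp(x_i b) \right], \qquad
H_i(b) = -\exp(x_i b)\, x_i' x_i .
$$

The Hessian does not involve y_i, so the outer matrix in the sandwich has the same form whether or not E(y|x) = exp(x b*):

$$
A = E\left[ \exp(x_i b^*)\, x_i' x_i \right], \qquad
B = E\left[ s_i(b^*)\, s_i(b^*)' \right], \qquad
\text{Avar}\, \sqrt{N}(\hat{b} - b^*) = A^{-1} B A^{-1} .
$$

With a non-canonical link such as fractional probit, the Hessian of the Bernoulli quasi-log-likelihood does depend on y_i, and its expectation only simplifies when the mean is correctly specified. That is why the variance estimator that assumes a correct mean and the fully robust one differ in that case.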
Now it gets more subtle. If y is binary and I estimate logit, probit, cloglog, should I use "robust" standard errors?

If I do, I'm admitting my model for P(y=1|x) is wrong. It has virtually nothing to do with "heteroskedasticity." It means I think P(y=1|x) is wrong.
I think admitting our models are wrong is healthy, but you must know you're doing it.

probit y x1 ... xk, robust

is not magically consistently estimating coefficients in the presence of heteroskedasticity.
It is still useful to say: like the linear probability model (LPM), my probit model is only an approximation. But it might well approximate the average partial effects (APEs) -- and could very well be better than linear. I'm going to show robust standard errors in all cases.
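
A small simulation can make both points concrete. The sketch below is only an illustration under assumed settings (the data-generating process, seed, sample size, and variable names are hypothetical): the latent error is heteroskedastic, so P(y=1|x) does not have the probit form.

```stata
* Hypothetical DGP: latent-variable model with heteroskedastic error,
* so P(y=1|x) is not of the form Phi(b0 + b1*x)
clear
set seed 2718
set obs 20000
generate x = rnormal()
generate e = exp(0.5*x)*rnormal()
generate y = (0.5 + x + e > 0)

* The robust option changes the standard errors, not the coefficients
probit y x
probit y x, vce(robust)

* APE of x implied by the (misspecified) probit
margins, dydx(x)

* LPM coefficient on x, for comparison with the probit APE
regress y x, vce(robust)
```

The two probit commands report the same coefficients, which is the sense in which the robust option does nothing for consistency of the betas. The margins output and the LPM slope are two competing approximations to the APE.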
Same comments hold for something like Tobit. Any misspecification of the underlying assumptions causes inconsistent estimates of the betas (and sigma). Yet, the estimated Tobit mean function might be a good approximation.
Using vce(robust) with tobit is not producing consistent estimators of the betas in the presence of heteroskedasticity in the latent variable model. And you wouldn't want it to. You want the best approximation to, say, E(y|x) given the model you're using.
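
The same kind of exercise can be sketched for Tobit. Again, everything in the block is an assumption made for illustration (a corner-solution outcome at zero, a heteroskedastic latent error, made-up variable names and seed).

```stata
* Hypothetical DGP: corner at zero, heteroskedastic latent error
clear
set seed 31415
set obs 5000
generate x = rnormal()
generate y = max(0, 1 + x + exp(0.5*x)*rnormal())

* Tobit with a lower limit of zero and robust standard errors
tobit y x, ll(0) vce(robust)

* Average partial effect of x on the estimated Tobit mean, E(y|x)
margins, dydx(x) predict(ystar(0,.))
```

Here the object of interest is the partial effect on the estimated mean function rather than the coefficient on x in the latent variable model.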
