Jeffrey Wooldridge

@jmwooldridge

3 Mar

I've often wondered why many econometricians seem to have an aversion to row vectors, even when using a row vector simplifies notation.

#metricstotheface

Probably the most common way to write the linear model for a single observation is

y(i) = x(i)'b + u(i)

for a column vector x(i). To me, the prime muddies the waters. For several reasons, I prefer

y(i) = x(i)b + u(i)

for x(i) of dimension 1 x k.

It's natural to define x(i) to be the ith row of the data matrix X, especially when visualizing how data are stored.

Plus, insisting x(i) is a column leads to this inelegant formula, where the primes are in different locations:

X'X = Sum(x(i)x(i)')

I feel bad for row vectors.
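
To make the row-vector convention concrete, here is a minimal Python sketch. The 3 x 2 data matrix and the function names are made up for illustration. Treating each x(i) as a 1 x k row of X, it checks that X'X equals Sum(x(i)'x(i)), so the prime sits in the same place on both sides.

```python
# Hypothetical 3 x 2 data matrix: each inner list is one row x(i), stored as 1 x k.
X = [[1.0, 2.0],
     [1.0, 5.0],
     [1.0, 7.0]]

def outer_row(x):
    """Return x'x (k x k) for a 1 x k row vector x."""
    return [[a * b for b in x] for a in x]

def xtx_by_rows(X):
    """Accumulate Sum_i x(i)'x(i) one observation at a time."""
    k = len(X[0])
    total = [[0.0] * k for _ in range(k)]
    for x in X:
        contrib = outer_row(x)
        for r in range(k):
            for c in range(k):
                total[r][c] += contrib[r][c]
    return total

def xtx_direct(X):
    """Compute X'X entry by entry as the inner product of columns r and c."""
    k = len(X[0])
    return [[sum(row[r] * row[c] for row in X) for c in range(k)]
            for r in range(k)]

# The two constructions agree on this example.
print(xtx_by_rows(X) == xtx_direct(X))
```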

Jeffrey Wooldridge

@jmwooldridge

2 Mar

In a model with binary endogenous w with IVs z,

y = a + b*w + x*g + u
E(u|x,z) = 0

what are the relative merits of usual 2SLS, with a linear first stage for w, versus using a probit for w and then using probit fitted values as IVs in the second stage?

#metricstotheface

Both are consistent under standard identification assumptions. Using a probit first stage could be more efficient. The probit fitted values are the optimal IVs if (1) Var(u|x,z) is constant and (2) P(w = 1|x,z) follows a probit model. The estimator is consistent without either assumption, just like 2SLS.

As shown by my former student Ruonan Xu, the probit first stage can help with a weak IV problem:

xuruonan.weebly.com/uploads/1/2/7/…

The probit fitted values should be used as IVs, not as regressors. And robust standard errors should be used, as always.
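
The difference between using fitted values as IVs and using them as regressors can be seen in a stripped-down Python sketch. It makes several simplifying assumptions: there are no covariates x, the values in p_hat are made-up stand-ins for probit fitted values rather than the output of an actual probit, and the function names are hypothetical. With OLS fitted values from a linear first stage the two formulas would coincide; with fitted values from a nonlinear first stage they generally do not.

```python
def _centered(v):
    """Subtract the sample mean from each element."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def iv_slope(y, w, p_hat):
    """IV estimate of b in y = a + b*w + u, using (1, p_hat) as instruments."""
    yc, wc, pc = _centered(y), _centered(w), _centered(p_hat)
    num = sum(p * yi for p, yi in zip(pc, yc))
    den = sum(p * wi for p, wi in zip(pc, wc))
    return num / den

def plug_in_slope(y, p_hat):
    """OLS slope from regressing y on (1, p_hat): fitted values as a regressor."""
    yc, pc = _centered(y), _centered(p_hat)
    return sum(p * yi for p, yi in zip(pc, yc)) / sum(p * p for p in pc)

# Made-up data: w is the binary treatment, p_hat stands in for probit fitted values.
w = [0, 1, 0, 1, 1, 0]
p_hat = [0.2, 0.7, 0.4, 0.9, 0.6, 0.1]
y = [1.0 + 2.0 * wi for wi in w]  # noiseless, so the true b is exactly 2

print(iv_slope(y, w, p_hat))
print(plug_in_slope(y, p_hat))
```

In this noiseless example the IV formula recovers b = 2 up to rounding, while the plug-in regression does not.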

Jeffrey Wooldridge

@jmwooldridge

28 Feb

I've become a believer in always reporting "robust" standard errors. This may seem obvious, but there are nuances. And I'm not talking about indiscriminate clustering -- I'll comment on that at some point. Let's start with random sampling from a cross section.

#metricstotheface

We know to use heteroskedasticity-robust standard errors for linear regression automatically. For two reasons:

1. Var(y|x) might depend on x.
2. The model for E(y|x) is misspecified and we are only estimating the linear projection (LP).

Both cause heteroskedasticity.
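
As a rough illustration of how the two kinds of standard errors can differ, the following Python sketch computes the classical and the heteroskedasticity-robust standard error of the slope in a simple regression. It uses the HC0 (White) form with no finite-sample correction, which is an assumption of this sketch; software packages often rescale it. The data and the function name are made up.

```python
import math

def ols_with_se(x, y):
    """Simple regression of y on (1, x); returns slope, classical SE, HC0 SE."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * yi for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical: assumes Var(u|x) is constant, estimated by SSR / (n - 2).
    sigma2 = sum(u * u for u in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # HC0: each squared residual is weighted by its own squared deviation in x.
    se_robust = math.sqrt(sum((d * u) ** 2 for d, u in zip(dx, resid))) / sxx
    return slope, se_classical, se_robust

# Made-up data in which the spread of y is larger when x = 1 than when x = 0.
x = [0.0, 0.0, 1.0, 1.0]
y = [0.0, 0.0, -1.0, 3.0]
print(ols_with_se(x, y))
```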

#metricstotheface

We've discussed quasi-MLE, such as fractional logit and Poisson regression. If E(y|x) is correct, we want standard errors robust to general variance misspecification.

#metricstotheface

Jeffrey Wooldridge

@jmwooldridge

27 Feb

Based on questions I get, it seems there's confusion about choosing between RE and FE in panel data applications. I'm afraid I've contributed. The impression seems to be that if RE "passes" a suitable Hausman test then it should be used. This is false.

I'm trying to emphasize in my teaching that using RE (unless CRE = FE) is an act of desperation. If the FE estimates and the clustered standard errors are "good" (intentionally vague), there's no need to consider RE.

RE is considered when the FE estimates are too imprecise to do much with. With good controls -- say, industry dummies in a firm-level equation -- one might get by with RE. And then choosing between RE and FE makes some sense.

Jeffrey Wooldridge

@jmwooldridge

27 Feb

There are lots of DID experts on here. Dare I go there?

For the T = 2 panel data before/after, we know that linear two-way FE is the same as the DID regression

y on 1, treat, post, treat*post
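
Because that regression is saturated in the two dummies, the OLS coefficient on treat*post equals the difference in differences of the four cell means. A minimal Python sketch of that calculation follows; the data and the function names are made up for illustration.

```python
def cell_mean(y, treat, post, g, t):
    """Mean of y over observations with treat == g and post == t."""
    vals = [yi for yi, gi, ti in zip(y, treat, post) if gi == g and ti == t]
    return sum(vals) / len(vals)

def did_from_means(y, treat, post):
    """Treated change minus control change. This equals the OLS coefficient on
    treat*post in the regression of y on 1, treat, post, treat*post."""
    treated_change = (cell_mean(y, treat, post, 1, 1)
                      - cell_mean(y, treat, post, 1, 0))
    control_change = (cell_mean(y, treat, post, 0, 1)
                      - cell_mean(y, treat, post, 0, 0))
    return treated_change - control_change

# Made-up two-period panel: two control units, then two treated units.
treat = [0, 0, 0, 0, 1, 1, 1, 1]
post = [0, 1, 0, 1, 0, 1, 0, 1]
y = [1.0, 2.0, 3.0, 4.0, 2.0, 6.0, 4.0, 8.0]
print(did_from_means(y, treat, post))
```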

If y is binary, we should not try "probit fixed effects" -- incidental parameters.

But to estimate ATT, we can also use

gen w = treat*post
probit y treat post i.w, vce(cluster id)
margins, dydx(w)

And we know it's important to use margins in this way. Do not use i.treat#i.post and compute margins WRT treat.

The above probit is what the Chamberlain correlated RE probit delivers.

As is well known due to Athey & Imbens, the parallel trend assumption is different if we use a linear model versus probit or logit or cloglog or ....

Jeffrey Wooldridge

@jmwooldridge

26 Feb

So we agree that, provided y is the variable of interest -- not censored -- a linear model estimated by OLS is a good starting point. But other functional forms can be better, such as logistic if y is binary or fractional, exponential if y is nonnegative.

In many cases one should include the covariates flexibly -- such as squares and interactions. This is especially true in treatment effect contexts. If w is the treatment, interact it with the controls when estimating the average treatment effect.

As @TymonSloczynski showed in his elegant 2020 REStat paper, if d is the treatment, just adding x as in the regression y on d, x can produce a badly biased estimate of the ATE. Interacting d and elements of x is generally better. The same is true for nonlinear regression adjustment.
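
One way to implement the fully interacted regression is to fit y on x separately for d = 1 and d = 0 and then average the difference in predictions over the whole sample. The Python sketch below does this with a single covariate. The data, the function names, and the single-covariate setup are assumptions made for illustration, not taken from the paper.

```python
def fit_line(x, y):
    """OLS intercept and slope for y on (1, x)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def ate_regression_adjustment(y, d, x):
    """Fit y on (1, x) separately by treatment arm, then average the
    difference in predicted outcomes over the full sample."""
    a1, b1 = fit_line([xi for xi, di in zip(x, d) if di == 1],
                      [yi for yi, di in zip(y, d) if di == 1])
    a0, b0 = fit_line([xi for xi, di in zip(x, d) if di == 0],
                      [yi for yi, di in zip(y, d) if di == 0])
    return sum((a1 + b1 * xi) - (a0 + b0 * xi) for xi in x) / len(x)

# Made-up data: y(0) = x and y(1) = 1 + 2x, so the effect 1 + x varies with x,
# and treatment is more common at high x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
d = [0, 0, 1, 0, 1, 1]
y = [xi if di == 0 else 1.0 + 2.0 * xi for xi, di in zip(x, d)]
print(ate_regression_adjustment(y, d, x))
```

Here the average effect over the sample is 1 + mean(x) = 3.5, which the interacted fit recovers because each arm's regression is exactly linear in this made-up example.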
