If you know people who teach students it's important to "test" for multicollinearity, please ask them why.

I imagine a world where the phrase "I tested for multicollinearity" no longer appears in published work. I know John Lennon would be on my side.

#metricstotheface
What I'm getting at is that it's still common to see "tests" for multicollinearity without even looking at the regression output, or asking which variables are collinear. Often it's the control variables. So what? If you have many control variables, you might have to be selective about which ones to include.
And a VIF of 9.99 is okay but 10.01 is a disaster? We can do better than this across all fields.
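To see what the magic threshold actually measures: VIF for regressor j is just 1/(1 - R²) from regressing regressor j on the other regressors. A minimal sketch with simulated data (the 0.7 correlation is invented for illustration), using only numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two regressors with correlation about 0.7 (hypothetical data).
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + np.sqrt(1 - 0.7**2) * rng.normal(size=n)

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing X[:, j] on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])  # include an intercept
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

X = np.column_stack([x1, x2])
v0 = vif(X, 0)  # roughly 1/(1 - 0.7^2) ≈ 2, nowhere near the "magic" 10
```

Even a correlation of 0.7 — high by most applied standards — gives a VIF of only about 2, which shows how little the 10 cutoff has to do with whether the correlation is substantively important.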

I just saw a post where X1 and X2 have a correlation of .7, and the researcher wonders which variable to drop.
The coefs on X1 and X2 make sense but the coef on X2 flips sign when X1 is dropped. Many doing serious data analysis are still learning about the basic idea of ceteris paribus. In interesting applications, the Xj are often highly correlated.
In the application, it's clear both X1 and X2 should be included, but the researcher has somehow learned that "highly" correlated variables can't appear in the same model. We must do better in our teaching (and I must do better with my writing).
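The sign flip is just ceteris paribus at work: the short regression coefficient picks up omitted-variable bias. A hypothetical simulation (all slopes and the 0.7 correlation are invented) where both coefficients have the right sign in the long regression but the X2 coefficient flips sign when X1 is dropped:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# x1 and x2 correlated at about 0.7 (simulated illustration).
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + np.sqrt(1 - 0.7**2) * rng.normal(size=n)
# True partial effects: +1.0 on x1, -0.5 on x2.
y = 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)

def ols(y, *cols):
    """Return intercept and slopes from an OLS fit."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_long = ols(y, x1, x2)  # slopes near (+1.0, -0.5): both signs sensible
b_short = ols(y, x2)     # slope near -0.5 + 1.0*0.7 = +0.2: sign flips
```

Dropping x1 loads its positive effect onto x2 through their correlation, so the short-regression coefficient on x2 is the true -0.5 plus the bias term (coef on x1) × (slope from regressing x1 on x2) = -0.5 + 1.0 × 0.7 = +0.2.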
Starting to feel a little sorry for my first-year PhD students, who will be getting the full multicollinearity/VIF rant in the next two weeks. 😈

• • •

More from @jmwooldridge

Jan 18
Here's an example I use in the summer ESTIMATE course at MSU. It's based on an actual contingent valuation survey. There are two prices, one for regular apples, the other for "ecologically friendly" apples. The prices were randomly assigned as a pair, (PR, PE).

#metricstotheface
Individuals were then asked to choose a basket of regular and eco-friendly apples. A linear regression for QE (quantity of eco-labeled) gives very good results: a strong downward-sloping demand curve, and an increase in the competing price shifts out the demand curve.
Now, the prices were generated to be highly correlated, with corr = 0.83. Not VIF > 10 territory, but a pretty high correlation. If PR is dropped from the equation for QE, the estimated price effect for PE falls dramatically -- because there's an important omitted variable, PR.
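The mechanics can be mimicked with invented numbers. Only the 0.83 correlation comes from the thread; the demand slopes below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000

# Hypothetical prices correlated at about 0.83, as in the survey design.
pe = rng.normal(size=n)
pr = 0.83 * pe + np.sqrt(1 - 0.83**2) * rng.normal(size=n)
# Hypothetical demand for eco-labeled apples:
# own-price slope -1.0, cross-price slope +0.6.
qe = 10 - 1.0 * pe + 0.6 * pr + rng.normal(size=n)

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_both = ols(qe, pe, pr)  # own-price effect near -1.0
b_short = ols(qe, pe)     # shrinks toward -1.0 + 0.6*0.83 ≈ -0.5
```

Dropping PR attenuates the own-price effect by (cross-price slope) × (slope from regressing PR on PE) — exactly the omitted-variable-bias arithmetic, with the high price correlation doing the damage.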
Nov 6, 2021
A Twitter primer on the canonical link and the linear exponential family. I've used this combination in a few of my papers: the doubly robust estimators for estimating average treatment effects, improving efficiency in RCTs, and, most recently, nonlinear DiD.

#metricstotheface
The useful CL/LEF combinations are:
1. linear mean/normal
2. logistic mean/Bernoulli (binary or fractional)
3. logistic mean/binomial (0 <= Y <= M)
4. exponential mean/Poisson (Y >= 0)
5. logistic means/multinomial

The last isn't used very much -- yet.
The key statistical feature of the CL/LEF combinations is that the first-order conditions look like those for OLS (combination 1). The residuals add to zero, and each covariate is uncorrelated with the residuals in sample. The residuals are uhat(i) = y(i) - mhat(x(i)).
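The residual property is easy to verify numerically. A minimal sketch of combination 4 — Poisson QMLE with the canonical exponential mean — on simulated data, checking that the OLS-like first-order conditions hold (the DGP is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 500, 2

# Simulated count data with exponential conditional mean.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = rng.poisson(np.exp(0.5 + X[:, 1] - 0.5 * X[:, 2]))

# Poisson QMLE via Newton-Raphson; start at the intercept-only solution.
b = np.zeros(k + 1)
b[0] = np.log(y.mean())
for _ in range(50):
    mu = np.exp(X @ b)
    # Newton step: (X' diag(mu) X)^{-1} X'(y - mu)
    step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
    b = b + step
    if np.max(np.abs(step)) < 1e-10:
        break

resid = y - np.exp(X @ b)
s0 = resid.sum()       # ~0: residuals add to zero, as with OLS
s1 = X[:, 1] @ resid   # ~0: covariate uncorrelated with residuals in sample
```

Because the canonical link makes the score X'(y - mu), setting it to zero forces exactly the OLS-style orthogonality conditions the thread describes.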
Oct 17, 2021
Because of a recent post at Data Colada, I've been asked about my take on the various heterosk-robust standard errors. In the taxonomy of MacKinnon-White and Davidson-MacKinnon, there's HC0, HC1, HC2, HC3.

#metricstotheface

datacolada.org/99
HC0 was the original variance matrix estimator proposed in White (1980, Econometrica). HC1 = [n/(n-k)]*HC0 makes a simple df adjustment. Clearly, HC1 - HC0 is positive semi-definite (even PD).
HC2 divides the squared residuals, uhat(i)^2, by 1 - h(i,i), where the h(i,i) are the diagonal elements of the "hat" or projection matrix. It can be shown that this produces n different unbiased estimators of sigma^2 under homoskedasticity.
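In matrix form the three estimators differ only in how the squared residuals enter the "meat" of the sandwich. A sketch on simulated heteroskedastic data (the DGP is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 3

# Simulated data where the error variance depends on X[:, 1].
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + np.abs(X[:, 1]) * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
u = y - X @ (XtX_inv @ X.T @ y)              # OLS residuals
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverage: diag of hat matrix

def sandwich(w):
    """(X'X)^{-1} X' diag(w) X (X'X)^{-1} for a vector of weights w."""
    return XtX_inv @ (X.T * w) @ X @ XtX_inv

hc0 = sandwich(u**2)                  # White (1980) original
hc1 = (n / (n - k)) * hc0             # simple degrees-of-freedom adjustment
hc2 = sandwich(u**2 / (1 - h))        # leverage-adjusted squared residuals
```

Since HC1 is just a scalar multiple (> 1) of HC0, the difference HC1 - HC0 is positive semi-definite, matching the claim above; the leverage adjustment in HC2 likewise inflates each squared residual, since 0 < 1 - h(i,i) < 1.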
Sep 28, 2021
On my shared Dropbox folder, pinned at the top, I posted the latest version of my TWFE/TWMundlak paper. It's essentially complete (and too long ...). I've included the "truly marvelous" proof of equivalence between pooled OLS and imputation.

#metricstotheface
I also fixed some of the material on testing/correcting for heterogeneous trends. A nice result is that the POLS approach with cohort-specific trends is the same as the obvious imputation approach.
This means that using the full regression to correct for non-parallel trends suffers no contamination when testing. It's identical to using only untreated observations to test for pre-trends. But one must allow full heterogeneity in cohort/time ATTs for the equivalence to hold.
Sep 28, 2021
Fortunately, the speculations I made in my linear DiD paper about extension to the nonlinear case turn out to be true -- with a small caveat: one should use the canonical link function for the chosen quasi-log-likelihood (QLL) function.

#metricstotheface
So, exponential mean/Poisson QLL if y >= 0.
Logistic mean/Bernoulli QLL if 0 <= y <= 1 (binary or fractional). (We call this logit and fractional logit.)
Linear mean, normal (OLS, of course).

These choices ensure that pooled estimation and imputation are numerically identical.
It's not a coincidence that these same combos show up in my work on doubly robust estimation of treatment effects and improving efficiency without sacrificing consistency in RCTs. Latest on the latter is here:

scholar.google.com/citations?view…
Aug 17, 2021
I finally got my TWFE/Mundlak/DID paper in good enough shape to make it an official working paper. I'll put it in other places but it's currently here:

researchgate.net/publication/35…

Also, the Stata stuff is still with the Dropbox link:

dropbox.com/sh/zj91darudf2…

#metricstotheface
I changed the title a bit to better reflect its contents. I'm really happy with the results, less happy that the paper got a bit unwieldy. It's intended to be a "low hanging fruit" DiD paper.
Now I've more formally shown that the estimator I was proposing -- pooled OLS, TWFE, or RE (they're all the same, properly done) -- identifies every dynamic treatment effect one is interested in (on means) in a staggered design.
