Carlisle Rainey
Aug 1, 2023 · 30 tweets · 10 min read
Do you use logistic regression? If so, you’ll want to read the thread below.

⚠️ Warning: Memes, charts, #rstats, and practical advice ahead.
If you don’t like Twitter shenanigans, I’ll give away the punchline up front.

For logit models with small-to-moderate samples (maybe N < 1,000), you should consider Firth’s penalized estimator.

I talk about it in this paper with Kelly McCaskey (open access!).

cambridge.org/core/journals/…
Now on with the thread!
Logit models are really common in the social sciences. We typically use maximum likelihood (ML) to estimate them. But the excellent properties of the ML estimator are mostly asymptotic.
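As a quick base-R illustration (a minimal sketch with simulated data, so the variable names and "true" coefficients here are made up):

```r
# Simulate a logit data set and fit it by ordinary maximum likelihood.
set.seed(1234)
n <- 1000
x <- rnorm(n)
p <- plogis(-1 + 0.5 * x)            # true intercept = -1, slope = 0.5
y <- rbinom(n, size = 1, prob = p)

# glm() with family = binomial() is the usual ML estimator of a logit model.
fit_ml <- glm(y ~ x, family = binomial())
coef(fit_ml)
```

With N = 1,000 the ML estimates land near the truth; the thread’s point is what happens when N is much smaller.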
However, these estimates might not be well-behaved in small samples. In particular, some folks are concerned about small sample bias in logit models. And that’s a real thing.

(But I don’t think it’s the most important problem. Keep reading.)
The figure below shows the percent bias in the coefficient estimates for different constants and numbers of explanatory variables (k) as the sample size varies. It’s hardly negligible, but it disappears quickly.
Fortunately, David Firth came along and suggested a *penalized* maximum likelihood estimator that eliminates almost all of this bias.

jstor.org/stable/2336755
If this seems familiar, it should. Zorn’s (@prisonrodeo) (2005) paper is a classic in political science methods classes, and he recommends Firth’s penalty to deal with separation.

cambridge.org/core/journals/…
Here’s what Firth’s penalty looks like. You just maximize the penalized likelihood L* rather than the usual likelihood L.
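For reference (this is the formula the image showed), Firth (1993) penalizes the likelihood by the Jeffreys prior, the square root of the determinant of the Fisher information:

```latex
L^{*}(\beta) = L(\beta)\,\bigl|I(\beta)\bigr|^{1/2}
\quad\Longleftrightarrow\quad
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\log\bigl|I(\beta)\bigr|
```

where, for the logit model, \(I(\beta) = X^{\top} W X\) with \(W = \mathrm{diag}\{\pi_i(1-\pi_i)\}\).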
And it really works! Here’s a comparison of the percent bias in the ML and PML estimators. You’ll see that Firth’s penalty just wipes most of the bias away.
BUT WAIT!!!! 🛑

If you’re clever, you’ll ask about variance. Most of the time, when you reduce bias, you increase variance. You have to choose!
But that’s not what happens here.

When you use Firth’s logit, you shrink *both* bias and variance. That means you don’t have to choose between them; you can reduce BOTH.
Here’s a figure showing how much more variable your estimates will be if you use ML rather than Firth’s PML.
But even more importantly, it turns out that bias isn’t the big problem in the first place. The shrinkage in the variance is much more important than the reduction in bias.
In many common scenarios, the variance can contribute about 25 times more to the MSE than the squared bias (or even more).
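That ratio comes from the standard decomposition of mean squared error (nothing specific to this paper):

```latex
\mathrm{MSE}(\hat{\beta}) = \mathrm{Bias}(\hat{\beta})^{2} + \mathrm{Var}(\hat{\beta})
```

Because the bias enters squared, a variance roughly 25 times the squared-bias term dominates the MSE.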
So you shouldn’t really be using PML to reduce bias; you should be using PML to reduce *variance* (and bias).
All of this means that you should usually use *penalized* maximum likelihood to fit logistic regression models.

As a default, Firth’s penalty makes much more sense than the usual maximum likelihood estimator.
In practice, that means using the {brglm2} package rather than glm().

And Twitter will love this! {brglm2} works with @VincentAB’s {marginaleffects} package and @noah_greifer’s {clarify} package.
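Concretely, switching from ML to Firth’s PML is a one-argument change to glm(). This sketch assumes {brglm2} is installed and uses a toy separated data set I made up:

```r
library(brglm2)

# Toy data with complete separation: every x above 3 is a success.
# Plain ML sends the slope off toward infinity; Firth's penalty keeps it finite.
d <- data.frame(x = 1:6, y = c(0, 0, 0, 1, 1, 1))

fit_ml    <- suppressWarnings(glm(y ~ x, family = binomial(), data = d))
fit_firth <- glm(y ~ x, family = binomial(), data = d, method = "brglmFit")

coef(fit_ml)     # huge, meaningless slope
coef(fit_firth)  # finite, usable slope
```

Because fit_firth is still a glm object, downstream tools like {marginaleffects} and {clarify} can consume it as usual.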
And it can make a big difference! Here’s a comparison for a small data set from Weisiger (2014).

Paper here: journals.sagepub.com/doi/pdf/10.117…

Code here: gist.github.com/carlislerainey…
Here's the plot.
In short, I think Firth’s PML is usually preferable to ML for fitting logit models. It’s always better in theory (smaller bias and variance), easy to implement ({brglm2}), makes a BIG difference in small samples, and makes a meaningful difference in much larger samples (e.g., N = 1,000).
If you’re interested in this topic, then I recommend the work of Ioannis Kosmidis (@IKosmidis_).

ikosmidis.com
twitter.com/IKosmidis_
And here’s a nugget for #econtwitter. For a simple treatment/control design with a binary outcome, Firth’s logit produces a better estimate of the ATE than OLS.
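To make that concrete, here’s a from-scratch sketch of Firth’s adjusted-score iteration in base R (my own illustration, not the {brglm2} internals; the 3/10 vs. 7/10 numbers are invented). For a saturated 2×2 treatment/control design, Firth’s estimates have a known closed form: add 1/2 to each cell of the table.

```r
# Firth's logit: solve the adjusted score X'(y - mu + h * (1/2 - mu)) = 0,
# where h are the hat values of the weighted design matrix.
firth_logit <- function(X, y, maxit = 100, tol = 1e-10) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    mu   <- plogis(as.vector(X %*% beta))
    W    <- mu * (1 - mu)
    info <- crossprod(X, X * W)                # Fisher information X'WX
    Xs   <- X * sqrt(W)
    h    <- rowSums((Xs %*% solve(info)) * Xs) # diag of Xs (X'WX)^-1 Xs'
    step <- solve(info, crossprod(X, y - mu + h * (0.5 - mu)))
    beta <- beta + as.vector(step)
    if (max(abs(step)) < tol) break
  }
  beta
}

# Treatment/control: 3/10 successes treated, 7/10 successes control.
treat <- rep(c(1, 0), each = 10)
y     <- c(rep(1, 3), rep(0, 7), rep(1, 7), rep(0, 3))
b     <- firth_logit(cbind(1, treat), y)

# Closed form for the saturated 2x2: add 1/2 to each cell.
b_closed <- c(qlogis(7.5 / 11), qlogis(3.5 / 11) - qlogis(7.5 / 11))
```

On the probability scale the implied group means are 3.5/11 and 7.5/11, so the ATE estimate is −4/11 ≈ −0.36.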
I’ve got lots more thoughts on this that I might put in a blog post, but for now, here are two takeaways.

<1> This “small sample” problem is a problem for even larger samples (perhaps larger than 1,000).

<2> The real problem isn’t bias; the problem is variance.
If you’re interested, here’s the paper (with Kelly McCaskey) that describes all the details. It’s open access.

cambridge.org/core/journals/…
And a little tidbit that popped up elsewhere.

Super interesting.
And here’s a really clever application of Firth’s method to panel data in political science from @CookScottJ.

Ungated: sites.pitt.edu/~jch61//PS2740…

Journal (@PSRMJournal): cambridge.org/core/journals/…
