
John Mullahy

23 Feb, 12 tweets, 4 min read

In light of yesterday's massive thread on Poisson regression I thought it perhaps appropriate to revisit an issue that arises sometimes with Poisson estimation in Stata.

This will be familiar to some of you but perhaps not to others.

The typical case is where there are ≥1 dummy RHS variables that are almost always 0 (or almost always 1).

The Poisson estimator requires solving the vector of first-order conditions x'(y-exp(x*b))=0. A finite solution requires in turn that no dummy x equal 1 *only* when y=0. Otherwise x'y=0 for that dummy, and the algorithm is trying to find a value of b that makes the sum of exp(x*b) over the observations with x=1 equal zero, which can't happen for any finite b.
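To spell that step out, here is the score equation for a single dummy, written as a sketch in assumed notation (i indexes observations, d_j is the dummy, b_j its coefficient; none of these symbols appear in the thread itself):

```latex
% Score equation for the coefficient b_j on dummy d_j
\sum_{i} d_{ij}\,\bigl(y_i - \exp(x_i b)\bigr) = 0
% If d_{ij} = 1 only when y_i = 0, then \sum_i d_{ij} y_i = 0 and the condition reduces to
\sum_{i:\, d_{ij}=1} \exp(x_i b) = 0
% Each term is strictly positive for any finite b,
% so the left side can only approach 0 as b_j \to -\infty
```

This is why a very large negative estimate on such a dummy is the symptom to look for.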

The problem is that Stata's –poisson– and –glm– algorithms will, at least sometimes, not detect this data structure and will cosmetically appear to have converged to a solution.

Typically the magnitude of one or more parameter estimates will be huge and this is the tip-off.

Consider this dataset where x2'y=0.

Here are the results from poisson regression. Note the estimated parameter for x2.

This isn't just a small-sample artifact...

The glm procedure has the same problem...
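A minimal sketch of how a data structure like this could be simulated in Stata. The variable names, sample size, seed, and parameter values are all assumptions chosen for illustration; this is not the dataset used above, and what the commands report may depend on the Stata version.

```stata
* Hypothetical simulated data, not the dataset from the thread
clear
set seed 12345
set obs 1000

* x1 is an ordinary regressor; x2 is a dummy that is almost always 0
generate x1 = rnormal()
generate x2 = (_n <= 10)

* Poisson outcome, then force y = 0 whenever x2 = 1 so that x2'y = 0
generate y = rpoisson(exp(0.5 + 0.3*x1))
replace y = 0 if x2 == 1

* Fit the same model three ways and compare the x2 coefficient
poisson y x1 x2
glm y x1 x2, family(poisson) link(log)
regress y x1 x2
```

Increasing the number in set obs is one way to see whether the behavior is tied to sample size.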

Note that a linear model does not encounter this problem because its conditional mean is not restricted to be positive.

The solution is easy and obvious: Before undertaking Poisson estimation check your estimation sample to be sure it doesn't have these features. (Note: The same dummy-variable "spanning" problems arise with binary outcomes and probit, logit, etc.)
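One way such a check might look in Stata, as a sketch only. Here y and x2 are hypothetical names standing in for the outcome and each dummy regressor; list your own dummies in the varlist. The check on the 0 category is added as an assumption, on the reasoning that with a constant in the model a dummy that is 0 only when y=0 creates the mirror-image problem.

```stata
* Hypothetical names: y is the outcome, x2 stands in for each dummy regressor
foreach v of varlist x2 {
    * Count observations with a positive outcome in each category of the dummy
    quietly count if `v' == 1 & y > 0 & !missing(y)
    local n1 = r(N)
    quietly count if `v' == 0 & y > 0 & !missing(y)
    local n0 = r(N)
    if `n1' == 0 | `n0' == 0 {
        display as error "`v': y is never positive in one category"
    }
    else {
        display as text "`v': positive outcomes occur in both categories"
    }
}
```

The !missing(y) condition matters because Stata treats missing values as larger than any number, so y > 0 alone would count them. If a model has already been fitted, adding & e(sample) to each count restricts the check to the estimation sample.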


Postscript: Joao Santos Silva (who is not on Twitter to my knowledge) posted this earlier today on StataList, with a link to his nice paper (also linked yesterday in this thread by @AustnNchols) and a strong endorsement of –ppmlhdfe–
statalist.org/forums/forum/g…
