I found a tweet on Poisson regression this morning on my feed and thought it would be a good time to write a tweetorial on Poisson regression. To keep things simple, I will assume we have a count response variable Y and a single predictor variable X. Here we go! #rstats
The response variable in Poisson regression is a count variable (e.g., the number of fish in a lake, measured every year). The marginal distribution of this variable can be visualized with an R command like:

plot(table(Data$Y), xlab = "Y", ylab = "Frequency")
The relationship between the response variable Y (e.g., yearly count of fish in a lake) and the predictor variable X (e.g., year of study, labelled as 0, 1, 2, etc.) can be plotted with the commands:


xyplot(Y ~ X, data = Data, cex = 1.5, pch=19)
Notice the apparent non-linear relationship between year of study X and number of fish in the lake Y apparent from the plot of Y versus X in our example. Poisson regression will allow us to capture this relationship.
A Poisson regression model is easy to formulate and fit in R:

Model <- glm(Y ~ X,
family = poisson(link = "log"),
data = Data)


What can be tricky though is interpreting the model summary.
To interpret the model summary of a Poisson regression model, we need to first understand what we are modelling:

We are modelling the log-transformed (true) 'mean' value of Y as a linear function of X.

Other people may use the words 'expected' or average' instead of 'mean'.
So how do we interpret the estimated intercept in our Poisson regression model? It is the estimated log-transformed 'mean' value of Y when X = 0. In other words, the log 'mean' count of fish in the lake in the first year of study is 0.25964.
The intercept becomes more interpretable if we exponentiate its value to get rid of the pesky log-transformation:

exp(0.25964) = 1.296463 is the 'mean' count of fish in the lake in the first year of study (which can be rounded for reporting purposes).
What about interpreting the estimated slope of X in our Poisson regression model?

Each additional year (i.e., each 1-unit increase in X) is associated with an (additive) increase of 0.30677 units in the log-transformed 'mean' count of fish in the lake.
The log-transformation strikes again! The estimated slope of X becomes more interpretable after exponentiation:

exp(0.30677) = 1.36.

Each additional year is associated with a multiplicative increase by a factor of 1.36 in the 'mean' count of fish in the lake (a 36% increase).
The effects package of R makes it easy to visualize the (additive) effect of X on the log-transformed 'mean' value of Y given X or the (multiplicative) effect of X on the 'mean' value of Y given X.

• • •

Missing some Tweet in this thread? You can try to force a refresh

Keep Current with Isabella R. Ghement

Isabella R. Ghement Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!


Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @IsabellaGhement

18 Oct 20
If you haven't tried @dailyzad's concurve package to build p-value functions in R, make sure you give it a try - it is very nice. Details about the package can be found here:
data.lesslikely.com/concurve//inde…. This thread illustrates some of the package functionality. #rstats (1)
First, install the latest version of the concurve package from Github:


remotes::install_github("zadrafi/concurve@master", dependencies = TRUE)


The package home on Github is: github.com/zadrafi/concur… .

Next, let's say you would like to construct a p-value function (aka confidence interval function) in the context of a simple linear regression model:

model <- lm(dist ~ speed, data=cars)

Read 17 tweets
17 Oct 20
Just came upon this interpretation in "Regression and Other Stories" by Gelman, Hill and Vehtari:

"We can say that, under the fitted model, the average difference in earnings, comparing two people of the same sex but one inch different in height, is $600."

I struggle with the interpretation provided because I don't understand why the authors talk about an "average difference" when comparing two individuals. I would understand talking about it when comparing two groups of individuals.
A linear regression model can be used to: 1. Predict true response values for individuals or 2. Estimate mean response values for groups of individuals (conditional on predictor values). The Gelman et al. interpretation seems to me to muddy the waters between these two purposes?
Read 7 tweets
28 Sep 20
@tangming2005 Since nobody has mentioned the p-value function yet in this thread, I will jump into the fray to say that one way to convey more completely what is going on is by constructing the entire p-value function rather than reporting a single p-value. /1
@tangming2005 When plotted, the p-value function (also known as the confidence interval function) looks like a tepee. As explained at ebrary.net/72024/health/v…, it is closely related to the set of all confidence intervals for a given parameter. /2
@tangming2005 In fact, the p-value function is obtained by stacking all possible confidence intervals for the parameter of interest (e.g., 0%, 1%, ..., 99%, 100%) on top of each other, so that they all share the same center, which is the estimated value of that parameter. /3
Read 14 tweets
25 Sep 20
An interesting question on Cross Validated got me thinking about the fact the heteroscedasticity is not a concept that is applicable to a count regression model. See below for my musings. The question is here: stats.stackexchange.com/questions/4890…. #rstats
Say we are fitting a Poisson regression model to our data:

m <- glm(y ~ x, family=poisson(link="log"), data = data)

and plot the residuals versus the fitted values. Should we worry about the obvious pattern of heteroscedasticity present in the plot? Image
Count regression models are predicated on the fact that the (conditional) mean of the response IS a function of the (conditional) variance of the response. What we need to do is consider whether the relationship between the mean and variance is adequately captured in our model.
Read 10 tweets
15 Sep 20
When specifying mixed effects models in R, it helps to think of 'identifiers' versus 'characteristics' of random grouping factors.

These two end up being included in different parts of the model, so it pays off to distinguish between them. #rstats

Imagine we select a number of ocean sites at random where we wish to monitor fish abundance repeatedly each year.

Site is a random grouping factor.

SiteID (e.g., 1, 2, 3) is a site identifier.

SiteType (e.g., offshore, nearshore) is a site characteristic.

So we would specify our model for Abundance (a count variable) something like this:

glmer(Abundance ~ Year + SiteType + (1|SiteID),
family=poisson(link="log"), nAGQ = 100)

Site characteristic is a predictor; site identifier identifies the random grouping factor. 3/n
Read 5 tweets
17 Aug 20
Did you know you can use the map() function from the purrr package in R to simultaneously apply the same "computational or graphical recipe" to each column of a dataset? In this tweetorial, I will explain how you can do this. Ready? Let's go! #rstats
Let's say you work with R's airquality dataset and wish to compute the number of missing data values to each of its quantitative variables. Here is what you need to use map:

Columns: Ozone Solar.R Wind Temp
Recipe: Function for computing the number of missing values.
The R package naniar has the recipe you need - it's a function called n_miss() which reports the missing data values present in a dataset column/variable.

You can use the select() function in the dplyr package to select the columns of interest from the dataset.
Read 15 tweets

Did Thread Reader help you today?

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!