
If you haven't tried @dailyzad's concurve package to build p-value functions in R, make sure you give it a try - it is very nice. Details about the package can be found here:

data.lesslikely.com/concurve//inde…. This thread illustrates some of the package functionality. #rstats (1)

First, install the latest version of the concurve package from Github:

```r
install.packages("remotes")
remotes::install_github("zadrafi/concurve@master", dependencies = TRUE)
library(concurve)
```

The package home on Github is: github.com/zadrafi/concur… .

(2)

Next, let's say you would like to construct a p-value function (aka confidence interval function) in the context of a simple linear regression model:

```r
data(cars)
model <- lm(dist ~ speed, data = cars)
summary(model)
```

(3)
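With the model fit, the p-value function for the slope can be generated and plotted with concurve. A minimal sketch, assuming the `curve_gen()` and `ggcurve()` interface from the package docs (check `?curve_gen` for the version you installed):

```r
library(concurve)

data(cars)
model <- lm(dist ~ speed, data = cars)

# Generate the consonance (p-value) function for the 'speed' coefficient
curve <- curve_gen(model, var = "speed", method = "lm")

# Plot the p-value function
ggcurve(data = curve[[1]], type = "c")
```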

Just came upon this interpretation in "Regression and Other Stories" by Gelman, Hill and Vehtari:

"We can say that, under the fitted model, the average difference in earnings, comparing two people of the same sex but one inch different in height, is $600."

#rstats

I struggle with the interpretation provided because I don't understand why the authors talk about an "average difference" when comparing two individuals. I would understand talking about it when comparing two groups of individuals.

A linear regression model can be used to: 1. predict true response values for individuals, or 2. estimate mean response values for groups of individuals (conditional on predictor values). The Gelman et al. interpretation seems to me to muddy the waters between these two purposes.
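The distinction between these two purposes shows up directly in R. A sketch using the built-in cars data as a stand-in (the earnings data from the book is not reproduced here): prediction intervals for individuals are wider than confidence intervals for the group mean at the same predictor value.

```r
model <- lm(dist ~ speed, data = cars)
new <- data.frame(speed = 15)

# 1. Predict the true response for an individual car (wider interval)
predict(model, newdata = new, interval = "prediction")

# 2. Estimate the mean response for the group of cars with speed = 15
#    (narrower interval)
predict(model, newdata = new, interval = "confidence")
```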

@tangming2005 Since nobody has mentioned the p-value function yet in this thread, I will jump into the fray to say that one way to convey more completely what is going on is by constructing the entire p-value function rather than reporting a single p-value. /1

@tangming2005 When plotted, the p-value function (also known as the confidence interval function) looks like a tepee. As explained at ebrary.net/72024/health/v…, it is closely related to the set of all confidence intervals for a given parameter. /2

@tangming2005 In fact, the p-value function is obtained by stacking all possible confidence intervals for the parameter of interest (e.g., 0%, 1%, ..., 99%, 100%) on top of each other, so that they all share the same center, which is the estimated value of that parameter. /3
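The stacking idea can be sketched in base R: compute confidence intervals at many levels for one coefficient, then plot interval endpoints against 1 minus the confidence level to trace out the tepee shape.

```r
model <- lm(dist ~ speed, data = cars)

# Confidence intervals for the slope at levels 1%, 2%, ..., 99%
levels <- seq(0.01, 0.99, by = 0.01)
ci <- t(sapply(levels, function(l) confint(model, "speed", level = l)))

# Left limb: lower bounds; right limb: upper bounds.
# The curve peaks at the point estimate, where the 0% interval collapses.
plot(c(rev(ci[, 1]), ci[, 2]), c(rev(1 - levels), 1 - levels),
     type = "l", xlab = "speed coefficient", ylab = "p-value")
```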

An interesting question on Cross Validated got me thinking about the fact that heteroscedasticity is not a concept that applies to a count regression model. See below for my musings. The question is here: stats.stackexchange.com/questions/4890…. #rstats

Say we are fitting a Poisson regression model to our data:

```r
m <- glm(y ~ x, family = poisson(link = "log"), data = data)
```

and plot the residuals versus the fitted values. Should we worry about the obvious pattern of heteroscedasticity present in the plot?

Count regression models are predicated on the fact that the (conditional) variance of the response IS a function of the (conditional) mean of the response. What we need to do is consider whether the relationship between the mean and variance is adequately captured in our model.
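A sketch with simulated data (variable names are illustrative): under a Poisson model the residual spread grows with the fitted mean by design, so the thing to check is whether the assumed mean-variance relationship holds, e.g. via a rough overdispersion statistic.

```r
# Simulated Poisson data: residual "heteroscedasticity" is expected here,
# since Var(y | x) = E(y | x) under the Poisson model
set.seed(1)
x <- runif(200)
y <- rpois(200, lambda = exp(1 + 2 * x))
m <- glm(y ~ x, family = poisson(link = "log"))

# Rough check of the mean-variance relationship: the sum of squared
# Pearson residuals over the residual degrees of freedom should be
# near 1 if the Poisson assumption is adequate
sum(residuals(m, type = "pearson")^2) / df.residual(m)
```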

When specifying mixed effects models in R, it helps to think of 'identifiers' versus 'characteristics' of random grouping factors.

These two end up being included in different parts of the model, so it pays off to distinguish between them. #rstats

1/n

Imagine we select a number of ocean sites at random where we wish to monitor fish abundance repeatedly each year.

Site is a random grouping factor.

SiteID (e.g., 1, 2, 3) is a site identifier.

SiteType (e.g., offshore, nearshore) is a site characteristic.

2/n

So we would specify our model for Abundance (a count variable) something like this:

```r
glmer(Abundance ~ Year + SiteType + (1 | SiteID),
      family = poisson(link = "log"), nAGQ = 100)
```

Site characteristic is a predictor; site identifier identifies the random grouping factor. 3/n

glmer(Abundance ~ Year + SiteType + (1|SiteID),

family=poisson(link="log"), nAGQ = 100)

Site characteristic is a predictor; site identifier identifies the random grouping factor. 3/n
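A self-contained sketch of this model with hypothetical data (the data frame and effect sizes below are invented for illustration; the thread does not supply them):

```r
library(lme4)

# Hypothetical fish-abundance data with the structure described above:
# 12 sites monitored over 6 years, each site offshore or nearshore
set.seed(42)
fish <- expand.grid(SiteID = factor(1:12), Year = 2015:2020)
fish$SiteType <- ifelse(as.integer(fish$SiteID) <= 6, "offshore", "nearshore")
site_eff <- rnorm(12, sd = 0.4)  # random site-level deviations
fish$Abundance <- rpois(nrow(fish),
                        exp(2 + 0.1 * (fish$Year - 2015) +
                            0.5 * (fish$SiteType == "offshore") +
                            site_eff[fish$SiteID]))

fit <- glmer(Abundance ~ Year + SiteType + (1 | SiteID),
             family = poisson(link = "log"), data = fish, nAGQ = 100)
summary(fit)
```

Note that `nAGQ = 100` (adaptive Gauss-Hermite quadrature) is only available in lme4 when the model has a single scalar random effect, as here.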

Did you know you can use the map() function from the purrr package in R to simultaneously apply the same "computational or graphical recipe" to each column of a dataset? In this tweetorial, I will explain how you can do this. Ready? Let's go! #rstats

Let's say you work with R's airquality dataset and wish to compute the number of missing data values for each of its quantitative variables. Here is what you need to use map():

Columns: Ozone Solar.R Wind Temp

Recipe: Function for computing the number of missing values.

The R package naniar has the recipe you need - a function called n_miss(), which reports the number of missing values in a dataset column/variable.

You can use the select() function in the dplyr package to select the columns of interest from the dataset.
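Putting the pieces together, a short sketch (using map_int(), the type-stable variant of map() for integer results):

```r
library(purrr)
library(dplyr)
library(naniar)

# Apply the same recipe (count missing values) to each selected column
airquality %>%
  select(Ozone, Solar.R, Wind, Temp) %>%
  map_int(n_miss)
```

In R's built-in airquality data this reports 37 missing values for Ozone, 7 for Solar.R, and none for Wind or Temp.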