12,399 views

Am J Epidemiology

@AmJEpi

12 tweets, 5 min read

👋 @LucyStats here! Today we’re going to do a little stats primer on testing for non-linear terms when fitting a model.

What do you do when trying to decide whether to include a non-linear term in a model?

1️⃣ test the nonlinear term, if significant leave it in
2️⃣ if you have enough dfs, include the nonlinear term regardless of significance
3️⃣ never include nonlinear terms
4️⃣ comment

It turns out if you make a decision to include the nonlinear term based on a significance test, you are at risk of inflating your Type 1 error 😱

📃 source: onlinelibrary.wiley.com/doi/abs/10.100…

🗣 aside, a quick reminder of what Type 1 error is:

In the boy who cried wolf 🙆‍♂️🗣🐺, the villagers ☝️ first committed a Type 1 error (thinking the wolf was there when it was not!) then ✌️ they committed a Type 2 error (thinking the wolf was not there when it was!)

SO these Type 1 errors are bad! Essentially, by deciding whether to include nonlinear terms based on a significance test, you are increasing the chance that you’ll incorrectly reject your null hypothesis!

Instead of just reading about it, let’s have an 🎶 #rstats code-along with @LucyStats 🎶. For this code-along, you’ll just need one package, “rms”. If you don’t have it already, you can install it by running
install.packages("rms") in R

First we are going to create a function called `sim_1()`. This function will simulate the situation described in 1️⃣ in the poll. We simulate x and y independently, so there is no true association, then fit a linear model with a non-linear term (a restricted cubic spline on x). If the nonlinear term is significant, we leave it in; if not, we remove it.

library(rms)

sim_1 <- function(){
  # x and y are independent, so the null hypothesis is true
  y <- rnorm(30)
  x <- rnorm(30)
  mod <- ols(y ~ rcs(x))

  if (anova(mod)[[" Nonlinear", "P"]] > 0.05){
    # if non-linearity is not "significant", remove terms
    mod <- ols(y ~ x)
  }
  # p-value for the association between x and y in whichever model we kept
  anova(mod)[["x", "P"]]
}
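
A quick aside on the indexing in `sim_1()`: the leading space in `" Nonlinear"` looks like a typo, but it appears to be how the rms ANOVA table labels that row. A minimal sketch to check this yourself, assuming rms is installed (the seed and the name `tab` are just illustrative choices):

```r
library(rms)

set.seed(1)  # arbitrary seed, only so the fit is reproducible

# Simulate one null data set and fit the spline model once
y <- rnorm(30)
x <- rnorm(30)
tab <- anova(ols(y ~ rcs(x)))

# Inspect the labels that sim_1() indexes into
rownames(tab)
colnames(tab)
```

If the row or column labels differ in your version of rms, adjust the indexing in `sim_1()` to match what you see here.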

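Using the `replicate()` function, we can run `sim_1()` a bunch of times and calculate the Type 1 error. Because x and y are unrelated, every p-value at or below 0.05 is a false positive, so the proportion of those is our Type 1 error rate.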

test <- replicate(10000, sim_1())
mean(test <= 0.05)

When I run this, I get 0.0832 - uh oh! that’s definitely above our 0.05 target 🎯
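
How sure can we be that 0.0832 is not just simulation noise? One rough check, sketched below, is the Monte Carlo standard error of a rejection rate from 10,000 runs if the true rate really were 0.05. The variable names are illustrative; only 10000, 0.05, and 0.0832 come from the thread.

```r
n_sims   <- 10000   # number of replicates used in the thread
nominal  <- 0.05    # target Type 1 error rate
observed <- 0.0832  # rejection rate reported in the thread for sim_1()

# Standard error of a proportion estimated from n_sims independent runs,
# evaluated at the nominal rate
mc_se <- sqrt(nominal * (1 - nominal) / n_sims)

# How many standard errors the observed rate sits above the target
z <- (observed - nominal) / mc_se

mc_se
z
```

The standard error comes out at roughly 0.002, so a gap of about 0.03 is far larger than what run-to-run variation alone would explain.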

You may get something slightly different, but it should be approximately the same, something likely above 0.05.

Now let’s run scenario 2️⃣ - we fit the model with non-linear terms from the beginning. I’ll call this sim_2()

sim_2 <- function(){
  y <- rnorm(30)
  x <- rnorm(30)
  mod <- ols(y ~ rcs(x))
  anova(mod)[["x", "P"]]
}

test <- replicate(10000, sim_2())
mean(test <= 0.05)

Now I get 0.0522 - much better!
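
If you don't have rms handy, the same comparison can be sketched in base R. This is an analogue, not the thread's code: it swaps `ols()` and `rcs()` for `lm()` with a natural cubic spline from `splines::ns(x, df = 4)`, so the knot placement differs and the rates will not match the numbers above exactly. The function name `sim_both`, the seed, and the 2,000 replicates are illustrative choices.

```r
library(splines)  # ships with R; ns() builds a natural cubic spline basis

set.seed(2019)  # arbitrary seed, only for reproducibility

# Returns two p-values for the x-y association on the same null data:
# "pretest" tests the nonlinear terms first and drops them if p > 0.05,
# "flexible" always keeps the spline.
sim_both <- function(n = 30) {
  y <- rnorm(n)
  x <- rnorm(n)
  fit_null   <- lm(y ~ 1)
  fit_linear <- lm(y ~ x)
  fit_spline <- lm(y ~ ns(x, df = 4))

  # F tests comparing nested models
  p_nonlinear <- anova(fit_linear, fit_spline)[2, "Pr(>F)"]
  p_linear    <- anova(fit_null, fit_linear)[2, "Pr(>F)"]
  p_spline    <- anova(fit_null, fit_spline)[2, "Pr(>F)"]

  c(pretest  = if (p_nonlinear > 0.05) p_linear else p_spline,
    flexible = p_spline)
}

p <- replicate(2000, sim_both())  # 2 x 2000 matrix, one column per run
rates <- rowMeans(p <= 0.05)      # rejection rate for each strategy
rates
```

Returning both p-values from the same simulated data makes the comparison paired, so the gap between the two rates is less noisy than it would be from two separate simulations.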


So what’s the moral here? If you can spare the degrees of freedom, fit flexible models from the get go! Skip the tests for nonlinearity 🎉 Check out @f2harrell’s Regression Modeling Strategies for more neat tips on modeling! biostat.mc.vanderbilt.edu/wiki/Main/RmS
