12 tweets, 5 min read
👋 @LucyStats here! Today we’re going to do a little stats primer on testing for non-linear terms when fitting a model.
What do you do when trying to decide whether to include a non-linear term in a model?

1️⃣ test the nonlinear term, if significant leave it in
2️⃣ if you have enough dfs, include the nonlinear term regardless of significance
3️⃣ never include nonlinear terms
4️⃣ comment
It turns out that if you decide whether to include the nonlinear term based on a significance test, you risk inflating your Type 1 error 😱

📃 source: onlinelibrary.wiley.com/doi/abs/10.100…
🗣 Aside: a quick reminder of what a Type 1 error is:

In The Boy Who Cried Wolf 🙆‍♂️🗣🐺, the villagers ☝️ first committed a Type 1 error (thinking the wolf was there when it was not!), then ✌️ they committed a Type 2 error (thinking the wolf was not there when it was!).
So these Type 1 errors are bad! Essentially, by deciding whether to include nonlinear terms based on a significance test, you increase the chance that you’ll incorrectly reject a null hypothesis that is actually true!
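To make “Type 1 error rate” concrete, here’s a small base-R sketch (no rms needed) in which the null is true by construction: `y` is generated independently of `x`, and each run does a single, pre-specified test of the slope. The share of runs with p ≤ 0.05 should land close to 0.05. The seed, the sample size of 30, and the name `type1_rate` are just illustrative choices.

```r
# Null hypothesis true by construction: y is generated independently of x
set.seed(42)  # arbitrary seed, for reproducibility only
p_single <- replicate(10000, {
  y <- rnorm(30)
  x <- rnorm(30)
  # p-value for the slope of x from an ordinary linear regression
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
})

# Share of runs that wrongly reject H0 at alpha = 0.05 (Type 1 errors)
type1_rate <- mean(p_single <= 0.05)
type1_rate
```

With one test decided in advance, the rejection rate matches the 0.05 we asked for; the trouble starts when the test you end up reporting depends on an earlier test.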
Instead of just reading about it, let’s have an 🎶 #rstats code-along with @LucyStats 🎶. For this code-along, you’ll need just one package, “rms”. If you don’t have it already, you can install it by running `install.packages("rms")` in R.
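First, we’re going to create a function called `sim_1()`, which simulates the situation described in 1️⃣ in the poll. Because `y` and `x` are generated independently, the null hypothesis (no association between x and y) is true, so every “significant” result counts as a Type 1 error. We fit a linear model with a non-linear term (a restricted cubic spline on x). If the nonlinear term is significant, we leave it in; if not, we remove it.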
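library(rms)

sim_1 <- function() {
  # y and x are generated independently, so the null hypothesis is true
  y <- rnorm(30)
  x <- rnorm(30)
  # fit a restricted cubic spline in x
  mod <- ols(y ~ rcs(x))

  # note the leading space in the " Nonlinear" row name of anova(mod)
  if (anova(mod)[[" Nonlinear", "P"]] > 0.05) {
    # if non-linearity is not "significant", remove the nonlinear terms
    mod <- ols(y ~ x)
  }
  # p-value for the overall test of x in whichever model was kept
  anova(mod)[["x", "P"]]
}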
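Don’t have rms handy? Here’s a hypothetical base-R analogue of `sim_1()` to show the same select-then-test logic. It is a sketch, not the thread’s code: it swaps `rms::rcs()` for a natural spline from `splines::ns()` (with 4 degrees of freedom, an arbitrary choice) and uses nested-model F tests from `anova()` on `lm` fits instead of `anova()` on an `ols` fit. Because the spline basis differs, the numbers won’t match rms exactly, but the rejection rate should still come out above 0.05. The names `sim_1_base` and `spline_df` are made up for this example.

```r
library(splines)  # ships with R; provides ns()

# Hypothetical base-R analogue of sim_1(): ns() stands in for rms::rcs(),
# and nested-model F tests stand in for anova() on an rms fit.
sim_1_base <- function(n = 30, spline_df = 4) {
  y <- rnorm(n)  # generated independently of x, so H0 is true
  x <- rnorm(n)
  fit_null <- lm(y ~ 1)
  fit_lin  <- lm(y ~ x)
  fit_spl  <- lm(y ~ ns(x, df = spline_df))

  # Step 1: test the nonlinear part (spline vs. straight line)
  p_nonlin <- anova(fit_lin, fit_spl)[2, "Pr(>F)"]

  if (p_nonlin > 0.05) {
    # Not "significant": drop the nonlinear terms, test the linear term
    anova(fit_null, fit_lin)[2, "Pr(>F)"]
  } else {
    # "Significant": keep the spline and test all of its df for x
    anova(fit_null, fit_spl)[2, "Pr(>F)"]
  }
}

set.seed(1)  # arbitrary seed
p_sim1 <- replicate(4000, sim_1_base())
mean(p_sim1 <= 0.05)
```

The inflation comes from giving x two routes to “significance”: a run can reject through the spline test after a significant nonlinearity check, or through the linear test otherwise.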
Using the `replicate()` function, we can run that 10,000 times and calculate the Type 1 error rate, i.e. the proportion of runs where the p-value for x is at or below 0.05.

test <- replicate(10000, sim_1())
mean(test <= 0.05)

When I run this, I get 0.0832. Uh oh! That’s definitely above our 0.05 target 🎯
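How far above 0.05 counts as “definitely above”? One quick yardstick is the Monte Carlo standard error of a simulated rate, sqrt(p(1 - p) / n_sim). This sketch (base R; `mc_se` is a hypothetical helper, and 1.96 is the usual normal-approximation multiplier) gives the range you’d expect from 10,000 replicates if the true Type 1 error rate were exactly 0.05.

```r
# Monte Carlo standard error of a rejection rate p estimated from n_sim runs
mc_se <- function(p, n_sim) sqrt(p * (1 - p) / n_sim)

n_sim <- 10000
se_null <- mc_se(0.05, n_sim)  # about 0.0022

# Approximate 95% range for the estimate if the true rate were 0.05
ci_null <- 0.05 + c(-1.96, 1.96) * se_null
round(ci_null, 4)  # 0.0457 0.0543

# Is the 0.0832 reported for sim_1() above that range?
0.0832 > ci_null[2]  # TRUE
```

So run-to-run noise at 10,000 replicates is only a few thousandths; a rate of 0.0832 can’t be explained by simulation noise alone.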

You may get something slightly different since the data are random, but it should be approximately the same, most likely above 0.05. Now let’s run scenario 2️⃣: we fit the model with the non-linear terms from the beginning and keep them regardless of significance. I’ll call this `sim_2()`.
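sim_2 <- function() {
  # again, y and x are independent, so the null hypothesis is true
  y <- rnorm(30)
  x <- rnorm(30)
  # keep the spline no matter what: no test for nonlinearity
  mod <- ols(y ~ rcs(x))
  # single pre-specified test of x (linear + nonlinear terms together)
  anova(mod)[["x", "P"]]
}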

test <- replicate(10000, sim_2())
mean(test <= 0.05)

Now I get 0.0522, right around our 0.05 target. Much better!
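For comparison without rms, here’s a hypothetical base-R analogue of `sim_2()`. As a sketch under the same assumptions as before, it uses `splines::ns()` with 4 degrees of freedom in place of `rms::rcs()` and a nested-model F test in place of `anova()` on an `ols` fit; `sim_2_base` and `spline_df` are made-up names. Since the spline is pre-specified and only one test is ever run, the rejection rate should sit near 0.05.

```r
library(splines)  # ships with R; provides ns()

# Hypothetical base-R analogue of sim_2(): the spline is chosen up front
# and kept, so exactly one test of x (all of its df together) is done.
sim_2_base <- function(n = 30, spline_df = 4) {
  y <- rnorm(n)  # H0 true: y does not depend on x
  x <- rnorm(n)
  fit_null <- lm(y ~ 1)
  fit_spl  <- lm(y ~ ns(x, df = spline_df))
  anova(fit_null, fit_spl)[2, "Pr(>F)"]
}

set.seed(2)  # arbitrary seed
p_sim2 <- replicate(4000, sim_2_base())
mean(p_sim2 <= 0.05)
```

The design choice that matters is that the model is fixed before looking at the data: with normal errors, a single F test of a pre-specified model has exactly the nominal 0.05 rate.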

So what’s the moral here? If you can spare the degrees of freedom, fit flexible models from the get-go! Skip the tests for nonlinearity 🎉 Check out @f2harrell’s Regression Modeling Strategies for more neat modeling tips! biostat.mc.vanderbilt.edu/wiki/Main/RmS
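Not sure how many degrees of freedom you can “spare”? A restricted cubic spline with k knots spends k - 1 of them on x (1 linear + k - 2 nonlinear). A rule of thumb often quoted from Regression Modeling Strategies is roughly 10 to 20 observations per parameter for a continuous outcome. The sketch below uses 15 as an assumed default; `rcs_df` and `df_budget` are hypothetical helpers, not rms functions.

```r
# Degrees of freedom a restricted cubic spline with k knots uses for x
rcs_df <- function(knots) knots - 1

# Rough parameter budget from an "observations per parameter" heuristic
# (obs_per_df = 15 is an assumption; adjust to taste)
df_budget <- function(n, obs_per_df = 15) floor(n / obs_per_df)

df_budget(30)       # 2
df_budget(300)      # 20
rcs_df(c(3, 4, 5))  # 2 3 4
```

Comparing the two numbers before fitting is one way to decide, up front, how many knots a predictor can afford.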
Thread by Am J Epidemiology