The assumption of normality in many statistical models refers to the model residuals and not the dependent variable. @oscar_olvera100 has a great post about this topic (linked). I tried to expand on his thoughts through some additional simulation/visualization. #epitwitter 1/n
In these examples, we’ll look at a simple linear model Y ~ X, where X is dichotomous and Y is continuous. This first plot shows, for two relative effect sizes, the distributions of:

1. Y stratified by X
2. Y without stratification
3. Residuals from the linear model Y ~ X

2/n
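A minimal sketch of this setup, using only the Python standard library. The group sizes, seed, and unit error variance are assumptions made for illustration and are not taken from the thread's own code. Because X is dichotomous, the least-squares fit of Y ~ X reproduces the two group means, so each residual is simply Y minus the mean of its own group.

```python
import random
import statistics


def simulate(n_per_group, d, seed=0):
    """Simulate Y ~ X with a 0/1 predictor and N(0, 1) errors.

    Group means are 0 and d, so d is the population Cohen's d
    (the within-group SD is 1 by construction).
    """
    rng = random.Random(seed)
    x = [0] * n_per_group + [1] * n_per_group
    y = [rng.gauss(d * xi, 1.0) for xi in x]
    return x, y


def residuals(x, y):
    """Residuals from OLS of y on a 0/1 predictor: y minus its group mean."""
    means = {
        g: statistics.fmean(yi for xi, yi in zip(x, y) if xi == g)
        for g in (0, 1)
    }
    return [yi - means[xi] for xi, yi in zip(x, y)]


def cohens_d(x, y):
    """Sample Cohen's d using the pooled within-group standard deviation."""
    g0 = [yi for xi, yi in zip(x, y) if xi == 0]
    g1 = [yi for xi, yi in zip(x, y) if xi == 1]
    pooled_var = (
        (len(g0) - 1) * statistics.variance(g0)
        + (len(g1) - 1) * statistics.variance(g1)
    ) / (len(g0) + len(g1) - 2)
    return (statistics.fmean(g1) - statistics.fmean(g0)) / pooled_var ** 0.5


for d in (0.75, 2.0):
    x, y = simulate(n_per_group=500, d=d)
    res = residuals(x, y)
    print(
        f"d={d}: sample d={cohens_d(x, y):.2f}, "
        f"SD(Y)={statistics.stdev(y):.2f}, SD(residuals)={statistics.stdev(res):.2f}"
    )
```

The unstratified Y is a mixture of the two groups, which is why its spread grows with the effect size while the residual spread does not.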
We can "test" the normality assumption using the Shapiro-Wilk test. We find that if we look at only the dependent variable, the normality assumption is "violated" (SW p < 0.05) when the effect size is large (Cohen's d = 2), but not for smaller effect sizes (Cohen's d = 0.75). 3/n
Of course, this approach isn't technically correct, and we should be looking at the distribution of residuals. We find that regardless of effect size, the normality assumption holds when we test the model's residuals (non-significant SW test). 4/n
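The comparison in the last two tweets can be sketched with `scipy.stats.shapiro`. This is an illustration under assumptions (NumPy and SciPy available, balanced groups, unit error variance, arbitrary seed and sample size), not the code linked from the thread. The helper name `sw_pvalues` is hypothetical.

```python
import numpy as np
from scipy import stats


def sw_pvalues(d, n_per_group, rng):
    """Return Shapiro-Wilk p-values for (dependent variable, residuals).

    Data are simulated as Y = d * X + e with X in {0, 1} and e ~ N(0, 1).
    """
    x = np.repeat([0, 1], n_per_group)
    y = rng.normal(loc=d * x, scale=1.0)
    # With a 0/1 predictor, OLS fitted values are the two group means.
    group_means = np.array([y[x == 0].mean(), y[x == 1].mean()])
    resid = y - group_means[x]
    p_dv = stats.shapiro(y)[1]
    p_resid = stats.shapiro(resid)[1]
    return p_dv, p_resid


rng = np.random.default_rng(0)  # arbitrary seed
for d in (0.75, 2.0):
    p_dv, p_resid = sw_pvalues(d, n_per_group=250, rng=rng)
    print(f"d={d}: SW p (DV)={p_dv:.4f}, SW p (residuals)={p_resid:.4f}")
```

A single simulated dataset is noisy, so any one run may not reproduce the pattern described above; the repeated simulations later in the thread are what make the trend visible.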
These two examples raise the question (as Oscar points out in his blog): what is the relationship between effect size and the relative "normality" of the dependent variable vs. the model residuals? 5/n
Here I simulated various effect sizes (1K times each) and plotted the SW p-values for both the DV (blue) and the residuals (red). We see that regardless of effect size, the SW p-values for the residuals are uniformly distributed (as expected under the null), with ~5% falling below 0.05. 6/n
However, the SW p-values for the DV trend toward "significance" (dashed line is p = 0.05) as the effect size increases. This is accelerated by a larger sample size. With n = 5000, the SW test is likely to be significant for the DV once the effect size reaches Cohen's d ~ 1.5. 7/n
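One way to see why the DV drifts away from normality is to work out its excess kurtosis. The derivation below is a sketch under assumptions that the thread does not state: balanced groups, normal errors with unit variance, and a mean difference of $\delta$ (so Cohen's d equals $\delta$).

```latex
% Assumed model: Y = \mu + \delta X + \varepsilon,
% X \sim \mathrm{Bernoulli}(1/2), \varepsilon \sim N(0, 1), X independent of \varepsilon.
\begin{align*}
\operatorname{Var}(Y) &= 1 + \frac{\delta^2}{4}, \\
\operatorname{E}\!\left[(Y - \operatorname{E}Y)^4\right]
  &= 3 + \frac{3}{2}\delta^2 + \frac{\delta^4}{16}, \\
\gamma_2(Y)
  &= \frac{3 + \frac{3}{2}\delta^2 + \frac{\delta^4}{16}}{\left(1 + \frac{\delta^2}{4}\right)^2} - 3
   = -\frac{2\,\delta^4}{\left(4 + \delta^2\right)^2}.
\end{align*}
% \delta = 0.75 gives \gamma_2 \approx -0.03; \delta = 2 gives \gamma_2 = -0.5.
```

Under these assumptions the marginal distribution of Y is symmetric but increasingly flat-topped as the effect grows, and it becomes bimodal once $\delta > 2$. At d = 0.75 the departure from normality is tiny, which is consistent with the SW test rarely flagging it unless the sample is large.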
We can look at these data in another way. Here I binned the Cohen's d values and looked at the proportion of SW p-values < 0.05 for each bin. As expected, for the residuals there is no trend, and regardless of effect size ~5% of SW p-values will be significant under the null. 8/n
For the DV we see the same pattern of larger effect size = more likely significant SW test, with an acceleration of the trend as sample size increases. 9/n
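The repeated-simulation idea from the last few tweets can be sketched as follows. The effect sizes, sample size, number of repetitions, and seed are illustrative assumptions (the thread used 1K repetitions per effect size), and `rejection_rates` is a hypothetical helper, not the author's function.

```python
import numpy as np
from scipy import stats


def rejection_rates(d, n_per_group, n_sims, rng, alpha=0.05):
    """Proportion of simulations with SW p < alpha, for the DV and the residuals."""
    x = np.repeat([0, 1], n_per_group)
    reject_dv = 0
    reject_resid = 0
    for _ in range(n_sims):
        y = rng.normal(loc=d * x, scale=1.0)
        # Fitted values of Y ~ X with a 0/1 predictor are the group means.
        fitted = np.where(x == 1, y[x == 1].mean(), y[x == 0].mean())
        reject_dv += int(stats.shapiro(y)[1] < alpha)
        reject_resid += int(stats.shapiro(y - fitted)[1] < alpha)
    return reject_dv / n_sims, reject_resid / n_sims


rng = np.random.default_rng(0)  # arbitrary seed
for d in (0.0, 1.0, 2.0, 3.0):
    rate_dv, rate_resid = rejection_rates(d, n_per_group=100, n_sims=200, rng=rng)
    print(f"d={d}: P(SW p<0.05) DV={rate_dv:.2f}, residuals={rate_resid:.2f}")
```

Increasing `n_per_group` should shift the point at which the DV rejection rate climbs toward smaller effect sizes, which is the acceleration described above.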
That said, a Cohen's d of 0.8 is already considered a large effect size. So, as Oscar points out, depending on the field you work in, testing the DV may still get you the "right" answer, despite technically being the wrong way to get there. 10/n
There are also other (perhaps more appropriate?) ways to examine the normality assumption that I haven't covered here, such as Q-Q plots, which have the added benefit of not relying on significance testing. One criticism of SW is its sensitivity to minor deviations from normality when N is large. 11/n
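For readers who want to try the Q-Q approach, here is a small standard-library sketch that computes the points of a normal Q-Q plot. The plotting positions (i - 0.5) / n are one common convention among several, and the correlation of the points is included only as a rough descriptive summary of how straight the plot is. The function names are hypothetical.

```python
import random
import statistics


def qq_points(values):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    n = len(values)
    std_normal = statistics.NormalDist()  # standard normal, mean 0 and SD 1
    sample = sorted(values)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values near 1 mean a straight plot."""
    theo, samp = zip(*qq_points(values))
    mt, ms = statistics.fmean(theo), statistics.fmean(samp)
    cov = sum((t - mt) * (s - ms) for t, s in zip(theo, samp))
    var_t = sum((t - mt) ** 2 for t in theo)
    var_s = sum((s - ms) ** 2 for s in samp)
    return cov / (var_t * var_s) ** 0.5


rng = random.Random(0)  # arbitrary seed
normal_like = [rng.gauss(0.0, 1.0) for _ in range(400)]
bimodal = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(4.0, 1.0) for _ in range(200)]
print(f"Q-Q correlation, normal sample:  {qq_correlation(normal_like):.4f}")
print(f"Q-Q correlation, bimodal sample: {qq_correlation(bimodal):.4f}")
```

The pairs returned by `qq_points` can be passed to any plotting library; points that hug a straight line suggest approximate normality, while an S-shaped pattern is typical of a two-group mixture like the unstratified DV.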
Thanks again to @oscar_olvera100 for the great blog post (bit.ly/2JdfXkx) and inspiration! Code for my simulations and figures can be found on GitHub (bit.ly/2JdgYcl). 12/12
@oscar_olvera100 Some great feedback on proper terminology in this context (model residuals vs. model errors) from @IsabellaGhement!

Thread by Ben Andrew