I've been revisiting several probability, statistics & clinical research concepts that I've learned through simulation & visualization over the past few years. Full list pinned below. I hope it helps others who also learn this way! #epitwitter #statstwitter #RStats
In many statistical models (e.g., linear regression), the normality assumption refers to the model residuals, not the dependent variable. @oscar_olvera100 has a great post on this topic (linked). I've tried to expand on his points with some additional simulation and visualization. #epitwitter 1/n
In these examples, we’ll look at a simple linear model Y ~ X, where X is dichotomous and Y is continuous. This first plot shows, for two standardized effect sizes (Cohen's d = 0.75 and 2), the distributions of:
1. Y stratified by X
2. Y without stratification
3. Residuals from the linear model Y ~ X
2/n
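A minimal NumPy sketch of the three distributions in that plot. It assumes a within-group SD of 1, which makes the difference in group means equal to Cohen's d, and it assumes 500 observations per group. The thread doesn't state its simulation settings, and `simulate_binary_x` and `ols_residuals` are hypothetical helper names:

```python
import numpy as np


def simulate_binary_x(n_per_group, cohens_d, seed=0):
    """Simulate Y ~ X with dichotomous X (0/1) and continuous Y.

    Assumption: within-group SD is fixed at 1, so the difference in
    group means equals Cohen's d.
    """
    rng = np.random.default_rng(seed)
    x = np.repeat([0, 1], n_per_group)
    y = cohens_d * x + rng.normal(0.0, 1.0, size=x.size)
    return x, y


def ols_residuals(x, y):
    """Residuals from the linear model Y ~ X (intercept + one binary predictor)."""
    design = np.column_stack([np.ones_like(y), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


x, y = simulate_binary_x(n_per_group=500, cohens_d=2.0)
resid = ols_residuals(x, y)

y_by_x = {g: y[x == g] for g in (0, 1)}  # 1. Y stratified by X
y_marginal = y                           # 2. Y without stratification
# 3. resid: with a single binary predictor, the fitted values are the two
#    group means, so each residual is Y minus its own group's mean.
for g, values in y_by_x.items():
    print(f"X = {g}: mean {values.mean():.2f}, SD {values.std(ddof=1):.2f}")
print(f"Y (all): SD {y_marginal.std(ddof=1):.2f}; residuals: SD {resid.std(ddof=1):.2f}")
```

With a large effect size, unstratified Y is a 50/50 mix of two shifted normal distributions. The residuals re-center both groups at zero, so they keep the within-group normal shape.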
We can "test" the normality assumption using the Shapiro-Wilk (SW) test. If we test only the dependent variable, the normality assumption appears "violated" (SW p < 0.05) when the effect size is large (Cohen's d = 2), but not when it is smaller (Cohen's d = 0.75). 3/n
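A rough sketch of this comparison using SciPy's `scipy.stats.shapiro`. It runs the test on unstratified Y and on the residuals of Y ~ X for both effect sizes. The sample size (500 per group) and the seed are assumptions, not the thread's settings, and `shapiro_pvalues` is a hypothetical helper:

```python
import numpy as np
from scipy import stats


def shapiro_pvalues(cohens_d, n_per_group=500, seed=1):
    """Shapiro-Wilk p-values for unstratified Y and for residuals of Y ~ X.

    Assumed settings: within-group SD of 1, n_per_group per arm, fixed seed.
    """
    rng = np.random.default_rng(seed)
    x = np.repeat([0, 1], n_per_group)
    y = cohens_d * x + rng.normal(size=x.size)
    # With one binary predictor, OLS fitted values are the two group means.
    group_means = np.array([y[x == 0].mean(), y[x == 1].mean()])
    resid = y - group_means[x]
    _, p_y = stats.shapiro(y)
    _, p_resid = stats.shapiro(resid)
    return {"Y": p_y, "residuals": p_resid}


results = {d: shapiro_pvalues(d) for d in (0.75, 2.0)}
for d, p in results.items():
    print(f"d = {d}: SW p for Y = {p['Y']:.3g}, for residuals = {p['residuals']:.3g}")
```

In this setup the residuals are normal by construction. Any rejection for Y at d = 2 therefore reflects the mixing of two shifted groups, not a broken model assumption. A single p-value also depends on the seed and the sample size, so repeating the simulation many times gives a fairer picture.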
I took a crack at reworking @statsepi’s RCT adjustment post (bit.ly/2Pd1JkH) w/ added simulation & visualization. To pique interest, here is a redacted version of the final image. Stick around for a walkthrough of how to get there & what it means. #epitwitter #rstats 1/n
Let's say we are studying the effect of a treatment on a continuous outcome (Y). In this field of study, 20 continuous baseline patient variables are commonly reported (i.e., they appear in the typical Table 1). 2/n
Next, let's say that 10 of these variables are causally unrelated to Y (and thus are non-predictors of Y; call these N), while the other 10 have true causal effects on Y (and thus are predictors of Y; call these C). 3/n
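A minimal NumPy sketch of one simulated trial under this setup. Several choices here are hypothetical and do not come from the thread:

* 2,000 patients with 1:1 allocation
* independent standard-normal covariates
* a treatment effect of 1.0
* a coefficient of 0.5 on each C variable

`simulate_trial` and `treatment_estimate` are illustrative names. The treatment effect is estimated by OLS, with optional adjustment for a set of covariates.

```python
import numpy as np


def simulate_trial(n_obs=2000, n_c=10, n_n=10, treatment_effect=1.0, beta_c=0.5, seed=2):
    """Hypothetical RCT: n_c prognostic covariates (C) and n_n non-prognostic ones (N).

    All numeric defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    c_vars = rng.normal(size=(n_obs, n_c))  # C: true causal effects on Y
    n_vars = rng.normal(size=(n_obs, n_n))  # N: causally unrelated to Y
    # Randomized, balanced allocation independent of all baseline covariates.
    treat = rng.permutation(np.arange(n_obs) % 2)
    y = treatment_effect * treat + c_vars @ np.full(n_c, beta_c) + rng.normal(size=n_obs)
    return treat, c_vars, n_vars, y


def treatment_estimate(treat, y, covariates=None):
    """OLS coefficient on treatment, optionally adjusting for a covariate matrix."""
    columns = [np.ones_like(y), treat]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


treat, c_vars, n_vars, y = simulate_trial()
for label, cov in [("unadjusted", None), ("adjusted for C", c_vars), ("adjusted for N", n_vars)]:
    print(f"{label}: {treatment_estimate(treat, y, cov):.3f}")
```

Repeating this over many simulated trials shows how each adjustment strategy affects the estimate's spread around the true effect.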