Statistical things to worry about *less*: 1) significance of univariable associations, 2) significant model goodness-of-fit tests, 3) imbalance in randomized trials, 4) non-normality of observations, 5) multicollinearity
1) As a precursor to multivariable analyses, the associations between each individual covariate and the outcome are often "screened" for significance. This often does more harm than good, so don't bother doing it or worrying about it onlinelibrary.wiley.com/doi/full/10.11…
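A toy simulation of the danger (my own example, not from the linked paper; assumes numpy and statsmodels): x1 is a classic suppressor variable, so a univariable screen would throw it away even though the multivariable model needs it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = -x1 + rng.normal(size=n)          # x2 cancels x1's marginal signal
y = x1 + x2 + rng.normal(size=n)       # both covariates truly matter

# Univariable screen: x1 looks useless (large p-value)
uni = sm.OLS(y, sm.add_constant(x1)).fit()
print(f"univariable p-value for x1: {uni.pvalues[1]:.2f}")

# Multivariable model: x1's coefficient is recovered near its true value of 1
multi = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(f"multivariable coef for x1: {multi.params[1]:.2f} "
      f"(p = {multi.pvalues[1]:.1e})")
```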
2) Every model is a simplification of reality: perfect model fit isn't a real thing. The question is not *if* but *how much* imperfection.
E.g., by evaluating a hierarchy of calibration, one doesn't have to worry about significant Hosmer-Lemeshow tests sciencedirect.com/science/articl…
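For instance, "weak" calibration, one level of that hierarchy, reduces to two numbers: the calibration intercept and slope. A minimal sketch (my own implementation, not code from the paper; the function name and test data are made up):

```python
import numpy as np
import statsmodels.api as sm

def weak_calibration(y, p):
    """Calibration intercept and slope of predicted risks p against outcomes y.
    Intercept ~ 0 and slope ~ 1 indicate good 'weak' calibration."""
    lp = np.log(p / (1 - p))                     # logit of the predicted risk
    binom = sm.families.Binomial()
    # Slope: logistic regression of the outcome on the linear predictor
    slope = sm.GLM(y, sm.add_constant(lp), family=binom).fit().params[1]
    # Intercept (calibration-in-the-large): slope fixed at 1 via an offset
    intercept = sm.GLM(y, np.ones((y.size, 1)), family=binom,
                       offset=lp).fit().params[0]
    return intercept, slope

# Hypothetical use: overconfident predictions give a slope well below 1
rng = np.random.default_rng(1)
lp_true = rng.normal(size=2000)
y = rng.binomial(1, 1 / (1 + np.exp(-lp_true)))
p_overconfident = 1 / (1 + np.exp(-2 * lp_true))   # too-extreme predictions
intercept, slope = weak_calibration(y, p_overconfident)
print(f"intercept {intercept:.2f}, slope {slope:.2f}")  # slope ~ 0.5
```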
3) Randomization might be the closest thing to magic. Thanks to randomization (and inferential statistics), we don't have to worry about random imbalances in baseline characteristics. Hooray! statsepi.substack.com/p/out-of-balan…
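A quick simulation of the point (mine; assumes numpy): individual trials show chance imbalance on a prognostic covariate, yet the unadjusted estimate is unbiased and the 95% CI covers as advertised.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect, n, n_trials = 1.0, 100, 5000
covered, estimates, imbalance = 0, [], []

for _ in range(n_trials):
    x = rng.normal(size=n)                 # prognostic baseline covariate
    t = rng.integers(0, 2, size=n)         # randomized treatment assignment
    y = true_effect * t + x + rng.normal(size=n)
    diff = y[t == 1].mean() - y[t == 0].mean()
    se = np.sqrt(y[t == 1].var(ddof=1) / (t == 1).sum()
                 + y[t == 0].var(ddof=1) / (t == 0).sum())
    covered += abs(diff - true_effect) < 1.96 * se
    estimates.append(diff)
    imbalance.append(x[t == 1].mean() - x[t == 0].mean())

print(f"mean estimate: {np.mean(estimates):.3f} (truth {true_effect})")
print(f"CI coverage:   {covered / n_trials:.3f}")        # ~0.95
print(f"typical baseline imbalance: {np.std(imbalance):.2f} SD units")
```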
4) Normality is a rare shape for data. Fortunately, we don't need data to be shaped like that for most of our analyses. No need to worry about skewed distributions; if anything, modeling assumptions are usually about the shape of the residuals rather than of the observations psychometroscar.com/2018/07/11/nor…
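A small check of this (my sketch, assuming numpy and scipy): the outcome inherits the skewness of a skewed predictor, but the residuals are as normal as the model says they are.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.exponential(size=1000)            # heavily right-skewed predictor
y = 1 + 2 * x + rng.normal(size=1000)     # linear model with normal errors

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

print(f"skewness of y:         {stats.skew(y):.2f}")          # clearly skewed
print(f"skewness of residuals: {stats.skew(residuals):.2f}")  # ~ 0
```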
5) Multicollinearity might be the most overrated problem in statistics. While it can cause imprecision in the regression coefficients of collinear variables (which shows clearly in the results), the remaining coefficients and the model's predictions are typically unaffected: arxiv.org/abs/2101.01603
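A toy simulation (mine, not from the linked preprint): x1 and x2 are almost collinear, so their individual coefficients are noisy, but the coefficient of the uncorrelated x3 and the model's predictions are unaffected.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # correlation with x1 ~ 0.999
x3 = rng.normal(size=n)
y = x1 + x2 + x3 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit()
print("coefs:", np.round(fit.params[1:], 2))  # x1, x2 individually imprecise
print("SEs:  ", np.round(fit.bse[1:], 2))     # huge for x1, x2; small for x3
print(f"R^2 of predictions: {fit.rsquared:.3f}")   # prediction is unharmed
```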
Now that we have 5 statistical things to worry about less, maybe we can worry a bit more about formulating researchable questions, what we measured, what and who we didn't measure (missing data), sample sizes, unnecessary dichotomizations, overfitting, etcetera
Multicollinearity— they all look the same
Heteroscedasticity— the variation varies
Attenuation— being too modest
Overfitting— too good to be true
Confounding— nothing is what it seems
P-value— it’s complicated
Sensitivity analysis— tried a bunch of stuff
Post-hoc— main analysis not sexy enough
Multivariate— oops, meant to say multivariable
Normality— a very rare shape for data
Dichotomized— data was tortured
Extrapolation— just guessing
Linear regression— line through data points
t-test— linear regression (see the sketch after this list)
correlation— linear regression
ANOVA— linear regression
ANCOVA— linear regression
Chi-square test— logistic regression
Deep learning— bunch of regressions
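Those last entries aren't just jokes. A numerical check of the t-test one (my sketch, assuming scipy and statsmodels): a two-sample t-test is OLS of the outcome on a 0/1 group indicator.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(11)
a = rng.normal(0.0, 1, size=50)
b = rng.normal(0.5, 1, size=60)

t, p = stats.ttest_ind(a, b)              # classic two-sample t-test

y = np.concatenate([a, b])
group = np.concatenate([np.zeros(50), np.ones(60)])   # dummy-coded group
fit = sm.OLS(y, sm.add_constant(group)).fit()

print(f"t-test:     t = {t:.4f}, p = {p:.4f}")
print(f"regression: t = {fit.tvalues[1]:.4f}, p = {fit.pvalues[1]:.4f}")
# Same numbers (up to sign): the group coefficient *is* the mean difference.
```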
Personal top 10 fallacies and paradoxes in statistics 1. Absence of evidence fallacy 2. Ecological fallacy 3. Stein's paradox 4. Lord's paradox 5. Simpson's paradox 6. Berkson's paradox 7. Prosecutor's fallacy 8. Gambler's fallacy 9. Lindley's paradox 10. Low birthweight paradox
1. Absence of evidence fallacy
Absence of evidence is not the same as evidence of absence. Wouldn't it be great if "not statistically significant" simply meant "no effect"? It doesn't. bmj.com/content/311/70…
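To see why (my own numbers, assuming numpy and scipy): with a real but modest effect and a small sample, a non-significant result is the *expected* outcome.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
true_effect, n, sims = 0.3, 30, 2000      # modest effect, small trial

nonsig = 0
for _ in range(sims):
    a = rng.normal(0, 1, size=n)                  # control arm
    b = rng.normal(true_effect, 1, size=n)        # treated arm, real effect
    _, p = stats.ttest_ind(a, b)
    nonsig += p > 0.05

print(f"true effect exists, yet p > 0.05 in {nonsig / sims:.0%} of trials")
```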
2. Ecological fallacy
It's hard to resist using those sweet population-level data to make inferences about health effects at the individual level web.stanford.edu/class/ed260/fr…
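A tiny constructed example (mine): across groups, mean exposure and mean outcome correlate positively, while within every group the individual-level association is negative. Group-level data can't settle the individual-level question.

```python
import numpy as np

rng = np.random.default_rng(9)
groups = []
for offset in [0, 5, 10]:                       # three populations
    x = offset + rng.normal(size=200)
    y = 2 * offset - x + rng.normal(size=200)   # within-group slope: -1
    groups.append((x, y))

means = np.array([(x.mean(), y.mean()) for x, y in groups])
print("correlation of group means:",
      np.round(np.corrcoef(means[:, 0], means[:, 1])[0, 1], 2))   # ~ +1
for i, (x, y) in enumerate(groups):
    print(f"within-group {i} correlation:",
          np.round(np.corrcoef(x, y)[0, 1], 2))                   # ~ -0.7
```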
How do I know how to become a successful academic? I don't, but I have received plenty of advice. As a good academic, I will just summarize what I have learned from listening
1) Be the ultimate collaborator but also don't be
Say yes to as many collaborations as physically possible: co-produce papers, LEARN, co-write grants, DISCUSS; it is all about synergy. But also: collaborations slow you down, so have your own ideas! Just say no to collaborations
Disclaimer: this top 10 is just personal opinion. I’m biased towards explanatory methods and statistics articles relevant to health research, particularly those relating to prediction
The order in which the articles appear is pseudo-random
1) The first one is related to the pandemic. Title and subtitle give away the conclusions, but the arguments are particularly well put
First, I sent you emails, to which you politely and quickly responded. Thanks. You seemed to agree with my critique, but you didn't show any initiative to change or remove the model
@Laconic_doc @statsmethods @GSCollins Second, I am one of the authors of a reply to the OpenSAFELY study in which we specifically mention that their model falls short as a risk model. You seem to have ignored that and used their multivariable results anyway