for some reason, it's very common in #psychology to remove 'outliers' from data
most common way: exclude data more than two standard deviations from the mean (sketched in code below)
we spend time & money collecting data, then throw ~5% of it away
🤷
I don't know why, or where it's taught, but there it is
3/15
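if you want to play along at home, here's a minimal sketch of that rule (Python/numpy; the function name & default criterion are mine, purely for illustration):

```python
import numpy as np

def remove_2sd_outliers(x, criterion=2.0):
    """the common rule under discussion: drop values more than
    `criterion` sample SDs away from the sample mean"""
    x = np.asarray(x, dtype=float)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    return x[z <= criterion]
```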
some arguments in favour are that it:
- 'cleans' the data 🧹
- 'removes noise' 🔊
- 'improves signal to noise' 📶
and these all sound like *good* things to do, right?
4/15
well, not necessarily (see rest of thread)
some justifications for doing it might be:
- everyone does it 👨‍👩‍👧‍👦
- SPSS showed me outliers, so I HAD TO ACT 🚔
- I was taught to do it 🧑‍🎓
- [...insert your justification here...]
5/15
in my first real paper (doi.org/10.1016/j.neul…), I was told by a reviewer (THANK YOU!) that:
'what you're doing is 'fishy'. fixed cut-off outlier elimination biases the data; you should use a principled method, eg, of van Selst & Jolicoeur (1994)'
since that fateful day, I've been repeating that whenever I can to whichever outlier-excluding author is forced to listen 📢🙉
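(for the curious, here's a rough sketch of what a 'principled', sample-size-aware method looks like - note the criterion function below is a placeholder of mine, NOT the published moving-criterion table; see van Selst & Jolicoeur (1994) for the real values)

```python
import numpy as np

def moving_criterion(n):
    # placeholder only: the published approach maps the sample size to an SD
    # criterion that grows with n - take the real values from the 1994 paper
    return min(2.5, 1.5 + 0.01 * n)

def trim_with_moving_criterion(x):
    """sample-size-aware trimming: the cut-off depends on n, so small samples
    aren't trimmed with the same fixed 2SD rule as large ones"""
    x = np.asarray(x, dtype=float)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    return x[z <= moving_criterion(len(x))]
```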
now, to reassure myself of the truth of this outlier-biasing-data factoid, I've run my own simulations 🧑‍💻
they are enlightening! ⛅️
7/15
long story short: with a >2SD cut-off, the remaining sample, relative to the *true* population:
- has a lower standard deviation
- has a more 'wandering' mean
& this:
- *increases false-positive differences* between the new sample mean & eg, the true population mean
oops
8/15
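a minimal simulation of that shrinking-SD / wandering-mean effect (numpy; all of these numbers are just my illustrative defaults):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 20, 10_000
trimmed_sds, trimmed_means, raw_means = [], [], []
for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n)                     # true population: M=0, SD=1
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    kept = x[z <= 2.0]                              # the >2SD rule
    trimmed_sds.append(kept.std(ddof=1))
    trimmed_means.append(kept.mean())
    raw_means.append(x.mean())

print(np.mean(trimmed_sds))                         # comes out well below the true SD of 1
print(np.std(raw_means), np.std(trimmed_means))     # spread of sample means, before vs after
```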
this shows simulations of data, M=0, SD=1
half the data has a difference, d (from 0 to 1), added (a 'true effect')
y=how often a p<.05 'result' is found
x=outlier removal criterion
lines=Ns
if outlier removal has no effect, all lines are flat
when d=0, there are ~10% false-positives with a >2SD cut-off
too abstract? take 'IQ', which has a nice symmetrical mean=100 & SD=15
- sample 20 people from the population
- remove 'outliers' >2SD from sample mean
- the final mean will differ significantly (lower or higher, p<.05) from the true population mean about 10% of the time
😱
9/15
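you can check that IQ number with a few lines of numpy/scipy (a sketch under the same assumptions: M=100, SD=15, N=20, one-sample t-test against the true mean):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sims, false_pos = 20, 10_000, 0
for _ in range(n_sims):
    iq = rng.normal(100, 15, n)                     # true population: M=100, SD=15
    z = np.abs(iq - iq.mean()) / iq.std(ddof=1)
    kept = iq[z <= 2.0]                             # drop the '>2SD outliers'
    false_pos += stats.ttest_1samp(kept, 100).pvalue < .05

print(false_pos / n_sims)   # well above the nominal .05 - ~.10 as described above
```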
that's right, by removing 'outliers' >2SD, you've ~DOUBLED the false-positive probability of your sample differing from the true population mean.
doubled. x2
and, you can throw away all subsequent analysis, because the data are now biased.
not good. not 'cleaner' 🧹
10/15
THAT'S JUST FOR SYMMETRICAL DISTRIBUTIONS!
if your distribution is asymmetrical (eg reaction times, percentage correct), outlier removal *also changes the means*
eg, human RTs are positively-skewed, approximately log-normal, so log(RT) gives an approximately normal distribution...
11/15
(in these graphs, the red lines are the means and the blue lines the medians, showing the skew: positive skew when the mean (red) is above the median (blue), negative skew when it's below)
percentage correct is likely negatively-skewed, with a ceiling effect at 100%, so you could apply, eg, a logit transform to the proportions: log(p/(1-p))
if you don't 'normalise' asymmetrical distributions, then removing outliers using fixed cut-offs can push the means up or down
12/15
in quite unpredictable ways!
here's the same graph as before - how many simulations show p<.05 - this time for positively-skewed (simulated) RT data. it's wild!
remember: flat lines are good lines...
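here's a toy version of the RT point, with made-up log-normal 'RTs' (the distribution parameters are mine, purely illustrative); the same logic applies to the logit transform for proportions:

```python
import numpy as np

rng = np.random.default_rng(2)

def trim_2sd(x):
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    return x[z <= 2.0]

# positively-skewed 'RTs': a fixed cut-off clips the long right tail far more
# often than the short left tail, so trimming drags the raw mean downwards
rt = rng.lognormal(mean=6.0, sigma=0.4, size=100_000)
print(rt.mean(), trim_2sd(rt).mean())

# trimming after a log transform treats both tails symmetrically
log_rt = np.log(rt)
print(np.exp(log_rt.mean()), np.exp(trim_2sd(log_rt).mean()))
```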
WHAT ABOUT *REAL DIFFERENCES* BETWEEN CONDITIONS?
here's where it gets FUN!
assuming a real difference between the first & second halves of the data (eg, 2 conditions or groups), then:
removing outliers *decreases the probability of detecting that difference*
not good. *noisier*🔊
13/15
that's right - if there is a real difference in your data, and you remove outliers by pooling across the two conditions or groups, then it doesn't 'clean' your data at all, it makes it 'dirtier'!
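a sketch of that last point (my own toy numbers: N=20 per group, a true difference of d=0.8, two-sample t-tests), trimming >2SD from the pooled data as described above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, d, n_sims = 20, 0.8, 5_000
hits_raw = hits_trimmed = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)                       # a real difference of d SDs
    pooled = np.concatenate([a, b])
    keep = np.abs(pooled - pooled.mean()) / pooled.std(ddof=1) <= 2.0
    a_kept, b_kept = a[keep[:n]], b[keep[n:]]
    hits_raw     += stats.ttest_ind(a, b).pvalue < .05
    hits_trimmed += stats.ttest_ind(a_kept, b_kept).pvalue < .05

print(hits_raw / n_sims, hits_trimmed / n_sims)     # detection rate, before vs after trimming
```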
after 24 hours, many comments & a long walk, some appendices:
1. these data only apply to between-participants considerations. an extra/different layer would be to do (or not do) outlier removal first on individual participants' data, then look at the effects on the group...
2. when I talked about 'false positives', it was about whether your sample mean reflects the (true) population mean; eg, for IQ, after removing >2SD outliers, you would conclude ~10% of the time that the population mean is NOT 100 (when it really is). it should be 5%!
3. the graphs I showed were only for the 'raw' data and the 'summary' data - I missed out perhaps the most important, intermediate graphs 🤦 - showing what each individual distribution looks like.
So, the next 3 graphs show some data BEFORE & AFTER removing >2SD outliers...
3A. Normal distribution, M=0, SD=1, N=20 per sample, 10K samples
from top: Means, SDs, t-scores, p-values
red line=expected (mean) value
see how the p-values become more likely 'significant' (at both lower and upper tails) 👀