hello #psychology #stats twitter!

a while ago I promised some graphs on why #removing #outliers using a simple cut-off (eg, >2SDs) is a #bad idea

so that I can sleep at night again, here they are

tldr: DON'T (blindly) USE FIXED OUTLIER CUT-OFFS LIKE >2SD. EVER.

1/15
for some reason, it's very common in #psychology to remove 'outliers' from data

most common way: exclude data more than two standard deviations from mean

we spend time & money collecting data, then throw 5% away

🤷

I don't know why, or where it's taught, but there it is
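
(for concreteness, the rule itself is tiny - here's a minimal Octave/Matlab sketch, not my full simulation code, which is linked at the end of the thread:)

```matlab
% the '>2SD' rule, as usually applied: a minimal sketch
x = randn(20, 1);                    % one sample of N=20, M=0, SD=1
z = abs(x - mean(x)) / std(x);       % distance from the SAMPLE mean, in SDs
kept = x(z <= 2);                    % the 'cleaned' data
fprintf('dropped %d of %d points\n', numel(x) - numel(kept), numel(x));
```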

3/15
some arguments in favour are that it:

- 'cleans' the data 🧹
- 'removes noise' 🔊
- 'improves signal to noise' 📶

and these all sound like *good* things to do, right?

4/15
well, not necessarily (see rest of thread)

some justifications for doing it might be:

- everyone does it 👨‍👩‍👧‍👦
- SPSS showed me outliers, so I HAD TO ACT 🚔
- I was taught to do it 🧑‍🎓
- [...insert your justification here...]

5/15
in my first real paper (doi.org/10.1016/j.neul…), I was told by a reviewer (THANK YOU!) that:

"what you're doing is 'fishy'. fixed cut-off outlier elimination biases the data; you should use a principled method, eg, that of van Selst & Jolicoeur (1994)"

6/15
doi.org/10.1080/146407…
since that fateful day, I've been repeating that whenever I can to whichever outlier-excluding author is forced to listen 📢🙉

now, to reassure myself of the truth of this outlier-biasing-data factoid, I've run my own simulations 🧑‍💻

they are enlightening! ⛅️

7/15
long-story short: with a >2SD cut-off, the remaining sample, relative to the *true* population:

- has a lower standard deviation
- has a more 'wandering' mean

& this:
- *increases false-positive differences* between the new sample mean & eg, the true population mean

oops
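
here's a minimal sketch of that check - the exact numbers will wobble from run to run:

```matlab
% sketch: trim >2SD 'outliers' from many N=20 samples & compare with raw
nsim = 10000; n = 20;
m = zeros(nsim, 1); s = zeros(nsim, 1);
for i = 1:nsim
  x = randn(n, 1);                            % true population: M=0, SD=1
  keep = abs(x - mean(x)) / std(x) <= 2;      % the >2SD rule
  m(i) = mean(x(keep));                       % trimmed sample mean
  s(i) = std(x(keep));                        % trimmed sample SD
end
fprintf('mean trimmed SD: %.2f (true SD = 1)\n', mean(s));
fprintf('SD of trimmed means: %.3f (raw would be: %.3f)\n', std(m), 1/sqrt(n));
```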

8/15
this shows simulations of data, M=0, SD=1

half the data has a difference, d, of 0 to 1 added (a 'true effect')

y=how often a p<.05 'result' is found
x=outlier removal criterion
lines=Ns

if outlier removal has no effect, all lines are flat

when d=0, ~10% false-positives w >2SD
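
(for the curious, the core of that simulation is roughly this - a sketch that assumes tcdf from Octave's statistics package / Matlab's stats toolbox:)

```matlab
% sketch of the d=0 case: sweep the cut-off, count p<.05 'results'
% (in Octave: pkg load statistics first, for tcdf)
cuts = 1.5:0.25:3.5; n = 20; nsim = 5000; fp = zeros(size(cuts));
for c = 1:numel(cuts)
  for i = 1:nsim
    x = randn(n, 1);                                  % no true effect
    y = x(abs(x - mean(x)) / std(x) <= cuts(c));      % remove 'outliers'
    t = mean(y) / (std(y) / sqrt(numel(y)));          % one-sample t vs 0
    fp(c) = fp(c) + (2 * (1 - tcdf(abs(t), numel(y) - 1)) < 0.05);
  end
end
plot(cuts, 100 * fp / nsim, '-o');
xlabel('outlier cut-off (SDs)'); ylabel('% of simulations with p<.05');
```
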
too abstract? take 'IQ', which has a nice symmetrical mean=100 & SD=15

- sample 20 people from the population
- remove 'outliers' >2SD from sample mean
- the final mean will differ significantly (lower or higher, p<.05) from the true population mean about 10% of the time

😱
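
in code, the IQ example looks something like this sketch (same tcdf assumption as above):

```matlab
% 'IQ' example: N=20 from Normal(100, 15), t-test vs 100, raw vs trimmed
nsim = 10000; n = 20; fp_raw = 0; fp_trim = 0;
for i = 1:nsim
  x = 100 + 15 * randn(n, 1);
  y = x(abs(x - mean(x)) / std(x) <= 2);          % remove >2SD 'outliers'
  t_raw  = (mean(x) - 100) / (std(x) / sqrt(numel(x)));
  t_trim = (mean(y) - 100) / (std(y) / sqrt(numel(y)));
  fp_raw  = fp_raw  + (2 * (1 - tcdf(abs(t_raw),  numel(x) - 1)) < 0.05);
  fp_trim = fp_trim + (2 * (1 - tcdf(abs(t_trim), numel(y) - 1)) < 0.05);
end
fprintf('false positives: raw %.1f%%, trimmed %.1f%%\n', ...
        100 * fp_raw / nsim, 100 * fp_trim / nsim);
```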

9/15
that's right, by removing 'outliers' >2SD, you've ~DOUBLED the false-positive probability of your sample differing from the true population mean.

doubled. x2

and, you can throw away all subsequent analysis, because the data are now biased.

not good. not 'cleaner' 🧹

10/15
THAT'S JUST FOR SYMMETRICAL DISTRIBUTIONS!

if your distribution is asymmetrical (eg reaction times, percentage correct), outlier removal *also changes the means*

eg, human RTs are positively-skewed, approximately log-normal, so log(RT) gives an approximately normal distribution...

11/15
(in these graphs, the red lines are the means and the blue the medians, showing the skews - positive skew when red is bigger than blue, negative skew when red is smaller than blue)
percentage correct is likely negatively-skewed, with a ceiling effect at 100%, so you could do, eg, a logistic (logit) transform on proportions: log(p/(1-p))
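
both transforms are one-liners (the example values here are made up, just for illustration):

```matlab
% two common 'normalising' transforms (example values are hypothetical)
rt      = [320 450 510 640 1200];      % positively-skewed RTs, in ms
log_rt  = log(rt);                     % ~normal if RTs are ~log-normal

pc      = [0.80 0.90 0.95 0.99];       % proportions correct, ceiling near 1
logit_p = log(pc ./ (1 - pc));         % logistic (logit) transform
```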

if you don't 'normalise' asymmetrical distributions, then removing outliers using fixed cut-offs can push the means up or down

12/15
in quite unpredictable ways!

here's the same graph as before - how many simulations show p<.05 - this time for positively-skewed (simulated) RT data. it's wild!

remember: flat lines are good lines...
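
to see the mean-shifting in isolation, here's a sketch with log-normal 'RTs' - the parameters are my assumption, chosen to roughly match the RT distribution in the appendices below (M~690ms, SD~215ms):

```matlab
% skew demo: with log-normal 'RTs', a fixed cut-off trims mostly the long
% right tail, dragging the mean down (parameters assumed, see above)
rt = exp(6.5 + 0.3 * randn(100000, 1));      % big positively-skewed sample
keep = abs(rt - mean(rt)) / std(rt) <= 2;
fprintf('mean RT before: %.0f ms, after trimming: %.0f ms\n', ...
        mean(rt), mean(rt(keep)));
```
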
WHAT ABOUT *REAL DIFFERENCES* BETWEEN CONDITIONS?

here's where it gets FUN!

assuming a real difference between the first & second half of the data (eg, 2 conditions or groups), then:

removing outliers *decreases the probability of detecting that difference*

not good. *noisier*🔊

13/15
that's right - if there is a real difference in your data, and you remove outliers by pooling across the two conditions or groups, then it doesn't 'clean' your data at all, it makes it 'dirtier'!

😱

#StatsShocker
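
a sketch of that simulation (ttest2 is from the statistics package; d=0.8 is just an example effect size, my assumption):

```matlab
% real difference d between 2 groups; 'outliers' removed from POOLED data
% (in Octave: pkg load statistics first, for ttest2)
nsim = 5000; n = 20; d = 0.8; hit_raw = 0; hit_trim = 0;
for i = 1:nsim
  a = randn(n, 1);  b = d + randn(n, 1);       % a real difference of d SDs
  pool = [a; b];
  keep = abs(pool - mean(pool)) / std(pool) <= 2;
  a2 = a(keep(1:n));  b2 = b(keep(n+1:end));   % trimmed groups
  [~, p_raw]  = ttest2(a, b);
  [~, p_trim] = ttest2(a2, b2);
  hit_raw  = hit_raw  + (p_raw  < 0.05);
  hit_trim = hit_trim + (p_trim < 0.05);
end
fprintf('detected the real difference: raw %.1f%%, trimmed %.1f%%\n', ...
        100 * hit_raw / nsim, 100 * hit_trim / nsim);
```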

14/15
Conclusions:

1. don't remove outliers using >2SD from mean. EVER

2. if you MUST* remove 'outliers', then >3SD seems much less biasing**, so use that?

* eg supervisor/reviewer/book/lecturer forces you

** so much less that it's probably not worth doing at all. so maybe don't?

15/15
and don't just take my word for it, here's a true god of reaction times, Jeff Miller, who said, 30 years ago, that outlier removal is:

"very dangerous" 🐯

16/15

psycnet.apa.org/doi/10.1080/14…
there may be other good reasons to remove data:

- the experiment didn't work
- the person didn't understand the task
- the person performed at chance/floor/ceiling
- [...insert interesting cases...]

on these interesting cases, I have no comments at this time.

questions?

17/15
the Matlab/Octave code I used is here:
neurobiography.info/projects/outli…

I'll be happy to make any changes or clarifications or retractions when (proper) statisticians tell me I've got it wrong :-)
a post-hoc addition:

this is NOT a call to just switch from 2 to 3 SD & all is fine!

see various other threads & comments & papers for more nuance.

eg here's a paper in PNAS which removed outliers >3SD in so many different ways it's physically painful!

doi.org/10.1073/pnas.2…
if you liked this, you'll love my podcast 😁

theerrorbar.com
hello #stats #nerds!

after 24 hours, many comments & a long walk, some appendices:

1. these data only apply to between-participants considerations. an extra/different layer would be to do/not do outlier removal first on each individual participant's data, then look at effects on the group...
2. When I talked about 'false positives', it was about how your sample mean may or may not reflect the (true) population mean; eg, for IQ, after removing >2SD outliers, you would conclude, ~10% of the time, that the population mean is NOT 100 (when it really is). it should be 5%!
3. the graphs I showed were only for the 'raw' data and the 'summary' data - I missed out perhaps the most important, intermediate graphs 🤦 - showing what each individual distribution looks like.

So, the next 3 graphs show some data BEFORE & AFTER removing >2SD outliers...
3A. Normal distribution, M=0, SD=1, N=20 per sample, 10K samples

from top: Means, SDs, t-scores, p-values
red line=expected (mean) value

see how the p-values become more likely 'significant' (at both lower and upper tails) 👀
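
(if you want to redraw graph 3A yourself, something like this sketch gets close:)

```matlab
% sketch of graph 3A: means, SDs, t & p across 10K trimmed N=20 samples
nsim = 10000; n = 20;
M = zeros(nsim,1); S = zeros(nsim,1); T = zeros(nsim,1); P = zeros(nsim,1);
for i = 1:nsim
  x = randn(n, 1);                                % Normal, M=0, SD=1
  y = x(abs(x - mean(x)) / std(x) <= 2);          % remove >2SD 'outliers'
  M(i) = mean(y);  S(i) = std(y);
  T(i) = M(i) / (S(i) / sqrt(numel(y)));
  P(i) = 2 * (1 - tcdf(abs(T(i)), numel(y) - 1));
end
subplot(4,1,1); hist(M, 50); title('means');
subplot(4,1,2); hist(S, 50); title('SDs');
subplot(4,1,3); hist(T, 50); title('t-scores');
subplot(4,1,4); hist(P, 50); title('p-values');
```
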
3B. log-normal 'reaction time' distribution, M=690ms, SD=215ms, N=20 per sample, 10K samples

Same effects as in purely-normal data.

Note that even before removing any outliers, the distribution of means is not quite Normal, and removing 'outliers' just makes it worse...
3C. logistic 'proportion correct' distribution, M=.933, SD=.064, N=20 per sample, 10K samples

as above - non-normal distribution of means to start with, made much worse by removing 'outliers'!
(I also found a few relatively small errors / bugs in my code, so that's a bit cleaner now - same link as above)

[that's all folks]

I'm on a twitter and actual holiday now for two weeks - bye!

🚗+🏞️=😎
