Yesterday @kph3k noted that I "loathe" decile analysis as a way of describing the results of a PGS analysis. The subsequent discussion clarified why I loathe it-- it's a misleading way of reporting results, a systematic sleight of hand to disguise the import of a small effect. /1
To illustrate, I created some simple data for 10,000 observations. PGS has a mean of 0 and a SD of 1; IQ has a mean of 100 and a standard deviation of 15. They are correlated around .205, so the PGS accounts for about 5% of the variance in IQ. /2
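For anyone who wants to play along, here is a minimal sketch of how data like this could be simulated. It is not the code behind the thread: the seed, the variable names, and the choice of NumPy are all assumptions, and the realized correlation will wobble a little around the target.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility
n = 10_000                      # number of observations, as in the thread
r = 0.205                       # target population correlation, as in the thread

# PGS: mean 0, SD 1.
pgs = rng.standard_normal(n)

# IQ: mean 100, SD 15, built so that its population correlation with PGS is r.
noise = rng.standard_normal(n)
iq = 100 + 15 * (r * pgs + np.sqrt(1 - r**2) * noise)

# Realized correlation and share of variance explained in this one sample.
sample_r = np.corrcoef(pgs, iq)[0, 1]
r_squared = sample_r**2
print(f"sample r = {sample_r:.3f}, R^2 = {r_squared:.3f}")
```

The sample correlation differs slightly from the target on every draw, which is presumably why the thread says "around" .205.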
A PGS analysis is very simple-- it's just a simple regression. How to illustrate the result? The obvious way is to just draw the scatterplot with the regression line through it. It is what it is. /3
But how to describe the import of the PGS, the difference it makes in real people's IQs? This is where decile analysis comes in. Why not report the predicted scores for individuals at the 10th and 90th percentiles? The results seem impressive! /4
Look at that: 5% of the variance may not seem like much, but the predictions for the 10th and 90th percentiles differ by 9 IQ points. But wait: all you are doing here is reporting the location of the regression line, i.e., the MEAN predicted scores of people at the extremes. /5
This is giving yourself credit for your greatest source of certainty (the big sample) while ignoring the greatest source of uncertainty (the poor prediction of the PGS). It is answering the wrong question: it doesn't tell you how well the score would work in the real world. /6
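A rough sketch of that decile-style calculation, under assumptions: the data are re-simulated with an arbitrary seed, the fit uses `np.polyfit`, and the 95% band uses the normal value 1.96 (reasonable at n = 10,000). The helper `mean_ci_halfwidth` is a hypothetical name, not anything from the thread. The exact gap depends on the realized sample correlation, so it need not match the 9 points quoted above.

```python
import numpy as np

# Assumed setup: re-simulate data like the thread describes (seed is arbitrary).
rng = np.random.default_rng(0)
n, r = 10_000, 0.205
pgs = rng.standard_normal(n)
iq = 100 + 15 * (r * pgs + np.sqrt(1 - r**2) * rng.standard_normal(n))

# Ordinary least-squares fit of IQ on PGS (polyfit returns the slope first).
slope, intercept = np.polyfit(pgs, iq, 1)
residuals = iq - (intercept + slope * pgs)
resid_sd = np.sqrt(np.sum(residuals**2) / (n - 2))  # residual standard deviation

# PGS values at the 10th and 90th percentiles, and the regression line there.
x10, x90 = np.percentile(pgs, [10, 90])
pred10 = intercept + slope * x10
pred90 = intercept + slope * x90

def mean_ci_halfwidth(x0, z=1.96):
    """Half-width of the ~95% confidence interval for the MEAN IQ at PGS = x0."""
    sxx = np.sum((pgs - pgs.mean()) ** 2)
    se = resid_sd * np.sqrt(1 / n + (x0 - pgs.mean()) ** 2 / sxx)
    return z * se

print(f"10th pct: {pred10:.1f} +/- {mean_ci_halfwidth(x10):.2f}")
print(f"90th pct: {pred90:.1f} +/- {mean_ci_halfwidth(x90):.2f}")
print(f"gap between predicted means: {pred90 - pred10:.1f} IQ points")
```

The band around each mean is tiny because the 1/n term shrinks with the big sample. That is exactly the certainty the decile plot takes credit for.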
What you really want to know is the *prediction interval*, the uncertainty surrounding the prediction for a single new participant. The predictions are the same, but the intervals are way wider, because they take into account how badly the PGS actually works. /7
This is a much more sobering result. For someone at the 10th percentile of the PGS, you can predict with 95% confidence that their IQ score will be between 67 and 124; for someone at the 90th percentile, you can predict it will be between 76 and 133. The big sample is no help. /8
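Here is a sketch of the prediction-interval version, with the same caveats: re-simulated data, an arbitrary seed, the normal value 1.96 instead of the exact t quantile, and a hypothetical helper name `prediction_halfwidth`. The only change from the interval for the mean is the extra "1 +" under the square root, which carries the residual variance for a single new person.

```python
import numpy as np

# Assumed setup: re-simulate data like the thread describes (seed is arbitrary).
rng = np.random.default_rng(0)
n, r = 10_000, 0.205
pgs = rng.standard_normal(n)
iq = 100 + 15 * (r * pgs + np.sqrt(1 - r**2) * rng.standard_normal(n))

# Ordinary least-squares fit and residual standard deviation.
slope, intercept = np.polyfit(pgs, iq, 1)
residuals = iq - (intercept + slope * pgs)
resid_sd = np.sqrt(np.sum(residuals**2) / (n - 2))

def prediction_halfwidth(x0, z=1.96):
    """Half-width of the ~95% prediction interval for ONE new person at PGS = x0."""
    sxx = np.sum((pgs - pgs.mean()) ** 2)
    # The leading 1 is the individual's own residual variance; it does not
    # shrink as n grows, unlike the 1/n and leverage terms.
    se = resid_sd * np.sqrt(1 + 1 / n + (x0 - pgs.mean()) ** 2 / sxx)
    return z * se

x10, x90 = np.percentile(pgs, [10, 90])
pred10, pred90 = intercept + slope * x10, intercept + slope * x90
half10, half90 = prediction_halfwidth(x10), prediction_halfwidth(x90)

lo10, hi10 = pred10 - half10, pred10 + half10
lo90, hi90 = pred90 - half90, pred90 + half90
print(f"10th pct: {lo10:.0f} to {hi10:.0f}")
print(f"90th pct: {lo90:.0f} to {hi90:.0f}")
```

Making n larger leaves these intervals essentially unchanged, because the half-width is dominated by the residual SD, not by the sample size.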
Bottom line: reporting decile predictions without their prediction error is a QRP (questionable research practice), part of an unintentional but systematic program of sweeping the biggest problem of human behavioral genomics-- tiny effect sizes-- under the methodological rug. Smart researchers should cut it out. /end
/ps. Another thing about decile analyses is that they are generally presented as though they were showing you something special about the data. "The R2 may be small, but look-- there is something interesting at the extremes of our data."
/ps2 I have never seen one where that is actually the case. Mean effects at the extremes are just a general property of small correlations. Researchers should be clear that they are special pleading that their small correlation is important anyway.
