The definitive guide to COVID-19 prognosis modeling success
A thread by Maarten van Smeden

1) Do not explain where the data come from (country) or when (study dates) they were obtained. Do not specify inclusion or exclusion criteria
2) Do not define a target group. Talk generically about COVID-19 patients, do not define how they were recruited
3) Do not provide a table with patient characteristics. In particular, do not mention use of medication or co-morbidities
4) Do not explicitly mention timing of predictor measurements. Predictor measurements closest to the outcome are most predictive. Do not define baseline or intended moment of prediction
5) Be generic in outcome description. Do not define time horizon of the prediction. Patients for whom the outcome was not determined (e.g. because still hospitalized) are best left out of the analyses
6) If the outcome is unbalanced, make sure to use classification accuracy as the performance measure of interest. It shows the model’s full potential (sketch after the list)
7) Combine as many predictor selection techniques as you can, but don’t describe them in detail. It’s only the final model that counts
8) Randomly split your data into a training and a test set. If the stubborn model performs poorly, generate a new random split to make the model look better. Keep trying, and report it as “external validation” (sketch after the list)
9) Do not be specific about the sample size or the number of events. At all times, avoid stratified reporting of sample size and events when splitting data into training and test sets
10) Avoid large datasets. With a large dataset it is much harder to overfit your way to an area under the ROC curve of 0.97 (sketch after the list)
11) Do not separate model tuning from testing (internal validation). Separating the two leads to lower performance measures! Who wants that? (sketch after the list)
12) Dichotomize everything. Use separate classification cut-off values for determining sensitivity (low values) and specificity (high values); see the sketch after the list
13) Remove all records that are incomplete or hard to predict. They only make the model look bad
14) In small datasets, use the Hosmer-Lemeshow test to show the model is not significantly miscalibrated; in large datasets, show that the calibration slope is equal to 1 (perfect) on the training data. Calibration solved! (sketch after the list)
15) Identify the most important predictors and give some biological explanation for their strong effects. Strong predictors from multivariable models almost *have* to be causally related to the outcome, right?
16) Ignore reporting guidelines. As a researcher you know not to let reporting ruin your storytelling!
17) Tell the audience that the model still requires external validation, but give a link to the web calculator just in case some already want to use it
18) Do not perform an external validation of one of the >100 already existing COVID-19 prediction models. That won’t make you famous
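
Below are minimal sketches for tips 6, 8, 10, 11, 12 and 14. All of them run on simulated data with NumPy and scikit-learn; every variable name is made up for illustration, and none of the numbers come from a real study.

Sketch for tip 6: with a ~5% event rate, the useless "never predict an event" rule already scores ~95% accuracy, while its discrimination is a coin flip.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, size=1000)   # unbalanced outcome: ~5% events
pred_nothing = np.zeros_like(y)        # "model" that never predicts an event

print(accuracy_score(y, pred_nothing))  # ~0.95: accuracy looks impressive
print(roc_auc_score(y, pred_nothing))   # 0.5: no discrimination at all
```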
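Sketch for tip 8: the predictors here are pure noise, yet shopping through 500 random splits and keeping the best one yields a test AUC well above 0.5, ready to be mislabeled "external validation".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))       # 30 predictors of pure noise
y = rng.binomial(1, 0.5, size=200)   # outcome unrelated to the predictors

best_auc = 0.0
for seed in range(500):              # keep generating new random splits...
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    best_auc = max(best_auc, auc)    # ...and keep only the flattering one

print(best_auc)  # well above 0.5 by luck alone
```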
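Sketch for tip 10: with 60 patients, 50 noise predictors and an essentially unpenalized logistic regression, the apparent (training-data) AUC lands at or near 1.0.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 50))        # small n, many noise predictors
y = rng.binomial(1, 0.5, size=60)

# C=1e6 makes the default ridge penalty negligible: near maximum likelihood
model = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
print(roc_auc_score(y, model.predict_proba(X)[:, 1]))  # ~1.0 on training data
```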
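Sketch for tip 11: picking the hyperparameter that maximizes the test-set AUC means the test set is used for model selection, so the reported number is optimistically biased even on pure noise.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 40))       # noise predictors again
y = rng.binomial(1, 0.5, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# "tuning" directly on the test set: report the best of six test evaluations
best_auc = max(
    roc_auc_score(y_te, LogisticRegression(C=C, max_iter=2000)
                  .fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
    for C in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0])
print(best_auc)  # biased upward; honest tuning stays inside the training data
```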
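Sketch for tip 12: quote sensitivity at a low cut-off and specificity at a high one, and a risk score with zero predictive value "achieves" roughly 90% of both.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.binomial(1, 0.3, size=1000)
risk = rng.uniform(size=1000)        # score with zero predictive value

sens = ((risk >= 0.1) & (y == 1)).sum() / (y == 1).sum()  # cut-off 0.1
spec = ((risk < 0.9) & (y == 0)).sum() / (y == 0).sum()   # cut-off 0.9
print(sens, spec)  # both ~0.90, but no single cut-off delivers both at once
```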
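Sketch for tip 14 (the calibration-slope half): for a near maximum-likelihood logistic model, regressing the outcome on the model's own linear predictor in the training data returns a slope of about 1 by construction, so showing it demonstrates nothing about calibration in new patients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 5))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # outcome driven by X[:, 0]

model = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)  # ~unpenalized
lp = model.decision_function(X).reshape(-1, 1)  # linear predictor, training data

# calibration slope = coefficient from regressing the outcome on the
# linear predictor; refitted on the training data it is ~1 by construction
recal = LogisticRegression(C=1e6, max_iter=5000).fit(lp, y)
print(recal.coef_[0, 0])  # ~1.0 here; in new data it is typically < 1
```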