Hi everyone. It’s been a while since I’ve had a good old-fashioned methods critique, but I'm feeling it today:
Because, whenever I see a paper that has “A Novel Risk Score…” in the title...
In this study, with 269 patients from a single center from 2002-2015, the authors “hypothesized that a risk score for operative mortality after acute type A repair could be rapidly generated from several of a panel of carefully selected, easily obtainable preoperative variables.”
As discussed in prior threads on abuse of logistic regression, this effort is simply doomed from the start. They have 269 patients with a total of 43 operative mortality events. There is no realistic chance of building a useful and reproducible risk score from this dataset.
The good Dr. Harrell frequently reminds us that 96 events are required just for an accurate estimate of the INTERCEPT in a logistic regression model.
TL;DR version of the above links: lots of mathiness (with good layperson's explanations!!) showing that it’s very hard to build accurate prediction models with small numbers of events.
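To make the small-events problem concrete, here is a minimal simulation sketch (my own, not from the paper) using the paper's event rate of 43/269. Even estimating the *intercept alone*, with zero predictors in the model, the estimated baseline log-odds bounces around meaningfully from sample to sample; adding a panel of candidate predictors only makes the instability worse.

```python
# Hedged sketch with simulated data (not the paper's data): how unstable even the
# intercept of a logistic model is when you only have ~43 events in 269 patients.
import numpy as np

rng = np.random.default_rng(0)
n, p_event = 269, 43 / 269  # sample size and event rate mirroring the paper

true_log_odds = np.log(p_event / (1 - p_event))

def fitted_log_odds(sample):
    """Intercept-only logistic MLE: log(events / non-events)."""
    events = int(sample.sum())
    events = min(max(events, 1), len(sample) - 1)  # guard against log(0)
    return np.log(events / (len(sample) - events))

# Repeatedly draw hypothetical 269-patient samples from the same true risk
estimates = [fitted_log_odds(rng.random(n) < p_event) for _ in range(2000)]
spread = np.percentile(estimates, [2.5, 97.5])

print(f"true intercept (log-odds): {true_log_odds:.2f}")
print(f"95% range of estimated intercepts: {spread[0]:.2f} to {spread[1]:.2f}")
```

Converting that log-odds spread back to probabilities, identical hospitals with identical true risk would report noticeably different "baseline" mortality rates, before a single predictor has been added to the score.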
“Multiple logistic regression analysis was performed to determine independent significance, and linear regression was performed to generate the concomitant regression expression of the variables significant on bivariate analysis.”
This approach is used far too often, and it really needs to stop.
Variable selection based on statistical significance has a number of undesirable properties.
Here’s a neat little SAS user group paper that cites problems with variable selection procedures based on p-values from Frank Harrell’s Regression Modeling Strategies and extends them, then attempts to offer some better solutions: lexjansen.com/pnwsug/2008/Da…
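One of those problems is easy to demonstrate by simulation. This is my own illustrative sketch, not the paper's analysis: screen 20 candidate predictors that are, by construction, pure noise, keeping anything "significant" at p < 0.05 on bivariate testing. The screen passes about one noise variable per run on average, and those false winners then get baked into the "risk score."

```python
# Hedged sketch with simulated data: p-value-based variable screening
# admits noise predictors at roughly the nominal false-positive rate.
import numpy as np

rng = np.random.default_rng(1)
n, n_events, n_noise = 269, 43, 20  # sizes mirror the paper; predictors are pure noise

y = np.zeros(n)
y[:n_events] = 1  # 43 "operative mortality" events

selected_counts = []
for _ in range(500):
    # No predictor has any true relationship with the outcome
    X = rng.standard_normal((n, n_noise))
    # Two-sample z-statistic comparing each predictor between outcome groups
    diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    se = np.sqrt(X[y == 1].var(axis=0, ddof=1) / n_events
                 + X[y == 0].var(axis=0, ddof=1) / (n - n_events))
    z = diff / se
    selected_counts.append(int((np.abs(z) > 1.96).sum()))  # "p < 0.05" screen

print(f"average noise variables passing the screen per run: "
      f"{np.mean(selected_counts):.2f}")
```

That's with *pure* noise; in real data, the same screen also drops genuinely useful predictors that happen to miss p < 0.05 in a small sample, and the coefficients of the survivors are biased away from zero.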
As always, the classic “novel risk score” paper would be incomplete without an incorrect use of the word “multivariate” where they mean “multivariable”
As for evaluating the risk score itself: the predictive performance is quite poor. This isn’t surprising, of course, because they’re trying to build a risk score on too small a dataset for that to work, but still:
If you look at Figure 1 and then Figure 2…it looks like the risk stratification from preoperative Penn classification (by itself) is better than the risk stratification/prediction offered by the “risk score”
The sum total of this paper’s findings ought to be “Just use the preoperative Penn classification to estimate predicted risk”
Another hallmark of these papers: an ROC curve with no labels, which doesn’t give us enough information to use it (putting aside critiques of sens/spec: if I see a combination of sens/spec that I think is good for use in practice, I need to know what the score threshold is!)
Also, on Figure 3: what is that line? Maybe I’m just missing something, but I honestly don’t know what that line represents.
I’ll give the authors a little credit for one thing: unlike some of the papers of this type that I have criticized in the past, they claim to have implemented this risk score into their clinical practice (Figure 4)
That’s a nice idea, and given the limitations of single-center data, it’s understandable (they do have to make decisions, after all) but it must be acknowledged that the data in this paper are not at all sufficient to conclude that this algorithm leads to better outcomes
The authors do have a fairly extensive limitation section, but they missed the single most important limitation of this paper for the stated purpose: the sample size is FAR too small to derive an accurate & reproducible risk score
And of course, the paper must conclude with an insult to the reader’s intelligence by calling this an “innovative risk score”
Can we pleeeeeeease stop acting like “multivariable regression” = “innovative risk score”
I sincerely wish that the authors had, rather than pushing to create a "risk score" from a dataset that cannot support one, written this as a descriptive report on a large series and left it at that.
But, in the current system/environment, the likely response would be something like "People have published that before. To make this NOVEL we have to have a risk score."
And so, another inaccurate and unlikely-to-reproduce "Novel Risk Score" has made its way into the literature. If you want to do some good, get a couple of centers together and pool enough data on this patient population to build a more accurate, generalizable score.
Or get @STS_CTsurgery to coordinate a large enough effort that this sort of procedure can be added to the STS portfolio of risk scores.
*Thread by Andrew Althouse.*