Proprietary prediction models are widely implemented in health care. Let's talk about why they exist and whether we can (or should) move away from them.

Let's start with a poll. I'll return to this soon with a story about the slow death of an open model.

Why are they used at all?
All of the poll options are partially true (I'll explain), but C is right. Proprietary models are used because EHRs suffer from a last-mile problem. While scientists debate whether models should even be made available (nature.com/articles/s4158…), the truth is that we don't have many ways to implement models.
From a technical standpoint, formats *do exist* to facilitate sharing of models (pubmed.ncbi.nlm.nih.gov/33079583/ & nature.com/articles/s4158…), but this is moot because those formats are (mostly) not supported inside the EHR.
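For concreteness, here is a minimal sketch of what sharing a model in a standard format can look like on the developer side, assuming the Python sklearn2pmml package (one of several tools that write PMML; the linked papers discuss the formats themselves). The catch is on the other end: the EHR has to be able to import and run the file.

```python
# A minimal sketch of exporting a model to PMML, assuming the
# sklearn2pmml package. Exporting is the easy half of the problem;
# the hard half is an EHR that can actually consume the file.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Illustrative data standing in for a real training set.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Wrap the estimator in a PMMLPipeline so it can be serialized.
pipeline = PMMLPipeline([("classifier", LogisticRegression())])
pipeline.fit(X, y)

# Writes a vendor-neutral PMML document describing the fitted model.
sklearn2pmml(pipeline, "mortality_model.pmml")
```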

(Check out the @nature letter to @Google informing them about Colab)
So the reason hundreds of US hospitals use proprietary prediction models today is that these models are developed by the EHR vendors themselves and are therefore the easiest to implement inside the EHR.

But I said all of the answers were partially right. How is that possible?
It's worth considering what a proprietary model actually is.

Which definition best captures the characteristics of a proprietary model?

Is it a model whose...

A. ...variables are not known?
B. ...form/coefficients are not known?
C. ...performance is not known?
D. ...use requires payment?
D is most correct, but the others can be true. EHR vendors usually provide information on which variables are used (with or without the actual coefficients) and on model validation performed at other institutions. If a model is implemented, vendors will even calculate local performance.

But can you trust it?
Having read a dozen-plus proprietary model briefs, I can say that the quality of the validation (and of the underlying assumptions) is highly variable. Also, some vendors are more aggressive than others about subjecting their models to peer review. But vendors do privately share validation information with hospitals.
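As a rough illustration (not any vendor's actual workflow), here is the kind of local performance check a hospital could run if it can extract each patient's predicted risk and observed outcome from the EHR; the values below are made up.

```python
# A hedged sketch of a local performance check on a vendor model's
# predictions: discrimination plus calibration-in-the-large.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])                   # observed outcomes (illustrative)
y_pred = np.array([0.1, 0.2, 0.7, 0.3, 0.4, 0.1, 0.2, 0.9])   # vendor model's predicted risks

# Discrimination: can the model separate higher- from lower-risk patients?
auc = roc_auc_score(y_true, y_pred)

# Calibration-in-the-large: do predicted risks match the observed event rate?
observed_rate = y_true.mean()
mean_predicted = y_pred.mean()

print(f"AUC = {auc:.2f}, observed rate = {observed_rate:.2f}, "
      f"mean predicted = {mean_predicted:.2f}")
```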

Now for a story.
Let's talk about APACHE, a series of models that help ICUs assess whether their mortality is better or worse than expected given patient severity (a quick sketch of that benchmarking calculation follows the references below). The story comes straight from its developer, Dr. William Knaus.

1. jamanetwork.com/journals/jamas…
2. mdcalc.com/apache-ii-score
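To make the use case concrete, here is an illustrative sketch of severity-adjusted benchmarking with any APACHE-style model: expected deaths are the sum of the model's predicted mortality risks, and the observed-to-expected ratio tells an ICU whether it is doing better (below 1) or worse (above 1) than expected. The numbers are made up.

```python
# Illustrative sketch of severity-adjusted mortality benchmarking
# with any ICU severity model (APACHE-style). Numbers are made up.
predicted_risk = [0.05, 0.40, 0.10, 0.80, 0.25]   # predicted mortality per admission
died           = [0,    0,    0,    1,    0]      # observed in-hospital mortality

observed_deaths = sum(died)
expected_deaths = sum(predicted_risk)              # sum of predicted probabilities

smr = observed_deaths / expected_deaths            # standardized mortality ratio (O/E)
print(f"O/E ratio = {smr:.2f}")                    # < 1: better than expected; > 1: worse
```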
APACHE was invented in 1978 in response to the unexpected death of a young patient. APACHE I was developed in 582 patients and published in 1981. It was tested in France, Spain, and Finland. It was somewhat complex (requiring 33 physiologic measurements), which limited adoption.
APACHE II reduced the complexity of APACHE I (it used only 12 physiologic measurements), and adoption was rapid.

The system worked well and was an open model.

You can try it out here: mdcalc.com/apache-ii-score

... then the problems began.
The first problem was that carrying out this international effort to standardize quality measurement was expensive. A company was formed and money was raised from venture capital.

The second problem was that poor performers doubted the accuracy of APACHE II.

The solution? APACHE III.
APACHE III improved the AUC from 0.86 (APACHE II) to 0.90. It also addressed issues specific to surgery, trauma, comatose status, etc.

But unlike APACHE II, APACHE III was proprietary.

And it cost money, which led to an investigation regarding misuse of funds.
Many ICU physicians were also not pleased with the prospect of paying for the score.

When the costs required to run the company and calculate the scores were explained, Dr. Knaus was told to go "get more grants," which wasn't really an option.

...then APACHE was bought by Cerner.
Cerner is one of the two largest EHR vendors in the US (alongside Epic). Since ICUs generally found APACHE III useful but didn't want to pay for it, it seems ideal that APACHE got bailed out by Cerner, right?

Kind of like how Microsoft bailed out GitHub?

...so what did Cerner do?
Cerner unveiled.... *drumroll*

APACHE IV!

Features:
- better calibrated than APACHE III
- more complex than APACHE II/III

"Also we recommend APACHE II no longer be used..."

So how complex was it?
APACHE IV is so complex that centers often perform manual chart validation to confirm that the elements going into the model are accurate.

Source: journals.lww.com/ccmjournal/Ful…

Also, conveniently, APACHE IV isn't integrated with the Epic EHR (hmm, wonder why?).
Meanwhile, in non-proprietary land, the SAPS-3 model tried to resurrect the simplicity of APACHE II. It is simple, but not as good (ncbi.nlm.nih.gov/pmc/articles/P…).

Also, Epic introduced a proprietary ICU mortality prediction model that appears to emulate APACHE IV and is easy to integrate.
So which would you use?
- a complex proprietary model owned by Cerner (APACHE IV)
- a simple prediction model (MPM-3) also owned by Cerner
- the proprietary Epic ICU mortality model
- the non-proprietary SAPS-3 (which performed worse in an independent validation)
- the outdated APACHE II
Dr. Knaus, the inventor of APACHE, has this to say in a footnote on the MDCalc page for APACHE II (mdcalc.com/apache-ii-score):

"In retrospect, if we had known the future was going to be as limited in the development of health IT, I think we would've said, let's stay with APACHE II." Image
If you work in an ICU, I'd love to know:

What does your ICU actually use to measure how well it is doing in terms of expected vs. observed mortality?
So what's the moral of the story?

Proprietary models are here to stay (for now), but we urgently need mechanisms to disseminate and operationalize open-source models in the EHR. Such mechanisms exist in some EHRs but not all, and they are completely different for each EHR.
Closing thoughts (1/2): We can download our patient records today because of the Blue Button and @myopennotes initiatives.

@calonghurst @drnigam proposed a "Green Button" initiative to get aggregate patient statistics at the bedside (healthaffairs.org/doi/full/10.13…), an important next step.
Closing thoughts (2/2): I'll go further and say that we need an OpenModel initiative that allows prediction models to interface in a consistent manner with all EHRs. Not just a model format like PMML, but communication standards too (a sketch of what that could look like is below).

Without it, the future consists mostly of proprietary models.
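To be clear, no such standard exists today; the snippet below is only a hypothetical sketch of the kind of consistent prediction interface an OpenModel effort might specify. FastAPI and every field name here are illustrative choices, not an existing specification.

```python
# A hypothetical sketch only: what a consistent "OpenModel"-style
# prediction endpoint might look like. FastAPI and the field names
# are illustrative assumptions, not an existing standard.
from typing import Dict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    patient_id: str
    features: Dict[str, float]   # standardized feature names -> values

class PredictionResponse(BaseModel):
    patient_id: str
    model_name: str
    risk: float                  # predicted probability, 0-1

@app.post("/predict", response_model=PredictionResponse)
def predict(req: PredictionRequest) -> PredictionResponse:
    # Stand-in scoring logic; a real service would load a shared model
    # (e.g., the PMML file sketched earlier) and apply it here.
    risk = min(0.99, 0.01 + 0.1 * len(req.features))
    return PredictionResponse(patient_id=req.patient_id,
                              model_name="example-open-model",
                              risk=risk)
```

An EHR that spoke this kind of interface could call any compliant model, open or proprietary, in the same way.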

