Karandeep Singh
Jacobs Chancellor’s Endowed Chair @UCSanDiego @InnovationUCSDH. Chief Health AI Officer @UCSDHealth. Creator of @Tidierjl #JuliaLang. #GoBlue. Views own.
Oct 12, 2023 5 tweets 1 min read
When an outcome influences a predictor, it’s “outcome leakage.” But what about when a predictor influences an outcome?

With @AkhilVaidMD @girish_nadkarni et al, we simulated what happens when a model predicts a bad outcome, but then you intervene to prevent that outcome. If you evaluate such a model *after* it has been linked to a clinical workflow, the model’s “apparent” performance will look worse.

People who were supposed to experience the outcome didn’t experience it (bc you prevented it!)

This matters most when interventions are effective.
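Here's a minimal sketch of the intuition (simulated data and made-up numbers, not our actual study code):

```julia
using Random, Statistics

Random.seed!(1)

# Rank-based AUC: probability a random positive outranks a random negative.
auc(s, y) = mean(si > sj for si in s[y], sj in s[.!y])

n = 10_000
risk = rand(n)               # the model's predicted risk
y = rand(n) .< risk          # outcomes that would occur *without* intervention

flagged = risk .> 0.7                    # workflow: intervene above a threshold
prevented = flagged .& (rand(n) .< 0.5)  # an intervention that works half the time
y_obs = y .& .!prevented                 # prevented outcomes are never observed

auc(risk, y)      # pre-deployment performance
auc(risk, y_obs)  # apparent post-deployment performance: lower
```

The more effective the intervention (raise the 0.5), the bigger the apparent drop.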
Sep 30, 2023 4 tweets 2 min read
🧹Amazing progress on TidierPlots.jl by @randyboyes.

What’s new?

1. It looks *just* like ggplot2 now - nearly all macros converted to functions.

2. Thanks to #JuliaLang’s multiple dispatch, you can add plots together using `+` OR use pipes.

3. ggsave()

4. Works with Pluto.jl
TidierPlots.jl is getting to be crazily feature-complete, even supporting `geom_text()`, `geom_label()`, and faceting.
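For flavor, here's roughly what the function-based API looks like (a sketch using the PalmerPenguins example dataset; exact signatures may vary by version):

```julia
using TidierPlots, DataFrames, PalmerPenguins

penguins = dropmissing(DataFrame(PalmerPenguins.load()))

# Layers combine with `+`, just like ggplot2
p = ggplot(data = penguins) +
    geom_point(@aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
    labs(x = "Bill length (mm)", y = "Bill depth (mm)")

ggsave("penguins.png", p)  # argument order assumed to mirror ggplot2's ggsave()
```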
Apr 3, 2023 22 tweets 11 min read
Why does a proprietary sepsis model “work” at some hospitals but not others?

Is it generalizability? Measurement? Intervention? Patient population? Margin for improvement? Resource constraints?

Working with a team led by @_plyons, we looked at a 9-hospital network.

A story. In an earlier single-center study at @umichmedicine, our paper and the accompanying editorial framed our AUC of 0.63 as a failure of “external” validity. The result was somewhat surprising bc other studies reported higher AUCs, sensitivities, and specificities.

Why?

jamanetwork.com/journals/jamai…
jamanetwork.com/journals/jamai…
Apr 2, 2023 5 tweets 3 min read
🧹 Tidier.jl v0.7.1 is now on the #JuliaLang registry.

What’s new?

- drop_na()
- lag() and lead() - re-exported from ShiftedArrays.jl
- Bugfix to ntile() if all values are missing

Thanks to @KriseScheuch for feature suggestions!

github.com/kdpsingh/Tidie…

One interesting thing is that lag() and lead() take in a vector and return a vector (similar to ntile()).

This means that these functions *should not* be auto-vectorized. So in addition to re-exporting, they are included on the package’s do-not-vectorize list.
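A minimal sketch of the new functions together (illustrative data):

```julia
using Tidier, DataFrames

df = DataFrame(x = [1, 2, missing, 4])

@chain df begin
    @drop_na(x)                              # new in v0.7.1
    @mutate(prev = lag(x), next = lead(x))   # vector-in, vector-out: on the do-not-vectorize list
end
```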
Mar 18, 2023 9 tweets 6 min read
🧹Tidier.jl 0.6.0 is available on the #JuliaLang registry.

What’s new?

- New logo!
- distinct()
- n(), row_number() work *everywhere*
- `!` for negative selection
- pivoting functions are better
- bug fixes to mutate() and slice()

Docs: kdpsingh.github.io/Tidier.jl/dev/

A short tour.

If you use distinct() without any arguments, it behaves just like the #rstats {tidyverse} distinct().

It checks if rows are unique, and returns all columns just as you would expect.
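A quick sketch (toy data):

```julia
using Tidier, DataFrames

df = DataFrame(a = [1, 1, 2], b = ["x", "x", "y"])

# No arguments: rows are checked for uniqueness across *all* columns,
# and all columns are returned - just like {tidyverse}'s distinct()
@chain df begin
    @distinct()
end
```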
Feb 25, 2023 16 tweets 12 min read
A Visual Tour of the Meta-Tidyverse

For years, I’ve been trying out different non-tidyverse implementations of the tidyverse. It’s fun seeing folks mold languages to run analysis code inspired by it.

If you like screenshots of code, come along for a visual tour.

Let’s start w/ R. If you thought that one tidyverse was enough for R, you would be wrong.

There are 2 fully independent re-implementations: {poorman} and {tidytable}.

{poorman} is powered by base R only - no dependencies! It’s a great pkg to use with binder/CI workflows.

cran.r-project.org/web/packages/p…
Feb 23, 2023 8 tweets 2 min read
If a tree falls in the forest but there’s no one around to hear it, does it really make a sound?

If a model detects a patient in need of ICU-level care but there are no ICU beds available, did the model really help the patient?

When we link an intervention to a model threshold (e.g., alerts), we often worry about overalerting.

Overalerting can take on multiple forms. Either there are too many alerts bc many of them are wrong, or there are too many alerts bc we lack the capacity to act on them even when they’re right.
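A back-of-the-envelope sketch of the two forms (hypothetical numbers, for illustration only):

```julia
# Hypothetical numbers, for illustration only
alerts_per_day = 100
precision      = 0.30   # form 1: 70 of 100 alerts are wrong
capacity       = 20     # form 2: we can act on only 20 alerts/day

correct_alerts = alerts_per_day * precision         # 30 alerts worth acting on
missed_correct = max(correct_alerts - capacity, 0)  # 10 *correct* alerts go unactioned
```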
Feb 22, 2023 5 tweets 4 min read
Why do seemingly useful models fail to improve clinical outcomes when implemented? Resource constraints.

In this paper, we describe constraints, how they affect net benefit, and how they apply to other measures.

Paper: academic.oup.com/jamia/advance-…

R pkg: github.com/ML4LHS/modelre…

We use 4 case studies to show how a resource constraint diminishes the usefulness of a model and changes the optimal resource allocation strategy.

We show that some of the usefulness can be recouped by introducing a relative constraint (and relaxing the absolute constraint).
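To make the idea concrete, here's a minimal sketch of an absolute constraint (simulated data and the standard net benefit formula — not our R package’s API):

```julia
using Random
Random.seed!(1)

# Standard net benefit at threshold pt for a treatment rule `treat`
net_benefit(treat, y, pt) =
    (sum(treat .& y) - sum(treat .& .!y) * pt / (1 - pt)) / length(y)

risk = rand(1_000)        # simulated predicted risks
y = rand(1_000) .< risk   # simulated outcomes
pt = 0.3

nb_unconstrained = net_benefit(risk .>= pt, y, pt)

# Absolute constraint: capacity to treat only k patients -> treat the top k by risk
k = 100
treat = falses(1_000)
treat[sortperm(risk, rev = true)[1:k]] .= true
nb_constrained = net_benefit(treat, y, pt)   # typically lower than unconstrained
```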
Jan 31, 2023 33 tweets 9 min read
My lab is moving to #JuliaLang, and I’ll be putting together some R => Julia tips for our lab and others who are interested.

Here are a few starter facts. Feel free to tag along!

Julia draws inspiration from a number of languages, but the influence of R on Julia is clear. Let's start with packages.

Like R, Julia comes with a package manager that can be used to install pkgs from within the console (or REPL). The Pkg package isn't automatically imported in Julia but it's easy to do.

Both are different from Python's command line approach to pkgs.
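For example:

```julia
# Pkg ships with Julia but isn't imported automatically
using Pkg

Pkg.add("DataFrames")   # ~ install.packages("DataFrames") in R
Pkg.status()            # list installed packages

# Equivalently, press ] at the REPL to enter pkg mode, then type:
#   add DataFrames
```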
Sep 29, 2022 11 tweets 5 min read
The new FDA guidance on CDS software is important but not for the reasons you might expect.

tl;dr: This document clarifies what the FDA *isn't going to regulate* and says little about *how* it's going to regulate CDS it considers to be a device.

Link: fda.gov/media/109618/d…

While the FDA was established formally by the FD&C Act in 1938, it didn't gain the authority to regulate medical devices until 1976, when the FD&C Act was amended.

ncbi.nlm.nih.gov/pmc/articles/P…
Feb 22, 2022 10 tweets 4 min read
IMO, this is the *biggest* development in the R language since the pipe was first introduced.

To understand why, you need to know a bit about Flash, Java, JavaScript, LLVM, Emscripten, and asm.js.

GH repo: github.com/georgestagg/we…
GH repo for R packages: github.com/georgestagg/we…
When the web was first introduced, there wasn't a clear choice of what scripting language should be used before the world settled on JavaScript, which implements the ECMAScript specification (see here: tc39.es/ecma262/).

Should browsers bother running code?
Feb 5, 2022 13 tweets 6 min read
A prediction model paradox in health: while many are developed, few are recommended.

But when models *are* recommended, they often come from tertiary care hospitals.

Is this a problem? We tested 3 prostate ca models in regional/national data.

Paper: auajournals.org/doi/pdf/10.109…

When people talk about risk-stratifying cancer outcomes, there’s an implicit assumption that what’s being modeled is biology.

But whose biology? Patients who present to tertiary care centers are often more complex, and only some of that complexity is measurable.
Jul 14, 2021 10 tweets 5 min read
As more health systems adopt some form of clinical AI/ML governance, one of the biggest challenges they face is the monitoring of deployed models.

And one of the scariest phenomena that occurs in deployed models is dataset shift.

Why scary?
- Often silent
- Can lead to harm

Why silent? Shouldn’t it be obvious if models get miscalibrated over time?

This is called “calibration drift” and may be clinically obvious if it occurs rapidly. However, it can occur gradually and be missed.

Also, calibration drift is only a *small* subset of dataset shift.
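A minimal sketch of why gradual drift is easy to miss (simulated data):

```julia
using Random
Random.seed!(1)

# Calibration-in-the-large per monthly batch: observed/expected event ratio.
# O/E near 1 = calibrated; a slow trend away from 1 = calibration drift.
oe_ratio(y, p) = sum(y) / sum(p)

# The model keeps predicting 20%, but the true event rate
# quietly declines ~2% per month (e.g., a practice change)
for month in 1:12
    p = fill(0.20, 5_000)
    y = rand(5_000) .< 0.20 * (1 - 0.02month)
    println("month $month: O/E = ", round(oe_ratio(y, p), digits = 2))
end
```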
Jun 24, 2021 19 tweets 7 min read
Let's talk limitations.

One of Epic’s public criticisms is that our analysis was “hypothetical” and thus problematic. They’re not wrong about it being hypothetical, and there *are* some real limitations worth pointing out. Let’s dive into our paper’s limitations, big and small.

Our biggest limitation is that our results come from a single center.

Tho I know of another center with an AUC similar to ours using similar code, we did not combine the results into 1 paper bc their scoring started shortly before COVID, which introduces some complications. More to come...
Jun 22, 2021 30 tweets 16 min read
Thank you to folks who have shared or commented on our paper. I know the paper is being used by some to dunk on Epic. Rather than piling on, I want to provide a clear-eyed view of what we found, what it means, and what I would suggest to Epic (& other model devs) going forward.

Here are some questions that come up:
- Are our findings due to a configuration error at @umichmedicine?
- Why do our findings differ from what Epic reports?
- Why does hospitalization-level AUC go from 0.63 to 0.80 in the sensitivity analysis?
- Are we using the model today?
Mar 16, 2021 15 tweets 6 min read
Now that we’ve discussed discrimination, calibration, and decision curve analysis (DCA), let’s talk about Scenario 2.

I am fascinated to know why folks felt that the model should not be used. Isn’t the model good?! AUC 0.75 and well-calibrated.

I agreed w/ the majority. Here’s why.

First, let’s poll folks who felt the model shouldn’t be used. What aspect of the model were you dissatisfied with?
Mar 15, 2021 8 tweets 3 min read
One property of net benefit on decision curves (as we found out below) is that the treat-all strategy’s x-intercept and y-intercept both equal the proportion of patients experiencing the outcome.

Ignoring the x-axis for a moment, let’s talk about the y-axis.

The maximal net benefit of a model in a given setting is determined by the proportion of people who experience the outcome.

If one hospital has a higher % of pts experiencing the outcome, the model will have a higher net benefit assuming all else is equal.

Most ppl get this.
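The intercepts fall out of the treat-all formula directly (a worked sketch):

```julia
# Treat-all net benefit at threshold pt, given outcome prevalence `prev`:
#   NB_all(pt) = prev - (1 - prev) * pt / (1 - pt)
nb_all(pt, prev) = prev - (1 - prev) * pt / (1 - pt)

prev = 0.2
nb_all(0.0, prev)    # 0.2 -> y-intercept equals the prevalence
nb_all(prev, prev)   # 0.0 -> x-intercept is where pt equals the prevalence
```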
Mar 14, 2021 4 tweets 1 min read
One conceptual question that comes up about decision curves and net benefit is: Don’t you need a clinical trial to establish benefit?

You do, but decision curve analysis is like a power calculation.

If your model is expected to confer negative net benefit, why do a trial at all?

So if the model has a positive net benefit, why do a trial?

Because the current standard of care isn’t quite treat-all or treat-none. It’s somewhere in between, but we rarely have a window into how clinicians estimate risks in current practice, even qualitatively.
Mar 14, 2021 25 tweets 9 min read
Thanks everyone for your votes on each of the scenarios. In this thread, I’ll walk through Scenario 1 — both how I thought about it originally, and how decision curves can help.

Let’s get started.

Popular opinion was to use neither model. My vote was for Model B. Here’s why.

I’ll get to why I voted for Model B, but I’ll start in order and share everything I looked at to arrive at that opinion.

From the threshold-perf plot (TPP), we can tell that the outcome occurs in ~25% of patients, bc the PPV at a threshold of 0 (left), where all preds are positive, is 25%.
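Why that works (a tiny simulated check):

```julia
using Random
Random.seed!(1)

# At a threshold of 0, every prediction is positive, so
#   PPV = TP / (TP + FP) = (# with outcome) / n = the outcome rate
y = rand(10_000) .< 0.25            # simulated ~25% outcome rate
preds = trues(10_000)               # threshold 0: all predictions positive
ppv = sum(preds .& y) / sum(preds)  # ≈ 0.25
```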
Mar 2, 2021 17 tweets 6 min read
I initially said that I think ppl get confused about decision curve analysis (DCA) because they don't think of the decision threshold as being connected to the cost/benefit ratio.

I'm realizing this is not a mathematical issue but a conceptual one.

A thread.

One issue that touched a nerve was that in my example, I calculated a post-hoc threshold based on sensitivity. Was I wrong to do this? Let me give an example as to why this happened in this situation, why it *might've* been our only option, and how I would do it differently.
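The connection itself is simple once written down (the standard DCA relationship):

```julia
# A decision threshold pt implicitly encodes a harm:benefit ratio:
#   pt / (1 - pt) = harm of treating an unaffected patient
#                   / benefit of treating an affected patient
threshold_odds(pt) = pt / (1 - pt)

threshold_odds(0.10)   # 1/9: accept ~9 unnecessary treatments per true case helped
```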
Feb 28, 2021 9 tweets 3 min read
Decision curves are a way to understand the population-level "net benefit" of implementing a prediction model *without* needing to *explicitly* account for actual costs and benefits.

Even experts seemingly don't understand it, and admit it!

How is this possible? Is it laziness?

Lack of understanding isn't for a lack of trying on the part of its authors. There are dozens of papers w/ thousands of citations!

So what can I possibly add? Personal experience. I'll walk through why I was confused and how it finally clicked for me.

mskcc.org/departments/ep…
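For reference, the standard formula that makes this possible — false positives are weighted by the odds of the threshold, so costs and benefits enter *implicitly* through the threshold:

```julia
# Standard net benefit at threshold pt, from counts
net_benefit(tp, fp, n, pt) = tp / n - fp / n * pt / (1 - pt)

net_benefit(80, 120, 1_000, 0.2)   # 0.08 - 0.12 * 0.25 = 0.05
```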