Tweet

Lauren Oakden-Rayner

Apr 6 • 9 tweets • 7 min read

Very excited to have 2 new papers in press today in Lancet Digital Health, alongside an editorial from the journal highlighting our work.

I am immensely proud of the work we have done here and honestly think this is the most important work I have been involved in to date 🥳

1/7

#Medical #AI has a problem. Preclinical testing, including regulatory testing, does not accurately predict the risks that AI models pose once they are deployed in clinics.

I've written about this before in my blog, for example in:

google.com/amp/s/laurenoa…

2/7

@theAIML

In this work we:

1) describe a step by step method for algorithmic auditing in health, building on the 🔥 work by Raji et al

2) audit a high accuracy model we developed @theAIML for hip fracture dx, ID-ing several serious risks that were not detected by standard testing.

3/7

The high performance hip fracture model (AUC 0.994 vs 0.969 for radiologists) fails unexpectedly on an extremely obvious fracture and produces a cluster of errors in cases with abnormal bones (Paget's disease etc).

These findings (and risks) were only detected via audit.

4/7

@RANZCRcollege

We are excited that this work is impacting policy. Professional orgs such as @RANZCRcollege are incorporating audit into their practice standards (ie ranzcr.com/college/docume…) and we are talking with regulators and governance groups on how audit can make AI systems safer.

5/7

I'll leave it at that for now (although expect a blog post in the near future 😂), so I'll just leave the links here:

Editorial: sciencedirect.com/science/articl…

Medical algorithmic audit: sciencedirect.com/science/articl…

Hip fractures: sciencedirect.com/science/articl…

6/7

@DrXiaoLiu

Shoutouts:

Audit: @DrXiaoLiu @Denniston_Ophth in clinical AI governance, @MarzyehGhassemi @GlockerBen in ML4H, @MMccradden in AI bioethics.

Hips:
William Gale, @ghcarneiro @ApiBradley @PalmerLyle from Aus, and @mattlungrenMD and Thomas Bonham from @StanfordAIMI

7/7

@Denniston_Ophth

Final celebrations/boasting (😂)

The audit paper is my first senior (last) author publication (co-seniored with the amazing @Denniston_Ophth), and both papers have been published under my new name!

They are also the final papers of my PhD, which is now completed!

🥳🥳🥳

8/7

@rajiinio

Urrgh I'm sorry, I don't know how I didn't link @rajiinio's profile here. If you don't know her, check her out, she does incredible work!

Also the other authors on the algorithmic audit paper are awesome too!

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @DrLaurenOR

Lauren Oakden-Rayner

@DrLaurenOR

Aug 2, 2021

#Medical #AI has the worst superpower... Racism

We've put out a preprint reporting concerning findings. AI can do something humans can't: recognise the self-reported race of patients on x-rays. This gives AI a path to produce health disparities.

1/8

lukeoakdenrayner.wordpress.com/2021/08/02/ai-…

This is a big deal, so we wanted to do it right. We did dozens of experiments, replication at multiple labs, on numerous datasets and tasks.

We are releasing all the code, as well as new labels to identify racial identity for multiple public datasets.

2/8

Humans can't detect race better than chance, but AI performs absurdly well on the task. As you can see here, AUC scores are in the high 90s, and are maintained on external validation on completely distinct datasets and across multiple different imaging tasks.

3/8

Read 10 tweets

Lauren Oakden-Rayner

@DrLaurenOR

Dec 8, 2020

Docs are ROCs: A simple fix for a methodologically indefensible practice in medical AI studies.

Widely used methods to compare doctors to #AI models systematically underestimate doctors, making the AI look better than it is! We propose a solution.

lukeoakdenrayner.wordpress.com/2020/12/08/doc…

1/7

The most common method to estimate average human performance in #medical AI is to average sensitivity and specificity as if they are independent. They aren't though - they are inversely correlated on a curve.

The average points will *always* be inside the curve.

2/7

The only solution currently is to force doctors to rate images using confidence scores. While this works well in the few tasks where these scales are used in clinical practice, what does it mean to say you are 6/10 confident that there is a lung nodule?

3/7

Read 8 tweets

Lauren Oakden-Rayner

@DrLaurenOR

Aug 19, 2020

https://twitter.com/VickersBiostats/status/1295489610139738112

Alright, let's do this once last time. Predictions vs probabilities. What should we give doctors when we use #AI / #ML models for decision making or decision support?

#epitwitter

1/21

https://twitter.com/VickersBiostats/status/1295489610139738112

First, we need to ask: is there a difference?

This is a weird question, right? Of course there is! One is a categorical class prediction, the other is a continuous variable. Stats 101, amirite?

Well, no.

2/21

Let's set out the two ways that probabilities are supposed to be different than class predictions.

1) they are continuous, not categorical
2) they are probabilities, meaning the numbers reflects some truth about a patient group and are not arbitrary

Weeeeell...

3/21

Read 23 tweets

Lauren Oakden-Rayner

@DrLaurenOR

Jul 28, 2020

https://twitter.com/laure_wynants/status/1288131085797294080

This discussion was getting long, so I thought I'd lay out my thoughts on a common argument: should models produce probabilities or decisions? Ie 32% chance of cancer vs "do a biopsy".

I favour the latter, because IMO it is both more useful and... more honest. IMO:

1/13

https://twitter.com/laure_wynants/status/1288131085797294080

The argument against using a threshold to determine an action, at a basic level, seems to be:

1) you shouldn't discard information by turning a range of probabilities into a binary
2) probabilities are more useful at the clinical coalface

2/13

Re: 1.

No model discards information. The continuous output score always exists. It is how you make use of that information at point of care that "changes".

I use airquotes around "changes", because this is a ... false dichotomy 😆

3/13

Read 14 tweets

Lauren Oakden-Rayner

@DrLaurenOR

Mar 3, 2020

https://twitter.com/pranavrajpurkar/status/1234772132514553856

Great work showing that a good AI system doesn't always help doctors.

Echoes the decades of experience with radCAD: when the system is wrong, it biases the doctor and makes them *worse* (OR 0.33!) at diagnosis.

It is *never* as simple as AI+doctor is better than doctor alone.

https://twitter.com/pranavrajpurkar/status/1234772132514553856

I personally suspect the biggest problem is automation bias, which is where the human over-relies on the model output.

Similar to self driving cars where jumping to complete automation appears to be safer than partial automation.

But interestingly (and perhaps counter-intuitively) this could also mean that "blind" ensembling (where the human gets no AI input, and the human and AI opinions are combined algorithmically) might be better than showing the doctor what the AI thinks.

Read 6 tweets

Lauren Oakden-Rayner

@DrLaurenOR

Nov 26, 2019

@weina_jin

#Medical #AI researchers: badly performed/described cross-validation is the most common reason I recommend major revisions as a reviewer.

CV can be used to tune models and to estimate performance, but not on the same data. See this diagram for doing both.

h/t 4 pic @weina_jin

@weina_jin

@weina_jin The weird thing about CV in AI is that you don't actually end up with a single model. You end up with k different models and sets of hyperparameters.

It allows an estimate of generalisation for a *group* of models, but that is still a step removed from a deployable system.

@weina_jin

@weina_jin For a more detailed explanation, see the "Nested cross-validation for model assessment" section of: ncbi.nlm.nih.gov/pmc/articles/P…

and here is the blog post from @weina_jin that reminded me to tweet about this topic weina.me/nested-cross-v…

Read 5 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Lauren Oakden-Rayner

People who liked this thread also liked...

Try unrolling a thread yourself!

More from @DrLaurenOR

Lauren Oakden-Rayner

Lauren Oakden-Rayner

Lauren Oakden-Rayner

Lauren Oakden-Rayner

Lauren Oakden-Rayner

Lauren Oakden-Rayner

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Like this author's thread?