More from @DrLukeOR

Luke Oakden-Rayner

@DrLukeOR

19 Aug

https://twitter.com/VickersBiostats/status/1295489610139738112

Alright, let's do this one last time. Predictions vs probabilities. What should we give doctors when we use #AI / #ML models for decision making or decision support?

#epitwitter

1/21

First, we need to ask: is there a difference?

This is a weird question, right? Of course there is! One is a categorical class prediction, the other is a continuous variable. Stats 101, amirite?

Well, no.

2/21

Let's set out the two ways that probabilities are supposed to be different than class predictions.

1) they are continuous, not categorical
2) they are probabilities, meaning the numbers reflect some truth about a patient group and are not arbitrary

Weeeeell...

3/21
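
Point 2 above is what is usually called calibration: among patients given a score near 0.9, roughly 90% should turn out to have the condition. A minimal sketch of how one might check that, assuming made-up scores and outcomes (the function name `calibration_table` and the data are hypothetical, not from the thread):

```python
def calibration_table(scores, outcomes, n_bins=5):
    """Bin scores into equal-width bins and compare the mean score
    with the observed event rate in each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, outcomes):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    table = []
    for members in bins:
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        table.append((round(mean_score, 2), round(event_rate, 2), len(members)))
    return table

# Hypothetical data: 10 patients scored 0.1 (1 event), 10 scored 0.9 (9 events).
scores = [0.1] * 10 + [0.9] * 10
outcomes = [1] + [0] * 9 + [1] * 9 + [0]

# Each row is (mean score, observed event rate, number of patients).
table = calibration_table(scores, outcomes)
for row in table:
    print(row)
```

If the first two columns roughly agree in every bin, the scores behave like probabilities for that patient group; if they do not, the numbers are closer to an arbitrary ranking.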

Luke Oakden-Rayner

@DrLukeOR

28 Jul

https://twitter.com/laure_wynants/status/1288131085797294080

This discussion was getting long, so I thought I'd lay out my thoughts on a common argument: should models produce probabilities or decisions? I.e. "32% chance of cancer" vs "do a biopsy".

I favour the latter, because IMO it is both more useful and... more honest. IMO:

1/13

The argument against using a threshold to determine an action, at a basic level, seems to be:

1) you shouldn't discard information by turning a range of probabilities into a binary
2) probabilities are more useful at the clinical coalface

2/13

Re: 1.

No model discards information. The continuous output score always exists. It is how you make use of that information at point of care that "changes".

I use airquotes around "changes", because this is a ... false dichotomy 😆

3/13
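
The point about not discarding information can be sketched in a few lines: a thresholded recommendation is computed from the continuous score, and the score is still there afterwards. The function name `recommend` and the 0.3 threshold are assumptions chosen for illustration, not values from the thread.

```python
def recommend(score, threshold=0.3):
    """Map a continuous model score to an action while keeping the score.

    The threshold is a hypothetical operating point; in practice it would
    be chosen from the clinical costs of each kind of error.
    """
    action = "do a biopsy" if score >= threshold else "no biopsy"
    return {"score": score, "action": action}

result = recommend(0.32)
print(result["action"], "(score:", result["score"], ")")
```

What is shown at the point of care (the action, the score, or both) is a presentation choice layered on top of the same model output.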

Luke Oakden-Rayner

@DrLukeOR

3 Mar

https://twitter.com/pranavrajpurkar/status/1234772132514553856

Great work showing that a good AI system doesn't always help doctors.

Echoes the decades of experience with radCAD: when the system is wrong, it biases the doctor and makes them *worse* (OR 0.33!) at diagnosis.

It is *never* as simple as AI+doctor is better than doctor alone.

I personally suspect the biggest problem is automation bias, which is where the human over-relies on the model output.

This is similar to self-driving cars, where jumping to complete automation appears to be safer than partial automation.

But interestingly (and perhaps counter-intuitively) this could also mean that "blind" ensembling (where the human gets no AI input, and the human and AI opinions are combined algorithmically) might be better than showing the doctor what the AI thinks.
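
As a rough illustration of "blind" ensembling, the sketch below combines a doctor's estimate and a model's estimate that were each produced without sight of the other. A weighted average is only one possible combination rule, and it is an assumption here; the thread does not say how the opinions would be combined. The name `blind_ensemble` and the default weight are hypothetical.

```python
def blind_ensemble(human_prob, model_prob, human_weight=0.5):
    """Weighted average of two independently produced probability estimates."""
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must be between 0 and 1")
    return human_weight * human_prob + (1.0 - human_weight) * model_prob

# The doctor never sees the model output, so automation bias cannot
# shift the human estimate before the two are combined.
combined = blind_ensemble(human_prob=0.2, model_prob=0.6)
print(combined)
```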

Luke Oakden-Rayner

@DrLukeOR

26 Nov 19

#Medical #AI researchers: badly performed/described cross-validation is the most common reason I recommend major revisions as a reviewer.

CV can be used to tune models and to estimate performance, but not on the same data. See this diagram for doing both.

h/t 4 pic @weina_jin

@weina_jin The weird thing about CV in AI is that you don't actually end up with a single model. You end up with k different models and sets of hyperparameters.

It allows an estimate of generalisation for a *group* of models, but that is still a step removed from a deployable system.
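
A minimal pure-Python sketch of nested cross-validation follows, to make the "k different models" point concrete. Everything in it is an assumption for illustration: the synthetic 1-D data, the toy k-nearest-neighbour classifier, and the helper names (`k_folds`, `knn_predict`, `accuracy`, `nested_cv`). The inner loop tunes the hyperparameter using only the outer-training data; the outer loop scores the tuned model on data the tuning never saw.

```python
import random

def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test points the k-NN rule labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def nested_cv(data, candidate_ks=(1, 3, 5), outer_k=5, inner_k=4):
    outer_scores, chosen_ks = [], []
    for held_out in k_folds(list(range(len(data))), outer_k):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(data)) if i not in held_out_set]
        inner_folds = k_folds(train_idx, inner_k)

        def inner_score(k):
            # Mean validation accuracy, using only the outer-training data.
            scores = []
            for val_idx in inner_folds:
                val_set = set(val_idx)
                fit = [data[i] for i in train_idx if i not in val_set]
                val = [data[i] for i in val_idx]
                scores.append(accuracy(fit, val, k))
            return sum(scores) / len(scores)

        best_k = max(candidate_ks, key=inner_score)  # tuning step
        chosen_ks.append(best_k)
        train = [data[i] for i in train_idx]
        test = [data[i] for i in held_out]
        outer_scores.append(accuracy(train, test, best_k))  # assessment step
    return outer_scores, chosen_ks

# Synthetic data: the label is a noisy function of a single feature.
random.seed(0)
data = []
for _ in range(60):
    x = random.random()
    label = 1 if x + random.gauss(0, 0.15) > 0.5 else 0
    data.append((x, label))

outer_scores, chosen_ks = nested_cv(data)
print("hyperparameter chosen in each outer fold:", chosen_ks)
print("mean outer-fold accuracy:", sum(outer_scores) / len(outer_scores))
```

Note that `chosen_ks` can differ from fold to fold, so the averaged score describes the tuning procedure as a whole and not any one fitted model.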

@weina_jin For a more detailed explanation, see the "Nested cross-validation for model assessment" section of: ncbi.nlm.nih.gov/pmc/articles/P…

and here is the blog post from @weina_jin that reminded me to tweet about this topic weina.me/nested-cross-v…

Luke Oakden-Rayner

@DrLukeOR

10 Sep 19

https://twitter.com/ten_photos/status/1170732067887484928

1/ While this will play well (and get cited a lot) among the anti-#deeplearning holdouts, I was left a bit underwhelmed. I wanted to find some interesting edge cases where DL is not working (so we can work out solutions), but instead got a set of pretty unreasonable comparisons

2/ The deep learning models are tiny (4 conv layers) with justification that it works for MNIST. Everything works for MNIST! Linear regression works for MNIST!

xiaoliangbai.com/2017/02/01/ten…

We know that for complex images, deeper and more complex models are vastly better, and overfit less!

3/ The linear and non-deep models are not "apples to apples" either though. This isn't deep learning vs simple models, it is deep learning vs incredibly complex feature engineering built up over decades of research.

Luke Oakden-Rayner

@DrLukeOR

18 Dec 18

Well, here is the six-months-later follow-up on the @Annals_Oncology paper by Haenssle et al, "Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists."

https://twitter.com/DrLukeOR/status/1006804796785950720

The paper claims "Most dermatologists were outperformed by the CNN", a bold statement. The relevant part of the paper is pictured.

I raised several concerns in those tweets:

1) they compared two different metrics (ROC-AUC vs ROC area) as if they were the same
2) they used average human performance
3) they seemed to cheat when picking an operating point for the model

Each biases in favour of the model.
