Lauren Oakden-Rayner 🏳️‍⚧️
Medical AI safety. Director of Research @ nRAH Medical Imaging. Senior Research Fellow, Australian Institute for Machine Learning. She/her 🏳️‍⚧️🏳️‍🌈
Jun 9, 2022 6 tweets 4 min read
If anyone didn't get what I meant when I said @ykilcher chose to "kick the hornet's nest", or if anyone was wondering about the cost of speaking out against unethical behaviour in #AI, here's a little summary of my recent Twitter feed.

CW: transphobia

Here's some more. If anyone doesn't understand why all these statements are explicitly transphobic ... well, it is because you don't face it. These are all extremely hurtful.

That's enough, but I've skipped all the misogyny, racism, anti-semitism etc.
Jun 6, 2022 14 tweets 8 min read
This week an #AI model was released on @huggingface that produces harmful + discriminatory text and has already posted over 30k vile comments online (according to its author).

This experiment would never pass a human research #ethics board. Here are my recommendations.

1/7

@huggingface as the model custodian (an interesting new concept) should implement an #ethics review process to determine the harm hosted models may cause, and gate harmful models behind approval/usage agreements.

Medical research has functional models for this, e.g. for data sharing.

2/7
Apr 6, 2022 9 tweets 7 min read
Very excited to have 2 new papers in press today in Lancet Digital Health, alongside an editorial from the journal highlighting our work.

I am immensely proud of the work we have done here and honestly think this is the most important work I have been involved in to date 🥳

1/7

#Medical #AI has a problem. Preclinical testing, including regulatory testing, does not accurately predict the risks that AI models pose once they are deployed in clinics.

I've written about this before in my blog, for example in:

google.com/amp/s/laurenoa…

2/7
Aug 2, 2021 10 tweets 4 min read
#Medical #AI has the worst superpower... Racism

We've put out a preprint reporting concerning findings. AI can do something humans can't: recognise the self-reported race of patients on x-rays. This gives AI a path to produce health disparities.

1/8

lukeoakdenrayner.wordpress.com/2021/08/02/ai-…

This is a big deal, so we wanted to do it right. We did dozens of experiments, with replication at multiple labs, on numerous datasets and tasks.

We are releasing all the code, as well as new labels to identify racial identity for multiple public datasets.

2/8
Dec 8, 2020 8 tweets 4 min read
Docs are ROCs: A simple fix for a methodologically indefensible practice in medical AI studies.

Widely used methods to compare doctors to #AI models systematically underestimate doctors, making the AI look better than it is! We propose a solution.

lukeoakdenrayner.wordpress.com/2020/12/08/doc…

1/7

The most common method to estimate average human performance in #medical AI is to average sensitivity and specificity as if they are independent. They aren't though - they are inversely correlated on a curve.

The average points will *always* be inside the curve.

2/7
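A quick way to see why: simulate readers who all sit on the same ROC curve but operate at different thresholds. This is a minimal sketch with a hypothetical binormal curve (the parameters and reader operating points are invented for illustration, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical binormal ROC curve: sens = Phi(a + b * Phi^-1(1 - spec)).
# Parameters chosen for illustration only.
a, b = 1.5, 1.0

def sens_on_curve(spec):
    """Sensitivity on the ROC curve at a given specificity."""
    return norm.cdf(a + b * norm.ppf(1 - spec))

# Readers all sit on the same curve, but at different thresholds
# (i.e. different specificities).
reader_spec = np.array([0.70, 0.80, 0.90, 0.95])
reader_sens = sens_on_curve(reader_spec)

avg_spec, avg_sens = reader_spec.mean(), reader_sens.mean()
print(f"averaged point:     sens={avg_sens:.3f} at spec={avg_spec:.3f}")
print(f"curve at that spec: sens={sens_on_curve(avg_spec):.3f}")
# The averaged point sits *below* the curve at the same specificity.
```

Because the curve is concave, the average of points on it always falls below it (Jensen's inequality), so the method systematically underestimates the readers.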
Aug 19, 2020 23 tweets 8 min read
Alright, let's do this one last time. Predictions vs probabilities. What should we give doctors when we use #AI / #ML models for decision making or decision support?

#epitwitter

1/21

First, we need to ask: is there a difference?

This is a weird question, right? Of course there is! One is a categorical class prediction, the other is a continuous variable. Stats 101, amirite?

Well, no.

2/21
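Here's the "well, no" in code: for most classifiers, the class prediction is literally just the probability with a threshold baked in. A minimal scikit-learn sketch on toy data (the dataset and model are placeholders, not from the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy data and model, purely illustrative.
X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

proba = model.predict_proba(X)[:, 1]  # continuous probability
pred = model.predict(X)               # "categorical" class prediction

# The prediction is just the probability thresholded at 0.5:
assert np.array_equal(pred, (proba > 0.5).astype(int))
```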
Jul 28, 2020 14 tweets 4 min read
This discussion was getting long, so I thought I'd lay out my thoughts on a common argument: should models produce probabilities or decisions? I.e. "32% chance of cancer" vs "do a biopsy".

I favour the latter, because it is both more useful and... more honest, IMO.

1/13

The argument against using a threshold to determine an action, at a basic level, seems to be:

1) you shouldn't discard information by turning a range of probabilities into a binary
2) probabilities are more useful at the clinical coalface

2/13
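To make the decision side concrete: the threshold that turns a probability into an action encodes a cost trade-off, usually an implicit one. A toy sketch with invented misclassification costs (the numbers are illustrative only, not clinical values):

```python
# Made-up costs, for illustration only - real values would come from
# a clinical decision analysis.
cost_missed_cancer = 100.0    # false negative
cost_unneeded_biopsy = 5.0    # false positive

# Expected cost of skipping the biopsy: p * cost_missed_cancer.
# Expected cost of biopsying:           (1 - p) * cost_unneeded_biopsy.
# Biopsy whenever p exceeds this threshold:
threshold = cost_unneeded_biopsy / (cost_unneeded_biopsy + cost_missed_cancer)

def recommend(p_cancer: float) -> str:
    return "do a biopsy" if p_cancer >= threshold else "no biopsy"

print(f"threshold = {threshold:.3f}")  # ~0.048
print(recommend(0.32))                 # "do a biopsy"
```

With these made-up costs, "do a biopsy" is recommended well below 50% probability, so the 32% case above is an easy yes. Outputting the decision forces those cost assumptions into the open, which is part of the "more honest" argument.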
Mar 3, 2020 6 tweets 2 min read
Great work showing that a good AI system doesn't always help doctors.

Echoes the decades of experience with radiology CAD: when the system is wrong, it biases the doctor and makes them *worse* (OR 0.33!) at diagnosis.

It is *never* as simple as AI+doctor is better than doctor alone.

I personally suspect the biggest problem is automation bias, which is where the human over-relies on the model output.

Similar to self-driving cars, where jumping to complete automation appears to be safer than partial automation.
Nov 26, 2019 5 tweets 4 min read
#Medical #AI researchers: badly performed/described cross-validation is the most common reason I recommend major revisions as a reviewer.

CV can be used to tune models and to estimate performance, but not on the same data. See this diagram for doing both.

h/t 4 pic @weina_jin

The weird thing about CV in AI is that you don't actually end up with a single model. You end up with k different models and sets of hyperparameters.

It allows an estimate of generalisation for a *group* of models, but that is still a step removed from a deployable system.
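For the "do both, but not on the same data" setup, here is a minimal nested cross-validation sketch in scikit-learn (the dataset, model, and grid are placeholders, not the thread's diagram): the inner loop tunes hyperparameters, the outer loop estimates performance on data the tuning never touched.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner CV: hyperparameter tuning.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)

# Outer CV: performance estimate on held-out folds. Note this fits
# 5 (possibly different) tuned models - an estimate for the
# *procedure*, not a single deployable model.
scores = cross_val_score(inner, X, y, cv=5)
print(f"{scores.mean():.3f} +/- {scores.std():.3f}")
```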
Sep 10, 2019 14 tweets 3 min read
1/ While this will play well (and get cited a lot) among the anti-#deeplearning holdouts, I was left a bit underwhelmed. I wanted to find some interesting edge cases where DL is not working (so we can work out solutions), but instead got a set of pretty unreasonable comparisons.

2/ The deep learning models are tiny (4 conv layers), with the justification that it works for MNIST. Everything works for MNIST! Linear regression works for MNIST!

xiaoliangbai.com/2017/02/01/ten…

We know that on complex images, deeper and more complex models are vastly better, and overfit less!
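To illustrate the "everything works for MNIST" point: a plain linear classifier already scores in the mid-90s on digit images. A sketch using scikit-learn's small 8x8 digits set as a stand-in for MNIST (illustrative, not from the paper under discussion):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# sklearn's 8x8 digits set as a small stand-in for MNIST.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print(f"linear classifier accuracy: {clf.score(X_te, y_te):.3f}")  # ~0.96
```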
Dec 18, 2018 18 tweets 5 min read
Well, here is the six-months-later follow-up on the @Annals_Oncology paper by Haenssle et al., "Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists."

The paper claims "Most dermatologists were outperformed by the CNN", a bold statement. The relevant part of the paper is pictured.