If anyone didn't get what I meant when I said @ykilcher chose to "kick the hornets' nest", or if anyone was wondering about the cost of speaking out against unethical behaviour in #AI, here's a little summary of my recent Twitter feed.
Here's some more. If anyone doesn't understand why all these statements are explicitly transphobic ... well, it is because you don't face it. These are all extremely hurtful.
That's enough, but I've skipped all the misogyny, racism, anti-semitism etc.
This isn't isolated. Another commentator on this stunt has needed to take some time off Twitter due to the reaction.
I honestly spent several days deciding whether to post this, because I knew what would happen. It was important, but there is always a cost.
This behaviour and response were predictable: an unbroken causal chain from Yannic to me.
Which is perhaps unsurprising, given that he doesn't seem to think that hate-speech is harmful, that it is "just insults", and "something you simply don't like".
All things an ethics board shouldn't have to consider, but might have needed to.
"You are trolling an extremely toxic community and are likely to get publicly criticised on ethical grounds. Have you considered the risk you may mobilise an army of bigots against your colleagues?"
This tweet really resonated with me. It's always people from minoritised groups that need to perform "non-consensual maid-ery" and face the repercussions.
This week an #AI model was released on @huggingface that produces harmful + discriminatory text and has already posted over 30k vile comments online (says its author).
This experiment would never pass a human research #ethics board. Here are my recommendations.
@huggingface as the model custodian (an interesting new concept) should implement an #ethics review process to determine the harm hosted models may cause, and gate harmful models behind approval/usage agreements.
Medical research has functional models for this, e.g. for data sharing.
2/7
Open science and software are wonderful principles, but they must be balanced against potential harm. Medical research has a strong ethics culture because we have an awful history of causing harm to people, usually from disempowered groups.
Very excited to have 2 new papers in press today in Lancet Digital Health, alongside an editorial from the journal highlighting our work.
I am immensely proud of the work we have done here and honestly think this is the most important work I have been involved in to date 🥳
1/7
#Medical #AI has a problem. Preclinical testing, including regulatory testing, does not accurately predict the risks that AI models pose once they are deployed in clinics.
I've written about this before in my blog, for example in:
We've put out a preprint reporting concerning findings. AI can do something humans can't: recognise the self-reported race of patients on x-rays. This gives AI a path to produce health disparities.
This is a big deal, so we wanted to do it right. We did dozens of experiments, replication at multiple labs, on numerous datasets and tasks.
We are releasing all the code, as well as new labels to identify racial identity for multiple public datasets.
2/8
Humans can't detect race better than chance, but AI performs absurdly well on the task. As you can see here, AUC scores are in the high 90s, and are maintained on external validation on completely distinct datasets and across multiple different imaging tasks.
Docs are ROCs: A simple fix for a methodologically indefensible practice in medical AI studies.
Widely used methods to compare doctors to #AI models systematically underestimate doctors, making the AI look better than it is! We propose a solution.
The most common method to estimate average human performance in #medical AI is to average sensitivity and specificity as if they are independent. They aren't though - they are inversely correlated on a curve.
The average points will *always* be inside the curve.
2/7
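The geometry here can be shown in a few lines. Below is a toy sketch (the curve shape and the doctors' operating points are made up for illustration, not taken from any study): readers who all sit on the same concave ROC curve, whose sensitivities and specificities are then averaged separately. By Jensen's inequality, the averaged point must fall below the curve, i.e. the naive average underestimates the readers.

```python
import numpy as np

# Hypothetical concave ROC curve: sensitivity (TPR) as a function of FPR.
# The exponent 0.3 is arbitrary; any strictly concave curve behaves the same way.
def roc_tpr(fpr):
    return fpr ** 0.3

# Simulated "doctors": operating points on the SAME curve, at different thresholds.
fprs = np.array([0.05, 0.15, 0.30, 0.50])
sens = roc_tpr(fprs)   # sensitivities
specs = 1 - fprs       # specificities

# The naive method: average sensitivity and specificity independently.
mean_sens = sens.mean()
mean_spec = specs.mean()
mean_fpr = 1 - mean_spec

# The curve's sensitivity at the averaged specificity is strictly higher,
# so the averaged (sens, spec) point lies *inside* the curve.
print(mean_sens, roc_tpr(mean_fpr), mean_sens < roc_tpr(mean_fpr))
```

Running this, the averaged sensitivity is lower than what the curve actually delivers at the same averaged specificity, which is exactly the "average points will always be inside the curve" problem.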
The only solution currently is to force doctors to rate images using confidence scores. While this works well in the few tasks where these scales are used in clinical practice, what does it mean to say you are 6/10 confident that there is a lung nodule?
Alright, let's do this one last time. Predictions vs probabilities. What should we give doctors when we use #AI / #ML models for decision making or decision support?
This discussion was getting long, so I thought I'd lay out my thoughts on a common argument: should models produce probabilities or decisions? I.e. "32% chance of cancer" vs "do a biopsy".
I favour the latter, because IMO it is both more useful and... more honest.