The news headlines *undersold* this paper. Widely-used machine learning tool for sepsis prediction found to have an AUC of 0.63 (!), adds little to existing clinical practice. Misses two thirds of sepsis cases, overwhelms physicians with false alerts. jamanetwork.com/journals/jamai…
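To make the two numbers concrete, here is a minimal sketch of what AUC and "misses two thirds of cases" measure. The risk scores and the 0.5 alert threshold are made up for illustration; they are not data from the paper, and the function names are hypothetical.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Chance that a random true case outscores a random non-case (ties count half)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity(pos_scores, threshold):
    """Fraction of true cases whose score reaches the alert threshold."""
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


# Illustrative risk scores only (assumption, not from the paper).
sepsis = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
no_sepsis = [0.7, 0.5, 0.35, 0.25, 0.15, 0.05]

print("AUC:", round(auc(sepsis, no_sepsis), 2))
print("Sensitivity at 0.5:", round(sensitivity(sepsis, 0.5), 2))
```

An AUC of 0.5 is a coin flip and 1.0 is perfect ranking, so a value in the low 0.6s means the tool ranks a true case above a non-case only somewhat more often than chance would.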
This adds to the growing body of evidence that machine learning isn't good at true prediction tasks as opposed to "prediction" tasks like image classification that are actually perception tasks.
Worse, in prediction tasks it's extremely easy to be overoptimistic about accuracy through careless problem framing. The sepsis paper found that the measured AUC is highly sensitive to how early the prediction is made—it can be accurate, or clinically useful, but not both.
Fortunately in medicine there's a well established set of methods for testing if something works, so the BS eventually gets called out. I worry much more about domains like résumé screening where the accuracy claims of ML-based prediction tools are never effectively challenged.


More from @random_walker

22 Jun
Academia rewards clever papers over real world impact. That makes it less useful. But it also perpetuates privilege—those with less experience of injustice find it easier to play the game, i.e. work on abstruse problems while ignoring topics that address pressing needs.
I have no beef with fundamental research (which isn't motivated by applications). But most scholarship that *claims* to be motivated by societal needs happens with little awareness of what those needs actually are, and no attempt to step outside academia to actually make change.
Like many of academia's problems, this one is structural. Telling individual scholars to do better is unlikely to work when the incentives are all messed up. Here are some thoughts on what might work. I'd love to hear more.
Read 8 tweets
21 Jun
A student who's starting grad school asked me which topics in my field are under-explored. An important question! But not all researchers in a community will agree on the answers. If they did, those topics wouldn't stay under-explored for long. So how to pick problems? [Thread]
It's helpful for researchers to develop a "taste" for problems based on their specific skills, preferences, and hypotheses about systemic biases in the research community that create blind spots. I shared two of my hypotheses with the student, but we must each develop our own.
Hypothesis 1: interdisciplinary topics are under-explored because they require researchers to leave their comfort zones. But collaboration is a learnable skill, so if one can get better at it and find suitable collaborators, rich and important research directions await.
Read 6 tweets
20 Jun
I often find myself re-reading this short piece about what peer review was like in the 1860s. A reviewer was someone who helped improve a paper through a collegial, interactive process rather than rejecting it with a withering, anonymous comment. physicstoday.scitation.org/do/10.1063/PT.…
The great benefit of the more formalized system we have today is that it is more impartial, and has helped turn science into less of an old boys' network. But it is also clear that something has been lost.
The problem with reducing bias by formalizing the review process is that it pushes the bias to other parts of the publication pipeline where it is less observable and harder to mitigate.
Read 5 tweets
31 May
When a machine learning system uses argmax to select outputs from a probability distribution — and most of them do — it's a clue that it might be biased. That's because argmax selects the "most probable" output, which may amplify tiny data biases into perfectly biased outputs.
Here's an exercise (with solution) I developed for my Fairness in ML course with @ang3linawang's help. It uses a toy model to show how bias amplification like the one in the "Men also like shopping" paper can arise through the use of argmax alone! drive.google.com/file/d/1baK_c4…
This graph is the punchline. α and β are parameters that describe correlations in the input and the graphs show correlations in the (multilabel) output. It should be terrifying from a scientific and engineering perspective even if there are no relevant fairness considerations!
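The basic effect can be sketched in a few lines. This is not the linked course exercise; the labels, the 55/45 split, and the helper name are assumptions chosen only to show how argmax turns a small tilt into a total one.

```python
def argmax_label(probs):
    """Return the single most probable label, as an argmax decoder would."""
    return max(probs, key=probs.get)


# Hypothetical model output: a mild 55/45 tilt toward label "A" (assumption).
probs = {"A": 0.55, "B": 0.45}

n_inputs = 1000
outputs = [argmax_label(probs) for _ in range(n_inputs)]

# Share of outputs labeled "A" under argmax, versus the share the
# probabilities themselves imply if outputs were sampled instead.
argmax_share = outputs.count("A") / n_inputs
sampling_share = probs["A"]

print("argmax share of A:", argmax_share)
print("share of A implied by the probabilities:", sampling_share)
```

Every input gets label "A" under argmax even though the model itself assigns "B" a 45% chance each time.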
Read 10 tweets
23 May
A remarkable thread about messed up corporate power hierarchies. It's worth highlighting something else the story illustrates: the standard way to "solve" online abuse and harassment is to experiment on the victims of abuse and harassment with no consent or transparency.
No surprise here, of course. We all know this is how tech platforms work. But should we take it for granted? Is there no alternative? No way to push back?
It's not A/B testing itself that's the problem. Indeed, in this instance, A/B testing *worked*. It allowed @mathcolorstrees to resist a terrible idea by someone vastly more powerful; something that would probably have made Twitter's abuse problem much worse.
Read 5 tweets
20 May
This brilliant, far-too-polite article should be the go-to reference for why "follow the science" is utterly vacuous. The science of aerosol transmission was there all along. It could have stopped covid. But CDC/WHO didn't follow the science. Nor did scientists for the most part.
The party line among scientists and science communicators is that science "self corrects". Indeed it does, but on a glacial timescale with often disastrous policy consequences. Our refusal to admit this further undermines public trust in science.
See also @Zeynep's excoriation of public health agencies, including the comparison of their covid responses with the way 19th century Londoners afraid of "miasma" redirected sewers into the Thames, spreading cholera even more: nytimes.com/2021/05/07/opi…
Read 4 tweets
