In the past two years, there has been a deluge of papers published in the AI in Dermatology space. In our @JAMADerm paper with @Dr_Vron and @james_y_zou, we audited the transparency of the datasets and models used. #AI #DermTwitter #MedTwitter jamanetwork.com/journals/jamad…
This work was inspired by @timnitGebru's famous paper "Datasheets for Datasets", which discusses the importance of documenting datasets with information around how they were created, potential biases, and recommended use cases. arxiv.org/abs/1803.09010
This is our motivation: "To be successful, AI algorithms need to be trained and tested on data that represent clinical scenarios encountered in real-world settings. Therefore, a clear understanding of data set characteristics is critical."
What we found, however, was sobering. Unsurprisingly, most of the dermatology data used for AI development was siloed in institutions and not shared publicly. You may note here that public datasets such as ISIC (labeled 1) help generate a lot of research and are really valuable.
Previously, @AdeAdamson has expressed concerns about bias in AI algorithms stemming from the lack of diverse skin tones represented. We wanted to quantify biases with regard to skin tone diversity in the datasets used for AI development. jamanetwork.com/journals/jamad…
However, we were unable to do this because very few papers even reported the skin tones or ethnicities represented in the data. I strongly suspect that these datasets are not diverse, but there is no way to know.
One of the other major concerns we had was label noise in the datasets. While there are not established gold standards for every disease, we believe that skin cancer should be confirmed by histopathology rather than consensus from looking at an image.
Many papers, including one group that recently announced they were going to release their algorithm directly to patients, did not have histopathological confirmation of malignancies.
We understand that there are issues of patient privacy around releasing datasets. However, adequate descriptions of the datasets are absolutely needed. It's like reading about a clinical trial without knowing any information on patient demographics.
Another way to be more transparent is to share models. This is something @whria78 has done with Model Derm. Most people do not share their code or provide an API for testing their models.
