Jonathan H Chen
Mar 30, 5 tweets, 3 min read
#ChatGPT #AI performance on *open-ended, free-response,* clinical reasoning exams intended for human medical trainees. #LLMs are passing medical licensing exams, but artificial multiple choice designs do not reflect realistic clinical reasoning.
medrxiv.org/content/10.110…
From describing diagnostic schema, generating differential diagnoses and problem lists, to suggesting and interpreting tests, ChatGPT is already demonstrating the surprising ability to often reach a 70% passing threshold on multiple cases.
For simple recall-style questions ("describe the typical symptoms of a patient with heart failure"), it will knock them out of the park, while showing wide variability and struggling with deeper analytical questions.
This is all for the originally released ChatGPT (3.5). We've had this in the can for two months, under review at a major medical journal, getting final line-level edits from the editor. Then... GPT-4 (ChatGPT+) was released and we have to redo the whole study. 🤦 Stay tuned.
@DrEricStrong, Alicia DiGiammarino, Isabel Weng, Preetha Basaviah, @PoonamHosamani, Andre Kumar, Andrew Nevins, John Kugler, Jason Hom @StanfordMed @StanfordBMIR @StanfordHospMed @StanfordDeptMed

More from @jonc101x

Mar 31
Slides to engage clinical + informatics communities across multiple forums.

For the two people who have not already heard about and tried using #ChatGPT, over 100 million other people already have. The fastest-growing internet application in history. #MedAI
Good or bad, ready or not, these tools are out there and are being used in all imaginable and some unimaginable ways.
Even LAST quarter, Stanford students were using ChatGPT on class assignments, including straight-up submitting ChatGPT-generated answers without any edits.
Overly simplified breakdown of how these systems work: auto-complete on steroids. How do they guess the next word you enter? By learning parameters for how often words have appeared together in prior examples.
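To make the "auto-complete" idea concrete, here is a toy sketch that guesses the next word purely from how often word pairs appeared in prior text. It is an illustration only: the corpus and the function names `train_bigrams` and `guess_next` are made up for this example, and real systems like ChatGPT learn neural network parameters over tokens rather than keeping raw pair counts.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def guess_next(counts, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this illustration.
corpus = (
    "the patient has chest pain . "
    "the patient has a fever . "
    "the patient reports chest pain ."
)
model = train_bigrams(corpus)
print(guess_next(model, "patient"))  # most common word after "patient"
print(guess_next(model, "chest"))    # most common word after "chest"
```

With this corpus, "has" follows "patient" twice and "reports" once, so the guess after "patient" is "has". A word that never appeared gives no guess at all.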
Oct 21, 2018
Closing the loop on #Diagnostics #Tweetorial with example #AppleWatch #AFib #Screening. From a company website: "Atrial fibrillation is a silent killer. The heart arrhythmia causes more life-threatening strokes than any other chronic condition, and will affect 1 in 4 of us."
"But the sad fact is that atrial fibrillation often goes unnoticed: It is estimated that 40% of those who experience the heart condition are completely unaware of it."
Using #AppleWatch technology and #DeepLearning #AI #ML, a device algorithm can reportedly detect atrial fibrillation with high accuracy (c-statistic 93%).
Oct 5, 2018
#Tweetorial on #Diagnostics and #Screening interpretation.
An otherwise healthy 40 year old woman comes to you after reading on the internet about a terrible disease that one in a thousand women get, and a highly accurate test that can save her life.
The test is over 99% accurate in people with the disease. For those without disease, the test is only wrong 5% of the time.

You order this test and it comes back positive. The woman anxiously asks you, do I have the disease? What is the chance this woman has the disease?
Assuming they weren't immediately fooled by the "test is only wrong 5% of the time," most I've asked correctly recognize the stats provided are Sensitivity = 99% and Specificity = 95%, and that the objective is to determine the Positive Predictive Value.
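The Positive Predictive Value for this scenario can be worked out with Bayes' rule. The sketch below assumes the woman's pre-test probability equals the stated prevalence of one in a thousand, and takes sensitivity as exactly 99% and specificity as exactly 95%. The function name `positive_predictive_value` is just an illustrative choice.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of disease given a positive test, via Bayes' rule."""
    # Fraction of everyone tested who has the disease AND tests positive.
    true_pos = sensitivity * prevalence
    # Fraction of everyone tested who is healthy AND tests positive anyway.
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed inputs: 1 in 1,000 prevalence, 99% sensitivity, 95% specificity.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"PPV = {ppv:.1%}")  # PPV = 1.9%
```

Under these assumptions, in every 100,000 women tested about 99 are true positives and about 4,995 are false positives. A positive result therefore means roughly a 2% chance of disease, not 95% or 99%.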