Jonathan H Chen
Mar 31 · 24 tweets
Slides to engage clinical + informatics communities across multiple forums.

For the two people who have not already heard about or tried #ChatGPT: over 100 million other people already have. It is the fastest-growing internet application in history. #MedAI
Good or bad, ready or not, these tools are out there and are being used in all imaginable and some unimaginable ways.
Even LAST quarter, Stanford students were using ChatGPT on class assignments, including submitting ChatGPT-generated answers outright without any edits.
An overly simplified breakdown of how these systems work: auto-complete on steroids. How do you guess the next word someone will type? Learn parameters for how often those words have appeared together in prior examples.
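A toy sketch of that idea in Python, assuming nothing about the real architecture (actual LLMs use neural networks over subword tokens, not raw word counts), just to show "learn how often words follow each other, then auto-complete":

```python
from collections import Counter, defaultdict

# Toy "auto-complete on steroids": count how often each word follows another
# in a tiny corpus, then predict the next word as the most frequent follower.
corpus = "the patient has heart failure the patient has chest pain".split()

followers = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    followers[prev_word][next_word] += 1

def predict_next(word):
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("patient"))  # -> "has"
print(predict_next("has"))      # -> "heart" or "chest", whichever was counted first
```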
Why stop at looking at people’s search histories?
What happens if we pour in the words from every book ever published, every @Wikipedia article, every @nytimes article, and every conversation on @Reddit and @Twitter?
The scale of these systems has grown from millions, to billions, to 175B parameters in GPT-3 (which underlies ChatGPT). OpenAI won't publicly disclose GPT-4's size, but many suspect it has learned over a trillion parameters from text available all over the internet.
Bigger doesn't mean better, but surprising emergent properties occur when this simple concept is given enough examples to learn from. Perhaps not THAT surprising, given that our intellectual and emotional thought is expressed through the medium of language.
ai.googleblog.com/2022/04/pathwa…
It's not just auto-complete, however. The system was further refined by providing human-written examples of what "good" answers to different questions would look like. Instruction fine-tuning - supervised learning.
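A minimal sketch of what instruction-tuning data looks like conceptually (the field names and examples are illustrative, not OpenAI's actual training format): human-written prompt/answer pairs that the model is trained to imitate.

```python
# Illustrative instruction-tuning pairs: a prompt plus a human-written "good" answer.
# Supervised fine-tuning trains the model to reproduce the demonstrated answers.
examples = [
    {
        "prompt": "Describe the typical symptoms of a patient with heart failure.",
        "response": "Common symptoms include shortness of breath, leg swelling, and fatigue...",
    },
    {
        "prompt": "Draft a brief letter of recommendation for a student research assistant.",
        "response": "To whom it may concern: It is my pleasure to recommend...",
    },
]

for ex in examples:
    # In real training, the model's next-word prediction loss is computed on the response.
    print(ex["prompt"], "->", ex["response"][:40] + "...")
```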
If you ask ChatGPT the same question 10 times, it generates 10 different answers. Reinforcement Learning from Human Feedback (RLHF) had human workers rate answers, nudging the system toward preferred responses while trying to avoid toxic, biased, or otherwise wrong answers.
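A sketch of those two ideas together, with hypothetical stand-in functions (sample_answer and reward_model are placeholders, not real model calls): sampling gives a different answer each time, and a reward model trained on human ratings scores the candidates so preferred responses get reinforced.

```python
import random

def sample_answer(question):
    # Stand-in for sampling a language model at nonzero temperature:
    # the same question can yield a different answer on each call.
    templates = [
        "Opioids improve mortality in heart failure; see references below.",
        "There is no good evidence that opioids improve mortality in heart failure.",
        "I'm not certain; opioids may relieve dyspnea but are not a mortality-reducing therapy.",
    ]
    return random.choice(templates)

def reward_model(answer):
    # Stand-in for a model trained on human worker ratings:
    # reward hedged, evidence-aware answers over confident fabrication.
    return int("no good evidence" in answer) + int("not certain" in answer)

question = "Do opioids improve mortality in heart failure?"
candidates = [sample_answer(question) for _ in range(10)]
print(max(candidates, key=reward_model))  # the highest-rated candidate
```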
Examples: Generating Document Drafts - Write an insurance authorization letter for a medication. Draft a letter of recommendation for a student. Boom, done instantaneously. Results aren't super, but they are largely... serviceable.
Examples: Summarization and Translation - Draft a patient discharge summary. While you're at it, extract the med list and assign ICD10 diagnosis codes in tabular format. Turn this into patient discharge instructions, understandable at a 5th grade reading level... in Spanish.
Example: Manuscript Revisions - Rewrite this abstract into structured form. In less than 200 words. Rewrite it in the form of an R01-fundable Specific Aim and extract the key Significance and Innovation sections in tabular format.
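All of these examples reduce to the same pattern: send a text prompt, get text back. A minimal sketch using the OpenAI Python client (model name, client interface, and the placeholder note text are illustrative and vary by library version):

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable

discharge_summary = "..."  # placeholder clinical note text

prompt = (
    "Rewrite the following discharge summary as patient discharge instructions "
    "at a 5th grade reading level, in Spanish:\n\n" + discharge_summary
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```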
Example: Interactive Coding - These large language models (#LLMs) are remarkably good now at all of these language manipulation tasks. But why stop at human language? Why not programming languages, when they've also been able to learn from all the open-source code on @github?
#Confabulation! The above code doesn't actually work. Syntactically correct, but with logic errors that required fixing. A major weakness is that these systems are prone to making things up as they go. Many call this "hallucination," but confabulation more accurately describes the phenomenon.
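The generated code from that tweet isn't reproduced in this unroll. As a hypothetical stand-in, here is the flavor of "syntactically correct but logically wrong" output described: it runs without error, yet the logic is quietly inverted.

```python
# Hypothetical example of LLM-style code that runs but is wrong.
# Intent: flag systolic blood pressure readings that are dangerously LOW (< 90 mmHg).
def flag_hypotension(systolic_bp_readings):
    flagged = []
    for bp in systolic_bp_readings:
        if bp > 90:  # BUG: comparison is inverted; should be bp < 90
            flagged.append(bp)
    return flagged

print(flag_hypotension([85, 120, 88, 140]))  # prints [120, 140] instead of [85, 88]
```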
Example: Confabulation - Write an intro card for Jonathan Chen. Many correct things, but I've never been to U Penn and I'm not a cardiologist. It's just making things up. But worse, how could you possibly know that wasn't true if you didn't already know the answer?
Example: Confabulation - Medical question - Provide references to explain how opioids improve mortality in heart failure. It tries to hedge about not being sure that's true, but it dutifully provides references anyway. Go search for these articles. None of them actually exist!
This is dangerous! What would you fear more, a medical student who is unsure and sometimes guesses wrong, or another who bluffs their way through rounds, making up facts as they go? The most effective lies are those elegantly hidden within the truth.
grepmed.com/images/13818/c…
Before you dismiss this, look at where we're going, not just where we are. I had students working on medical Q&A systems in 2019, but the performance was too limited, so I stopped paying attention. We're now at a moment where out-of-the-box systems can simply pass medical licensing exams.
Turn your head and any assessment of this actively disruptive technology is already out-of-date. These systems can now handily pass (multiple choice) medical exams.
I've spoken before on why this is the wrong question to ask, but it is inevitable, so fine. Who's smarter, humans or the computer? What does it mean to be a doctor when publicly available, general-purpose chatbots pass medical exams and future versions will improve even further?
Understanding the capabilities, limitations, and implications of emerging technologies atop the peak of inflated expectations will soften the inevitable crash into the trough of disillusionment and help us move up the slope of enlightenment, using all available tools to improve our collective health.
I have already reviewed this and related topics for crossover audiences, including at a @StanfordBMIR colloquium panel with Preetha Basaviah, Alicia DiGiammarino, Jason Hom, @ronlivs, and @DrEricStrong.
Many active threads of work are happening, including this one: hai.stanford.edu/news/how-well-… @StanfordMed @StanfordHAI
I will discuss more at the upcoming @NIDAnews CTN meeting, the @ACPIMPhysicians conference with @MdDeepti and @Anacapa17, @StanfordDeptMed Grand Rounds, and the @StanfordMed curriculum working group for medical students. Let me know what more you'd want clinicians to know on the subject.
