Jonathan H Chen MD PhD
Physician Data Scientist - Stanford Center for Biomedical Informatics Research + Division of Hospital Medicine + Clinical Excellence Research Center

Mar 31, 2023, 24 tweets

Slides to engage clinical + informatics communities across multiple forums.

For the two people who haven't already heard of and tried #ChatGPT: over 100 million other people already have. It's the fastest-growing internet application in history. #MedAI

Good or bad, ready or not, these tools are out there and are being used in all imaginable and some unimaginable ways.
Even LAST quarter, Stanford students were using ChatGPT on class assignments, including straight-up submitting ChatGPT-generated answers without any edits.

An overly simplified breakdown of how these systems work: auto-complete on steroids. How do you guess the next word someone will type? By learning parameters for how often those words have appeared together in prior examples.
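
To make the "auto-complete on steroids" idea concrete, here's a toy bigram counter in Python (my own illustration, not how GPT actually stores knowledge): it tallies which words follow which in example text, then predicts the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy sketch of next-word prediction: tally how often each word follows
# another in example text, then guess the most frequent successor.
# A real LLM learns billions of neural-network parameters rather than raw
# counts, but the core objective of next-word prediction is the same.
corpus = ("the patient denies chest pain . "
          "the patient reports chest pain and mild chest tightness")
tokens = corpus.split()

follows = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most common word seen after `word` in the training text."""
    successors = follows.get(word)
    return successors.most_common(1)[0][0] if successors else None

print(predict_next("chest"))  # -> 'pain' (seen twice, vs. 'tightness' once)
```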

Why stop at looking at people’s search histories?
What happens if we pour in the words for every book ever published, every @Wikipedia article, every @nytimes article, every conversation on @Reddit and @Twitter?

The scale of these systems has grown from millions, to billions, to 175B parameters in GPT-3 (underlying ChatGPT). OpenAI won't publicly disclose details, but many suspect GPT-4 has learned over a trillion parameters from text available all over the internet.

Bigger doesn't always mean better, but surprising emergent properties occur when this simple concept is given enough examples to learn from. Perhaps not THAT surprising, given that our intellectual and emotional thought is expressed through the medium of language.
ai.googleblog.com/2022/04/pathwa…

It's not just auto-complete, however. The system was further refined with human-written examples of what "good" answers to different questions should look like. Instruction Fine-Tuning - Supervised Learning
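
A hypothetical sketch of what such supervised training records might look like (the prompts and answers below are invented for illustration, not OpenAI's actual dataset): each record pairs a prompt with a human-written "good" answer, and the model is trained with ordinary supervised learning to reproduce the target given the prompt.

```python
# Illustrative instruction fine-tuning records (content invented here).
instruction_data = [
    {
        "prompt": "Write an insurance prior-authorization letter for drug X.",
        "ideal_answer": "Dear Medical Director, I am requesting coverage...",
    },
    {
        "prompt": "Summarize this hospital course at a 5th grade level.",
        "ideal_answer": "You came to the hospital because...",
    },
]

for record in instruction_data:
    # In real fine-tuning, the loss is next-token cross-entropy on the
    # ideal answer; here we just show the (input -> target) pairing.
    print(record["prompt"], "->", record["ideal_answer"][:30], "...")
```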

If you ask ChatGPT the same question 10 times, it generates 10 different answers. Reinforcement Learning from Human Feedback (RLHF) had human workers rate answers, nudging the system toward preferred responses while trying to avoid toxic, biased, or otherwise wrong answers.
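
The answer-to-answer variability itself comes from sampling: with temperature above zero, the model samples from its next-token probability distribution rather than always picking the most likely word. A minimal sketch using the OpenAI Python client of this era (~0.27 API; the model name and question are illustrative):

```python
import openai  # pip install openai (the ~0.27-era API current for this thread)

openai.api_key = "sk-..."  # your API key here

# Ask the same question several times; temperature > 0 means each
# completion is sampled, so the answers can all come out differently.
question = "Explain RLHF in one sentence."
for _ in range(3):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # the model behind ChatGPT at the time
        messages=[{"role": "user", "content": question}],
        temperature=0.9,        # higher temperature = more varied answers
    )
    print(response["choices"][0]["message"]["content"])
```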

Examples: Generating Document Drafts - Write an insurance authorization letter for a medication. Draft a letter of recommendation for a student. Boom, done instantaneously. Results aren't super, but they are largely... serviceable.

Examples: Summarization and Translation - Draft a patient discharge summary. While you're at it, extract the med list and assign ICD10 diagnosis codes in tabular format. Turn this into patient discharge instructions, understandable at a 5th grade reading level... in Spanish.
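
A hedged sketch of how such a multi-step request might be issued through the same era-appropriate OpenAI client (the note text is synthetic and the prompt wording is my own illustration): one prompt can chain several language tasks at once.

```python
import openai  # same ~0.27-era client as above; model name is illustrative

openai.api_key = "sk-..."

# Synthetic note for illustration only.
note = "58M admitted with CHF exacerbation, diuresed with IV furosemide..."
prompt = (
    "From the hospital note below:\n"
    "1. Extract the medication list and ICD-10 codes as a table.\n"
    "2. Then write discharge instructions at a 5th grade reading level, "
    "in Spanish.\n\n" + note
)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,  # lower temperature for more consistent extraction
)
print(response["choices"][0]["message"]["content"])
```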

Example: Manuscript Revisions - Rewrite this abstract into structured form, in less than 200 words. Rewrite it in the form of an R01-fundable Specific Aim, and extract the key Significance and Innovation sections in tabular format.

Example: Interactive Coding - These large language models (#LLMs) are remarkably good now at all of these language manipulation tasks. But why stop at human language? Why not programming languages, when they've also been able to learn from all the open-source code on @github?
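
The code screenshot from the original thread isn't reproduced here; as a hypothetical stand-in, here's the kind of syntactically valid but logically wrong output the next tweet describes.

```python
# Hypothetical stand-in for the screenshot in the original thread:
# syntactically valid Python with a subtle logic error of the kind
# LLMs confidently produce.
def mean_arterial_pressure(systolic, diastolic):
    # BUG: MAP should be diastolic + (systolic - diastolic) / 3,
    # but this version averages the two values, overestimating MAP.
    return (systolic + diastolic) / 2

print(mean_arterial_pressure(120, 80))  # prints 100.0; correct MAP is ~93.3
```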

#Confabulation! The above code doesn't actually work. It's syntactically correct, but has logic errors that required fixing. A major weakness: these systems are prone to making things up as they go. Many call this "hallucination," but confabulation more accurately describes the phenomenon.

Example: Confabulation - Write an intro card for Jonathan Chen. It gets many things correct, but I've never been to U Penn and I'm not a cardiologist. It's just making things up. But worse: how could you possibly know that wasn't true if you didn't already know the answer?

Example: Confabulation - Medical question - Provide references to explain how opioids improve mortality in heart failure. It tries to hedge about not being sure that's true, but it dutifully provides references anyway. Go search for these articles. None of them actually exist!

This is dangerous! What would you fear more, a medical student who is unsure and sometimes guesses wrong, or another who bluffs their way through rounds, making up facts as they go? The most effective lies are those elegantly hidden within the truth.
grepmed.com/images/13818/c…

Before you dismiss, look at where we're going and not just where we are. I had students working on med Q&A systems in 2019, but the performance was too limited, so I stopped paying attention. We're at a moment where out-of-the-box systems can just pass medical licensing exams.

Turn your head and any assessment of this actively disruptive technology is already out-of-date. These systems can now handily pass (multiple choice) medical exams.

I've spoken before on why this is the wrong question to ask, but it is inevitable, so fine. Who's smarter? Humans or the computer? What does it mean to be a doctor when publicly available, general purpose chat bots pass medical exams and future versions will improve even further?

Understanding the capabilities, limitations, and implications of emerging technologies atop the peak of inflated expectations will soften the inevitable crash into the trough of disillusionment and speed the move to the slope of enlightenment, where we use all available tools to improve our collective health.

I've already reviewed this and related topics for crossover audiences, including at a @StanfordBMIR colloquium panel with Preetha Basaviah, Alicia DiGiammarino, Jason Hom, @ronlivs, and @DrEricStrong.

Many active threads of work are happening, including this one: hai.stanford.edu/news/how-well-… @StanfordMed @StanfordHAI

I'll discuss more at the upcoming @NIDAnews CTN meeting, the @ACPIMPhysicians conference with @MdDeepti and @Anacapa17, @StanfordDeptMed Grand Rounds, and the @StanfordMed curriculum working group for medical students. Let me know what more you'd want clinicians to know on the subject.
