Alan Karthikesalingam
May 17 · 6 tweets · 7 min read
So happy to share #MedPaLM2 - our team's evolution of Med-PaLM. A new state of the art for medical question-answering!

Med-PaLM 2 scores 86.5% on MedQA-USMLE, exceeding Med-PaLM's score by >19% 🤯, & 81.8% on PubMedQA...

More here: arxiv.org/pdf/2305.09617…
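The ">19%" figure reads most naturally as an absolute gap in percentage points, not a relative improvement. A minimal sketch of that arithmetic, assuming the rounded 67% MedQA score quoted in the December 2022 thread further down is the relevant baseline (that baseline is an assumption here, not an exact figure from the paper):

```python
# Scores in percent. The 67.0 baseline is an ASSUMPTION: the rounded MedQA
# figure quoted in the Dec 2022 thread, not an exact value from the paper.
med_palm_2_score = 86.5
baseline_score = 67.0


def absolute_gain(new, old):
    """Difference in percentage points."""
    return new - old


def relative_gain(new, old):
    """Improvement as a percentage of the old score."""
    return (new - old) / old * 100.0


points = absolute_gain(med_palm_2_score, baseline_score)
relative = relative_gain(med_palm_2_score, baseline_score)
print(f"absolute: {points:.1f} points, relative: {relative:.1f}%")
```

Under that assumption the absolute gap is what matches the ">19%" wording; the relative improvement would be noticeably larger.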
We believe in rigorous, careful evaluation. Physicians even preferred #MedPaLM2's long-form answers to answers from other real 🇮🇳🇺🇸🇬🇧 physicians along 8/9 axes of quality including medical accuracy (consensus w/medical opinion) and reasoning, with less likelihood of harm.
Med-PaLM 2's performance was superior to Med-PaLM far beyond exam performance. To highlight the real-world importance of nuanced evaluation, we introduce a new dataset of "adversarial" questions designed specifically to probe LLM weaknesses, including #HealthEquity.
Lay raters also consistently find Med-PaLM 2 more helpful, and find that it directly addresses the intent behind a medical question.
I can't believe I get to say this but you can see a summary by @sundarpichai with a sneak peek at where our research is heading next! Also co-senior authors' feeds at @AziziShekoofeh @vivnat and first authors @taotu831 @thekaransinghal @Mysiak
👀 youtube.com/clip/Ugkxb7W_k…


More from @alan_karthi

Dec 27, 2022
💡New paper - Large Language Models Encode Clinical Knowledge💡 Our work @GoogleHealth @GoogleAI @DeepMind advances the state of the art in 7 medical question-answering tasks - including achieving 67% on MedQA (USMLE qs), improving on prior work by >17%

arxiv.org/abs/2212.13138

1/n
Careful evaluation is key for LLMs in safety-critical settings. We pilot a framework for clinician and layperson evaluation of LLMs’ outputs. Deeper human inspection reveals gaps in comprehension + reasoning (2/n)
We approach these with instruction prompt tuning. We show that this helps to align a model, "MedPaLM", better to the medical domain - with smaller gaps in reasoning, comprehension, safety and helpfulness

(3/n)
Nov 5, 2021
Our research @GoogleHealth @GoogleAI @DeepMind published at Medical Image Analysis goo.gle/31kUam7.
Wise doctors know when they don’t know - medical AI should too. In dermatology this is critical, as many rare skin conditions occur too infrequently for AI to learn (1/n)
For AI researchers, detecting conditions a model has not seen in training is called “out-of-distribution (OOD) detection”. Doing this in medical AI is significantly harder than most computer vision work, because the differences between rare + common diseases can be subtle
Using our large-scale pre-training advances and a novel "HOD" loss, we achieved an AUC of 0.83 on a new benchmark for this "near-out-of-distribution" detection challenge - to evaluate how well a dermatology AI system recognises a previously-unseen condition.
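For readers unfamiliar with how an AUC is used to score OOD detection, here is a minimal sketch. It assumes the detector emits one "unfamiliarity" score per image (higher meaning more likely unseen); the scores below are made-up numbers for illustration only, and nothing here reproduces the paper's method or its "HOD" loss.

```python
def auroc(ood_scores, in_dist_scores):
    """Probability that a random OOD example gets a higher score than a
    random in-distribution example, counting ties as half a win."""
    wins = 0.0
    for o in ood_scores:
        for i in in_dist_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(in_dist_scores))


# Hypothetical detector outputs (illustrative values, not real data).
in_dist = [0.10, 0.20, 0.35, 0.40]   # conditions seen in training
ood = [0.15, 0.60, 0.80]             # previously unseen conditions

print(auroc(ood, in_dist))
```

An AUC of 0.5 would mean the scores carry no information about whether a condition is unseen, while 1.0 would mean every unseen condition scores above every seen one.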
