Alan Karthikesalingam's Threads

May 17, 2023 • 6 tweets • 7 min read

So happy to share #MedPaLM2 - our team's evolution of Med-PaLM. A new state of the art for medical question-answering!

Med-PaLM 2 scores 86.5% on MedQA-USMLE, exceeding Med-PaLM's score by >19% 🤯, & 81.8% on PubMedQA...

More here: arxiv.org/pdf/2305.09617…

We believe in rigorous, careful evaluation. Physicians even preferred #MedPaLM2's long-form answers to answers from other real 🇮🇳🇺🇸🇬🇧 physicians along 8/9 axes of quality including medical accuracy (consensus w/medical opinion) and reasoning, with less likelihood of harm

Dec 27, 2022 • 5 tweets • 5 min read

💡New paper - Large Language Models Encode Clinical Knowledge💡 Our work @GoogleHealth @GoogleAI @DeepMind advances the state of the art in 7 medical question-answering tasks - including achieving 67% on MedQA (USMLE qs), improving on prior work by >17%

arxiv.org/abs/2212.13138

1/n

https://twitter.com/vivnat/status/1607609299894947841

Careful evaluation is key for LLMs in safety-critical settings. We pilot a framework for clinician and layperson evaluation of LLMs’ outputs. Deeper human inspection reveals gaps in comprehension + reasoning (2/n)

Nov 5, 2021 • 6 tweets • 5 min read

Our research @GoogleHealth @GoogleAI @DeepMind published in Medical Image Analysis goo.gle/31kUam7.
Wise doctors know when they don’t know - medical AI should too. In dermatology this is critical, as many rare skin conditions occur too infrequently for AI to learn (1/n)

https://twitter.com/GoogleHealth/status/1456660083102916614

For AI researchers, detecting conditions a model has not seen in training is called “out-of-distribution (OOD) detection”. Doing this in medical AI is significantly harder than most computer vision work, because the differences between rare + common diseases can be subtle
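To make the idea of OOD detection concrete, here is a minimal sketch of one generic baseline: flag an input as out-of-distribution when the classifier's highest softmax probability falls below a threshold. This is not the method from the paper linked above; the function names, the example logits, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_out_of_distribution(logits, threshold=0.5):
    """Hypothetical helper: treat the input as OOD when the model's
    top softmax probability is below `threshold` (an assumed value)."""
    return max(softmax(logits)) < threshold


# Illustrative logits only, not real model outputs.
confident = [6.0, 0.5, 0.2]   # one class clearly dominates
uncertain = [1.0, 0.9, 1.1]   # no class stands out

print(is_out_of_distribution(confident))
print(is_out_of_distribution(uncertain))
```

A confidence threshold like this tends to struggle exactly where the thread says the problem is hard: when a rare condition looks very similar to a common one, the model can still be confidently wrong.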
