Latest Twitter Threads by @nikita_mehandru on Thread Reader App

May 24 • 4 tweets • 1 min read

🩺Medical benchmarks measure if LLMs get the correct final diagnosis. True clinical reasoning requires sequential belief updating: does the model revise its beliefs appropriately as new evidence appears?

New preprint: arxiv.org/abs/2505.22919

We introduce ER-Reason, a dataset of 25,174 de-identified clinical notes from 3,437 patients that supports evaluation across all stages of the emergency department (ED) workflow: triage intake, treatment selection, disposition planning, and final diagnosis.

Share this page!

Enter URL or ID to Unroll