Nikita Mehandru
Ph.D. Student @UCBerkeley | Prev. Research Intern @MSFTResearch Health Futures
May 24
🩺 Medical benchmarks measure whether LLMs reach the correct final diagnosis. True clinical reasoning requires sequential belief updating: does the model revise its beliefs appropriately as new evidence appears?

New preprint: arxiv.org/abs/2505.22919

We introduce ER-Reason, a dataset of 25,174 de-identified clinical notes from 3,437 patients that supports evaluation across all stages of the emergency department (ED) workflow: triage intake, treatment selection, disposition planning, and final diagnosis.