Latest Twitter Threads by @thekaransinghal on Thread Reader App

Apr 22 • 12 tweets • 3 min read

Today we’re introducing two big steps for health at OpenAI:

- ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
- HealthBench Professional, a new benchmark to evaluate real clinician chat tasks

We’re excited about what this can unlock for care. ❤️

Clinicians are under enormous pressure: more visits, more documentation, more evidence, and less time for patients.

They’re turning to AI: today, millions of clinicians use ChatGPT to support care delivery weekly, and usage has more than doubled in the last year.

Jul 22, 2025 • 14 tweets • 4 min read

📣 Excited to share our real-world study of an LLM clinical copilot, a collab between @OpenAI and @PendaHealth.

Across 39,849 live patient visits, clinicians with AI had a 16% relative reduction in diagnostic errors and a 13% reduction in treatment errors vs. those without. 🧵

As AI capabilities in health advance, the gap between capabilities and adoption grows. To accelerate AI’s impact on human health, we need real-world studies that capture the challenges of implementation and deployment, creating evidence and shared playbooks.

https://x.com/thekaransinghal/status/1947116421567398383

Jul 21, 2025 • 4 tweets • 2 min read

Sharing a talk I gave last year: “Levels of Clinical Evaluation for LLMs”

We've seen a lot of interest in evaluating LLMs for health. We recently put out HealthBench and saw performance double between GPT-4o and o3, but evaluations != implementation. 🧵 karansinghal.com/notes/levels-o…

Evaluations remain essential, both for the healthcare ecosystem and for model developers. Despite this, the community has traditionally put too much focus on narrow, unrealistic benchmarks that measure raw model knowledge but lack realistic data and use cases.
