Dr. Dominic Ng
Jun 30 • 13 tweets • 4 min read
Microsoft claims their new AI framework diagnoses 4x better than doctors.

I'm a medical doctor and I actually read the paper. Here's my perspective on why this is both impressive AND misleading ... 🧵
What did they create? Two key innovations:
1. SDBench: A testing environment using 304 real medical mysteries from NEJM where AI starts with just "29yo woman with sore throat" and must decide what to ask/test next

2. MAI-DxO: An AI system that simulates 5 doctors working together as a team
How did they test the AI/Doctors?
They took 304 real cases from NEJM and turned them into an interactive game.

The setup:
Step 1: You (human doctor or AI) get a tiny intro like: "52-year-old man with fever and breathing problems." That's it. No test results, no detailed history - just like a patient walking into the ER.

Step 2: There's a "Gatekeeper" (another AI) that has the full case file but won't tell you anything unless you specifically ask.

Step 3: You can do three things:
1. Ask questions ("Any recent travel?" "Is there chest pain?")
2. Order tests ("CBC" "Chest X-ray" "CT scan")
3. Make your final diagnosis ("This is pneumonia")

Step 4: The Gatekeeper then answers the question. BUT it only reveals what you ask for. If you don't think to ask about travel history, you won't find out the patient just returned from a cave expedition (real case - histoplasmosis).

Step 5: Every test costs money (real US hospital prices). Every round of questions = $300 office visit.
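The setup above can be sketched as a small Python class. This is only a minimal illustration of the rules as described in this thread, not the paper's code: the `Gatekeeper` class, its method names, the "not available" reply, and the chest X-ray price are all hypothetical. Scoring the final diagnosis by a simple string match is also an assumption, since the thread doesn't say how diagnoses are judged.

```python
from dataclasses import dataclass

VISIT_COST = 300  # every round of questions = $300 office visit (from the thread)


@dataclass
class Gatekeeper:
    """Holds the full case file but reveals only what is explicitly requested."""
    findings: dict[str, str]     # hypothetical: request -> answer in the case file
    test_prices: dict[str, int]  # hypothetical price list, in US dollars
    diagnosis: str               # the published final diagnosis
    spent: int = 0

    def ask(self, questions: list[str]) -> dict[str, str]:
        # One round of questions is billed as one visit, however many are asked.
        self.spent += VISIT_COST
        return {q: self.findings.get(q, "not available") for q in questions}

    def order(self, test: str) -> str:
        # Assumption: tests missing from the price list are billed at $0.
        self.spent += self.test_prices.get(test, 0)
        return self.findings.get(test, "not available")

    def diagnose(self, guess: str) -> bool:
        # Assumption: a case-insensitive exact match stands in for real grading.
        return guess.strip().lower() == self.diagnosis.lower()


case = Gatekeeper(
    findings={"recent travel": "just returned from a cave expedition"},
    test_prices={"chest x-ray": 100},  # made-up price for illustration
    diagnosis="histoplasmosis",
)
answers = case.ask(["recent travel", "chest pain"])
case.order("chest x-ray")
correct = case.diagnose("Histoplasmosis")
```

The point of the sketch is the asymmetry: if "recent travel" is never passed to `ask`, the cave expedition never surfaces, yet the bill still grows with every round and every test.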
MAI-DxO isn't a new model but a framework built on top of existing LLMs (ChatGPT, Claude, Gemini).

How does this framework work?
It asks the LLM to simulate a virtual panel of 5 specialised AI doctors:
Dr. Hypothesis (tracks diagnoses)
Dr. Test-Chooser (selects optimal tests)
Dr. Challenger (plays devil's advocate)
Dr. Stewardship (manages costs)
Dr. Checklist (quality control)

The panel then argues it out among themselves to settle on the best path forward.
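A rough sketch of what "asking the LLM to simulate a panel" could look like is below. The five role names and their jobs come from the list above; the prompt wording and the `build_panel_prompt` helper are my own assumptions and not the paper's actual prompt. No model is called here, the function only assembles the text you would send.

```python
# Role names and one-line jobs as listed in the thread.
PANEL = {
    "Dr. Hypothesis": "tracks diagnoses",
    "Dr. Test-Chooser": "selects optimal tests",
    "Dr. Challenger": "plays devil's advocate",
    "Dr. Stewardship": "manages costs",
    "Dr. Checklist": "quality control",
}


def build_panel_prompt(case_so_far: str) -> str:
    """Assemble one prompt asking a single LLM to role-play all five doctors.

    The wording is hypothetical; it only illustrates the idea of a virtual panel.
    """
    roles = "\n".join(f"- {name}: {job}" for name, job in PANEL.items())
    return (
        "Simulate a panel of five doctors debating this case.\n"
        f"{roles}\n"
        "After the debate, choose exactly one action: "
        "ask questions, order tests, or give a final diagnosis.\n\n"
        f"Case so far:\n{case_so_far}"
    )


prompt = build_panel_prompt("52-year-old man with fever and breathing problems.")
```

The design choice worth noticing is that this is one model wearing five hats, not five separate models, which is why the framework can sit on top of any existing LLM.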
The results?
📊 Accuracy:
Doctors: 20% (ouch)
Standard AI: 30-79%
MAI-DxO: 80-85.5%

💰 Cost per case:
Doctors: $2,963
Standard AI (o3): $7,850
MAI-DxO: $2,397

On paper the AI was 4x more accurate AND cheaper.....
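For anyone who wants to check where "4x" and "cheaper" come from, here is the arithmetic on the figures quoted above. The variable names are mine, and the numbers are only the ones listed in this thread.

```python
# Accuracy figures from the thread, in percent.
doctor_acc = 20.0
mai_acc_low, mai_acc_high = 80.0, 85.5

# How many times more accurate MAI-DxO is than the doctors.
ratio_low = mai_acc_low / doctor_acc
ratio_high = mai_acc_high / doctor_acc

# Cost per case from the thread, in US dollars.
doctor_cost, o3_cost, mai_cost = 2963, 7850, 2397

# Savings per case of MAI-DxO relative to doctors and to standalone o3.
saving_vs_doctors = doctor_cost - mai_cost
saving_vs_o3 = o3_cost - mai_cost
saving_vs_doctors_pct = 100 * saving_vs_doctors / doctor_cost

print(f"Accuracy ratio: {ratio_low:.1f}x to {ratio_high:.1f}x")
print(f"Saving vs doctors: ${saving_vs_doctors} ({saving_vs_doctors_pct:.0f}%)")
print(f"Saving vs o3: ${saving_vs_o3}")
```

So the headline "4x" is the low end of the MAI-DxO accuracy range divided by the doctors' 20%, and the saving against doctors works out to $566 per case, a bit under a fifth.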
But there are five issues I see:
1. They used ZERO healthy patients
95% of sore throats are viral and this AI was only tested on incredibly rare diagnostic cases.

We don't know if it will order biopsies on every patient with a sore throat "just to rule out rhabdomyosarcoma."
2. "Cost-effective" ignores the human toll
Their costs only count lab fees, not:
- 2 weeks of anxiety waiting for biopsy results
- Radiation from "precautionary" CT scans (cancer risk!)
- Complications from unnecessary procedures
- Time off work
- Psychological trauma of false cancer scares
3. The physician comparison was rigged
Docs were banned from:
❌ Googling symptoms
❌ Consulting colleagues
❌ Using UpToDate/medical databases
❌ Calling specialists

That's not how we practice!!
It's like testing a chef who can't use recipes or taste their food.
4. The "Retrospective Oracle" Problem
These cases were already SOLVED and published.

Real medicine involves genuine uncertainty - sometimes the diagnosis is never found. Does the AI know when to stop investigating?
5. No "When to Stop" Testing
Great doctors know when NOT to test. This AI was never evaluated on:

"This headache is just stress"
"Let's wait and see"
"More tests will cause more harm than good"

The benchmark rewards finding zebras, not recognising horses.
Don't get me wrong - this tech is amazing, and I may well be replaced in the not-so-near future.

But we need:
✓ Testing on actual patient populations (mostly healthy!)
✓ Measuring overdiagnosis harm
✓ Real-world physician comparisons
Final thought: We don't need AI that can diagnose every rare disease. We need AI that knows when to diagnose and when to reassure. That's the real art of medicine.

But what do you think?
If you liked this post please follow me @DrDominicNg and retweet.

It takes me some time to read and write these posts so I'd love to get more people's thoughts on it!

I've also just started a new newsletter on neuroscience:
brainhealthdecoded.substack.com/subscribe

