Mustafa Suleyman
Jun 30 · 6 tweets · 2 min read
We're taking a big step towards medical superintelligence. AI models have aced multiple choice medical exams – but real patients don’t come with ABC answer options. Now MAI-DxO can solve some of the world’s toughest open-ended cases with higher accuracy and lower costs.
While AI has achieved near-perfect scores on the US Medical Licensing Exam, we set a higher benchmark: 304 cases from the New England Journal of Medicine. These are some of the toughest and most diagnostically complex cases a physician can face.
Microsoft AI built MAI-DxO to simulate a virtual panel of physicians with different approaches collaborating to find a diagnosis on each case. They also included the ability to set a budget to avoid infinite testing (higher costs, longer wait times, etc.).
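To make the budget idea concrete, here is a minimal sketch of a cost cap on test ordering. It is not MAI-DxO's actual implementation, which the thread does not describe: the `Test` class, the `order_tests_within_budget` function, the test names, and the prices are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Test:
    name: str
    cost: float  # hypothetical price in dollars


def order_tests_within_budget(candidates, budget):
    """Walk the tests in priority order and order each one that still fits.

    Returns the names of the tests ordered and the total amount spent.
    """
    ordered, spent = [], 0.0
    for test in candidates:
        if spent + test.cost > budget:
            continue  # skip any test that would push spending past the cap
        ordered.append(test.name)
        spent += test.cost
    return ordered, spent


# Hypothetical tests and prices, listed in priority order.
candidates = [
    Test("blood panel", 50.0),
    Test("chest x-ray", 120.0),
    Test("MRI", 900.0),
    Test("urinalysis", 30.0),
]
ordered, spent = order_tests_within_budget(candidates, budget=250.0)
```

With a cap of 250, the sketch orders the blood panel, chest x-ray, and urinalysis and skips the MRI, which shows how a budget keeps a diagnostic process from requesting every available test.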
What they found:
- MAI-DxO boosted performance of every model tested on those 304 cases
- 85.5% solve rate vs. 20% by a group of physicians
- Its higher accuracy came with LOWER overall testing costs than lone LLMs or physicians
MAI-DxO in action, tackling one of those complex cases:
This research is just the first step on a long, exciting journey. We’re excited to keep testing and learning with our healthcare partners in pursuit of better, more accessible care for people everywhere. More on the blog today: microsoft.ai/new/the-path-t…

More from @mustafasuleyman

Apr 5
ICYMI: we made a lot of Copilot announcements this morning! Some of the highlights of what’s rolling out today + in the coming weeks 🧵
Copilot can remember you now, from the name of your dog to whether you feel most productive first thing in the morning. You’re always in control, and can delete any conversation at any time. This is the start of really personalizing your Copilot.
We’re also experimenting with personalization through Copilot’s appearance. Maybe you want yours to reflect your music taste or a love of Clippy?
Read 10 tweets
Mar 19
You can't just be right, you have to know you're right. Good advice for LLMs, according to new Johns Hopkins research. Sometimes no answer is better than a wrong one – life or death choices in medicine, for example, or big financial decisions. 🧵
[Image: a 3D graph with the X axis of compute budget, Y axis of accuracy, and Z axis of confidence threshold. The chart shows that accuracy increases with higher compute and confidence thresholds, though the trade-off tends to be fewer questions answered overall.]
We know more compute results in higher accuracy, but are the models more confident those answers ARE accurate too? And how do we teach them when to say “I don’t know”? That’s what the research team wanted to find out.
In the study, they measured how different combinations of compute budget and confidence thresholds (being at least 50% sure of the answer, etc.) affected models’ performance on a benchmark math test.
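The trade-off between a confidence threshold and the number of questions answered can be sketched in a few lines. This is an illustration of the general idea only, not the study's method: the `score_with_threshold` function is hypothetical and the prediction data is made up.

```python
def score_with_threshold(predictions, threshold):
    """Score a model that abstains when its confidence is below the threshold.

    predictions: list of (confidence, is_correct) pairs.
    Returns (fraction of questions answered, accuracy among answered ones).
    Accuracy is None when the model abstains on every question.
    """
    answered = [ok for conf, ok in predictions if conf >= threshold]
    coverage = len(answered) / len(predictions)
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy


# Made-up (confidence, is_correct) pairs for five questions.
predictions = [
    (0.95, True),
    (0.80, True),
    (0.60, False),
    (0.55, True),
    (0.30, False),
]
low = score_with_threshold(predictions, 0.5)
high = score_with_threshold(predictions, 0.9)
```

On this toy data, a threshold of 0.5 answers four of five questions with 75% accuracy, while a threshold of 0.9 answers only one question but gets it right: higher accuracy at the price of fewer answers.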
Read 9 tweets
Jan 29
Today we’ve made Think Deeper free and available for all users of Copilot.

This now gives everyone access to OpenAI’s world-class o1 reasoning model in Copilot, everywhere, at no cost.

I urge you to give it a try. It’s truly magical. Think Deeper helps you:
Get in-depth advice on how to manage a career change, with detailed breakdowns of educational milestones and options, resources on where to look for roles, strategies for getting in the door and industry trends you absolutely need to know.
Plan that epic project. Brain dump everything into Think Deeper and watch it churn through it all and spit out a clean, crisp step by step guide to making it happen. I've tried this on a few things (fitness routine, big launch coming up) and it’s genuinely so helpful.
Read 6 tweets
Jan 18
After Ethan's post, I went on a deep dive into this study! I could go on and on about the results but if I had to boil it down to my biggest takeaways...🧵
The setup: For 6 weeks, students used Copilot in their computer lab 2x/week, guided by teachers on selected topics and grammar/writing tasks.
The results: A pen-and-paper test showed their scores improving by 0.3 standard deviations, the equivalent of almost 2 years of learning.
• Many of these students had never even used a computer before. They spent the beginning of the program figuring out how to navigate a PC, setting up user accounts, being taught how to prompt. Makes the learning curve even more remarkable.
Read 9 tweets
Jun 8, 2023
Very excited to announce my new book: The Coming Wave

Today's AI is only the start. A wave of emerging technologies will help address global challenges & create vast wealth. But they will also create upheaval on a once unimaginable scale.

THE-COMING-WAVE.COM
These are ideas I've been thinking about for over a decade. This is my attempt to understand how and why technology naturally proliferates, and what society needs to do to remain in control.

I argue that “containing” this coming wave is the defining challenge of the century.
As the public conversation around AI has exploded, it's more important than ever for those of us driving development to critically reflect on what’s unfolding.

I hope it’s useful. It’s intended to provoke debate and encourage everyone to develop new strategies for containment.
Read 4 tweets
