We're taking a big step towards medical superintelligence. AI models have aced multiple choice medical exams – but real patients don’t come with ABC answer options. Now MAI-DxO can solve some of the world’s toughest open-ended cases with higher accuracy and lower costs.
While AI has achieved near-perfect scores on the US Medical Licensing Exam, we set a higher benchmark: 304 cases from the New England Journal of Medicine. These are some of the toughest and most diagnostically complex cases a physician can face.
Microsoft AI built MAI-DxO to simulate a virtual panel of physicians with different approaches collaborating to find a diagnosis on each case. They also included the ability to set a budget, so the panel can't keep ordering tests indefinitely (which would mean higher costs, longer wait times, etc.).
What they found:
- MAI-DxO boosted performance of every model tested on those 304 cases
- 85.5% solve rate vs. 20% by a group of physicians
- Its higher accuracy came with LOWER overall testing costs than lone LLMs or physicians
MAI-DxO in action, tackling one of those complex cases:
This research is just the first step on a long, exciting journey. We’re excited to keep testing and learning with our healthcare partners in pursuit of better, more accessible care for people everywhere. More on the blog today: microsoft.ai/new/the-path-t…
• • •
ICYMI: we made a lot of Copilot announcements this morning! Some of the highlights of what’s rolling out today + in the coming weeks 🧵
Copilot can remember you now, from the name of your dog to whether you feel most productive first thing in the morning. You’re always in control, and can delete any conversation at any time. This is the start of really personalizing your Copilot.
We’re also experimenting with personalization through Copilot’s appearance. Maybe you want yours to reflect your music taste or a love of Clippy?
You can't just be right, you have to know you're right. Good advice for LLMs, according to new Johns Hopkins research. Sometimes no answer is better than a wrong one – life or death choices in medicine, for example, or big financial decisions. 🧵
We know more compute results in higher accuracy, but are the models more confident those answers ARE accurate too? And how do we teach them when to say “I don’t know”? That’s what the research team wanted to find out.
In the study, they measured how different combinations of compute budget and confidence thresholds (being at least 50% sure of the answer, etc.) affected models’ performance on a benchmark math test.
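To make the idea of a confidence threshold concrete, here is a minimal sketch rather than the study's actual method. It assumes each model output comes with a self-reported confidence between 0 and 1, and that a wrong answer costs a point while abstaining costs nothing. The names `Prediction`, `answer_or_abstain`, `score` and the `wrong_penalty` parameter are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Prediction:
    answer: str
    confidence: float  # assumed: model's self-reported probability in [0, 1]


def answer_or_abstain(pred: Prediction, threshold: float = 0.5) -> Optional[str]:
    """Return the answer only if confidence meets the threshold, else None ("I don't know")."""
    return pred.answer if pred.confidence >= threshold else None


def score(
    preds: Sequence[Prediction],
    gold: Sequence[str],
    threshold: float,
    wrong_penalty: float = 1.0,  # assumed scoring rule, not taken from the study
) -> float:
    """+1 for each correct answer, -wrong_penalty for each wrong one, 0 for abstaining."""
    total = 0.0
    for pred, truth in zip(preds, gold):
        given = answer_or_abstain(pred, threshold)
        if given is None:
            continue  # abstained: no reward, no penalty
        total += 1.0 if given == truth else -wrong_penalty
    return total


# Toy data, invented for illustration only.
preds = [Prediction("4", 0.9), Prediction("7", 0.4), Prediction("9", 0.6)]
gold = ["4", "8", "10"]

for t in (0.0, 0.5, 0.7):
    print(f"threshold={t}: score={score(preds, gold, t)}")
```

In this toy data, raising the threshold makes the model skip the answers it is unsure about, so it stops paying the penalty for wrong guesses. That is the trade-off the thread describes: sometimes no answer is better than a wrong one.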
Today we’ve made Think Deeper free and available for all users of Copilot.
This now gives everyone access to OpenAI’s world-class o1 reasoning model in Copilot, everywhere, at no cost.
I urge you to give it a try. It’s truly magical. Think Deeper helps you:
Get in-depth advice on how to manage a career change, with detailed breakdowns of educational milestones and options, resources on where to look for roles, strategies for getting in the door and industry trends you absolutely need to know.
Plan that epic project. Brain dump everything into Think Deeper and watch it churn through it all and spit out a clean, crisp step by step guide to making it happen. I've tried this on a few things (fitness routine, big launch coming up) and it’s genuinely so helpful.
After Ethan's post, I went on a deep dive into this study! I could go on and on about the results but if I had to boil it down to my biggest takeaways...🧵
The setup: For 6 weeks, students used Copilot in their computer lab 2x/week, guided by teachers on selected topics and grammar/writing tasks.
The results: A pen-and-paper test showed their scores improving by 0.3 standard deviations, the equivalent of almost 2 years of learning.
• Many of these students had never even used a computer before. They spent the beginning of the program figuring out how to navigate a PC, setting up user accounts, being taught how to prompt. Makes the learning curve even more remarkable.
Very excited to announce my new book: The Coming Wave
Today's AI is only the start. A wave of emerging technologies will help address global challenges & create vast wealth. But they will also create upheaval on a once unimaginable scale.
These are ideas I've been thinking about for over a decade. This is my attempt to understand how and why technology naturally proliferates, and what society needs to do to remain in control.
I argue that “containing” this coming wave is the defining challenge of the century.
As the public conversation around AI has exploded, it's more important than ever for those of us driving development to critically reflect on what’s unfolding.
I hope it’s useful. It’s intended to provoke debate and encourage everyone to develop new strategies for containment.