Deedy
Dec 17 · 5 tweets · 2 min read
o1-preview is far superior to doctors on reasoning tasks and it's not even close, according to a new paper evaluating OpenAI's model.

o1-preview scores ~80% vs. ~30% for doctors on the 143 hard NEJM CPC diagnoses.

It's dangerous now to trust your doctor and NOT consult an AI model.

Here are some actual tasks:

1/5
Here's an example case in which the workup looked at phosphate wasting and elevated FGF23, then proceeded to imaging to localize a potential tumor.

o1-preview's suggested testing plan takes a broader, more methodical approach, systematically ruling out other causes of hypophosphatemia.

2/5
For persistent, unexplained hyperammonemia, o1-preview recommends a prioritized expansion of tests—from basic immunoglobulins and electrolytes to advanced imaging, breath tests for SIBO and specialized GI biopsies—ensuring more common causes are checked first.

3/5
I have all the respect in the world for doctors, but in many cases their job is basic reasoning over a tremendously large domain-specific knowledge base.

Fortunately, this is exactly what LLMs are really good for.

This means more high-quality healthcare for everyone.

4/5
