Ruben Hassid
Jun 7 · 14 tweets · 3 min read
BREAKING: Apple just published research arguing that AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

They just memorize patterns really well.

Here's what Apple discovered:

(hint: we're not as close to AGI as the hype suggests)
Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games.

They tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.

The result ↓
All "reasoning" models hit a complexity wall where they completely collapse to 0% accuracy.

No matter how much computing power you give them, they can't solve harder problems.
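For a sense of how these puzzles scale: Tower of Hanoi (one of the puzzles Apple used) has an optimal solution of exactly 2^n − 1 moves, so each extra disk doubles the work. A minimal sketch of that complexity dial:

```python
# Tower of Hanoi: each added disk doubles the optimal solution length.
# This is why the puzzle makes a precise complexity dial for testing.

def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Return the optimal move sequence for n disks as (from, to) pairs."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)   # park n-1 disks on aux
            + [(src, dst)]                       # move the largest disk
            + hanoi_moves(n - 1, aux, src, dst)) # bring the n-1 disks over

for n in (3, 7, 10):
    print(n, len(hanoi_moves(n)))  # 3 → 7, 7 → 127, 10 → 1023
```

The exponential growth is the point: the researchers could push difficulty up smoothly until the models broke.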
As problems got harder, these "thinking" models actually started thinking less.

They used fewer tokens and gave up faster, even with plenty of token budget left.
Apple researchers even tried giving the models the exact solution algorithm.

Like handing someone step-by-step instructions to bake a cake.

The models still failed at the same complexity points.

They can't even follow directions consistently.
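Part of what makes this test so clean is that puzzle execution can be checked mechanically. Here's a hypothetical checker in that spirit (my sketch, not Apple's actual harness): with the algorithm already provided, success reduces to producing a sequence of legal moves.

```python
# Hypothetical move-by-move simulator for Tower of Hanoi (illustrative,
# not the paper's code). Disks are numbered 1 (smallest) to n (largest).

def is_valid_sequence(n, moves):
    """Simulate the puzzle; reject any illegal move, then check the goal."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False                              # nothing to move
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False                              # bigger disk onto smaller
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))         # all disks on the goal peg

print(is_valid_sequence(2, [("A", "B"), ("A", "C"), ("B", "C")]))  # True
```

A checker like this leaves no wiggle room: either every step follows the rules and reaches the goal, or it doesn't.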
The research revealed three regimes:

• Low complexity: Regular models actually win
• Medium complexity: "Thinking" models show some advantage
• High complexity: Everything breaks down completely

And plenty of the problems we actually care about fall into that third category.
Apple's researchers concluded that these models aren't reasoning at all. They're doing sophisticated pattern matching that works great until the patterns get too complex.

Then they fall apart like a house of cards.
If these models were truly "reasoning," they should get better with more compute and clearer instructions.

Instead, they hit hard walls and start giving up.

Is that intelligence or memorization hitting its limits?
This research suggests we're not as close to AGI as the hype implies.

Current "reasoning" breakthroughs may be hitting fundamental walls that can't be solved by just adding more data or compute.
Models could handle 100+ moves in Tower of Hanoi puzzles but failed after just 4 moves in River Crossing puzzles.

This suggests they memorized Tower of Hanoi solutions during training but can't actually reason.
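The asymmetry is striking once you count moves. A quick brute-force check on the classic 3-and-3 river crossing (my sketch of the textbook variant; the paper's exact setup may differ) shows the optimal plan is only 11 crossings, versus 127 moves for a 7-disk Hanoi:

```python
from collections import deque

# BFS over the classic river-crossing state space: 3 missionaries,
# 3 cannibals, a 2-seat boat. State = (missionaries, cannibals, boat)
# on the starting bank. Cannibals may never outnumber missionaries
# on either bank (unless no missionaries are present).

def shortest_crossing():
    start, goal = (3, 3, 1), (0, 0, 0)

    def ok(m, c):
        return (0 <= m <= 3 and 0 <= c <= 3
                and (m == 0 or m >= c)              # starting bank safe
                and (3 - m == 0 or 3 - m >= 3 - c)) # far bank safe

    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        (m, c, b), depth = frontier.popleft()
        if (m, c, b) == goal:
            return depth
        for dm, dc in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:
            nm, nc = (m - dm, c - dc) if b else (m + dm, c + dc)
            state = (nm, nc, 1 - b)
            if ok(nm, nc) and state not in seen:
                seen.add(state)
                frontier.append((state, depth + 1))
    return None

print(shortest_crossing())  # 11 crossings -- tiny compared to Hanoi's 127
```

If reasoning were general, a puzzle solvable in 11 steps shouldn't be harder than one needing 127. Training-data familiarity explains the gap better.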
While AI companies celebrate their models "thinking," Apple basically said "Everyone's celebrating fake reasoning."

The industry is chasing metrics that don't measure actual intelligence.
Apple's researchers used controllable puzzle environments specifically because:

• They avoid data contamination
• They require pure logical reasoning
• They can scale complexity precisely
• They reveal where models actually break

Smart experimental design if you ask me.
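The design idea is easy to picture: a parametric generator that stamps out fresh instances at an exact difficulty level, so that difficulty, not test-set leakage, explains any failure. A toy sketch of that idea (illustrative only, not Apple's harness):

```python
import random

# Toy parametric puzzle generator: every instance is novel (shuffled
# labels) but has a precisely known complexity (optimal move count).
# Illustrative sketch of the design principle, not the paper's code.

def make_hanoi_instance(n_disks, seed=None):
    """Return a puzzle spec whose optimal solution length is known exactly."""
    rng = random.Random(seed)
    pegs = rng.sample(["A", "B", "C"], 3)     # shuffle peg labels per instance
    return {
        "disks": n_disks,
        "source": pegs[0], "aux": pegs[1], "target": pegs[2],
        "optimal_moves": 2**n_disks - 1,      # the complexity dial
    }

spec = make_hanoi_instance(5, seed=0)
print(spec["optimal_moves"])  # 31
```

Because the generator knows the optimal answer in advance, grading is automatic and contamination-proof.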
What do you think?

Is Apple just "coping" because they've been outpaced in AI developments over the past two years?

Or is Apple correct?

Comment below and I'll respond to all.
If you found this thread valuable:

1. Follow me @RubenHssd for more threads on what's happening in AI and its implications.

2. RT the first tweet
