Ruben Hassid
Jun 7 · 14 tweets · 3 min read
BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.

They just memorize patterns really well.

Here's what Apple discovered:

(hint: we're not as close to AGI as the hype suggests)
Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games.

They tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.

The result ↓
All "reasoning" models hit a complexity wall where they completely collapse to 0% accuracy.

No matter how much computing power you give them, they can't solve harder problems.
As problems got harder, these "thinking" models actually started thinking less.

They used fewer tokens and gave up faster, despite having plenty of token budget left.
Apple researchers even tried giving the models the exact solution algorithm.

Like handing someone step-by-step instructions to bake a cake.

The models still failed at the same complexity points.

They can't even follow directions consistently.
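
To give a sense of what an "exact solution algorithm" can look like, here is a minimal sketch of the classic recursive solver for Tower of Hanoi, one of the puzzles mentioned in this thread. This is an assumption for illustration only: the thread doesn't say which algorithm Apple supplied or in what form, and the name `hanoi_moves` and the peg labels are made up here.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the (disk, from_peg, to_peg) moves that solve n disks.

    Illustrative only: function name and peg labels are hypothetical,
    not taken from Apple's paper.
    """
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # park the n-1 smaller disks
        + [(n, source, target)]                      # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack the smaller disks
    )


moves = hanoi_moves(3)
print(len(moves))  # 7
print(moves[0])    # (1, 'A', 'C')
```

The whole procedure is three lines of logic, which is why failing to execute it step by step is notable.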
The research revealed three regimes:

• Low complexity: Regular models actually win
• Medium complexity: "Thinking" models show some advantage
• High complexity: Everything breaks down completely

Most problems fall into that third category.
Apple discovered that these models are not reasoning at all, but instead doing sophisticated pattern matching that works great until patterns become too complex.

Then they fall apart like a house of cards.
If these models were truly "reasoning," they should get better with more compute and clearer instructions.

Instead, they hit hard walls and start giving up.

Is that intelligence or memorization hitting its limits?
This research suggests we're not as close to AGI as the hype claims.

Current "reasoning" breakthroughs may be hitting fundamental walls that can't be solved by just adding more data or compute.
Models could handle 100+ moves in Tower of Hanoi puzzles but failed after just 4 moves in River Crossing puzzles.

This suggests they memorized Tower of Hanoi solutions during training but can't actually reason.
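
For scale on the "100+ moves" figure: the shortest Tower of Hanoi solution for n disks takes 2^n - 1 moves, so move counts grow fast with puzzle size. The sketch below only does that arithmetic. The thread doesn't say how many disks were involved, so the disk range and the name `min_hanoi_moves` are assumptions.

```python
def min_hanoi_moves(disks):
    """Minimum number of moves to solve Tower of Hanoi with `disks` disks."""
    return 2 ** disks - 1


# Hypothetical range of puzzle sizes, just to show the growth.
for disks in range(1, 11):
    print(disks, min_hanoi_moves(disks))

# Smallest puzzle whose shortest solution is at least 100 moves long.
first_over_100 = next(d for d in range(1, 64) if min_hanoi_moves(d) >= 100)
print(first_over_100)  # 7
```

So any solution of 100+ moves means at least 7 disks, where the shortest solution is 127 moves.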
While AI companies celebrate their models "thinking," Apple basically said "Everyone's celebrating fake reasoning."

The industry is chasing metrics that don't measure actual intelligence.
Apple's researchers used controllable puzzle environments specifically because:

• They avoid data contamination
• They require pure logical reasoning
• They can scale complexity precisely
• They reveal where models actually break

Smart experimental design if you ask me.
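
Here's a rough sketch of why puzzles make it easy to see where a model breaks: every move can be replayed against the rules, so you can pinpoint the first mistake. This is not Apple's actual harness. The function name `replay`, the 0-2 peg numbering, and the move format are assumptions for illustration.

```python
def replay(n_disks, moves):
    """Replay (from_peg, to_peg) moves on a 3-peg Tower of Hanoi.

    Returns (index_of_first_illegal_move, solved). The index is None
    when every move is legal. Hypothetical checker, not Apple's code.
    """
    # Peg 0 starts with all disks, largest (n_disks) at the bottom.
    pegs = [list(range(n_disks, 0, -1)), [], []]
    for i, (src, dst) in enumerate(moves):
        # Illegal: moving from an empty peg, or onto a smaller disk.
        if not pegs[src] or (pegs[dst] and pegs[dst][-1] < pegs[src][-1]):
            return i, False
        pegs[dst].append(pegs[src].pop())
    # Solved when every disk has ended up on the last peg.
    return None, len(pegs[2]) == n_disks


print(replay(2, [(0, 1), (0, 2), (1, 2)]))  # (None, True)
print(replay(2, [(0, 1), (0, 1)]))          # (1, False)
```

Raising `n_disks` scales the difficulty, and the returned index shows exactly which move went wrong.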
What do you think?

Is Apple just "coping" because they've been outpaced in AI developments over the past two years?

Or is Apple correct?

Comment below and I'll respond to all.
If you found this thread valuable:

1. Follow me @RubenHssd for more threads on what's happening in AI and its implications.

2. RT the first tweet
