BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all.
They just memorize patterns really well.
Here's what Apple discovered:
(hint: we're not as close to AGI as the hype suggests)
Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games.
They tested Claude 3.7 Sonnet (thinking), DeepSeek-R1, and o3-mini on problems these models had never seen before.
The result ↓
All "reasoning" models hit a complexity wall where they completely collapse to 0% accuracy.
No matter how much computing power you give them, they can't solve harder problems.
As problems got harder, these "thinking" models actually started thinking less.
They used fewer tokens and gave up faster, despite having unlimited budget.
Apple researchers even tried giving the models the exact solution algorithm.
Like handing someone step-by-step instructions to bake a cake.
The models still failed at the same complexity points.
They can't even follow directions consistently.
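For context, the algorithm Apple handed over for Tower of Hanoi is genuinely simple. Here's a minimal sketch in Python of the standard recursive procedure (my own illustration, not Apple's exact prompt text):

```python
def hanoi(n, source, target, spare, moves):
    """Standard recursive Tower of Hanoi: move n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # clear the top n-1 disks out of the way
    moves.append((source, target))               # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)   # restack the n-1 disks on top of it

moves = []
hanoi(7, "A", "C", "B", moves)
print(len(moves))  # 127 moves for 7 disks (2**7 - 1)
```

A ten-line recipe. The models were given this and still broke down at the same complexity points.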
The research revealed three regimes:
• Low complexity: Regular models actually win
• Medium complexity: "Thinking" models show some advantage
• High complexity: Everything breaks down completely
Most problems fall into that third category.
Apple discovered that these models are not reasoning at all, but instead doing sophisticated pattern matching that works great until patterns become too complex.
Then they fall apart like a house of cards.
If these models were truly "reasoning," they should get better with more compute and clearer instructions.
Instead, they hit hard walls and start giving up.
Is that intelligence or memorization hitting its limits?
This research suggests we're not as close to AGI as the hype suggests.
Current "reasoning" breakthroughs may be hitting fundamental walls that can't be solved by just adding more data or compute.
Models could handle 100+ moves in Tower of Hanoi puzzles but failed after just 4 moves in River Crossing puzzles.
This suggests they memorized Tower of Hanoi solutions during training but can't actually reason.
While AI companies celebrate their models "thinking," Apple basically said "Everyone's celebrating fake reasoning."
The industry is chasing metrics that don't measure actual intelligence.
Apple's researchers used controllable puzzle environments specifically because:
• They avoid data contamination
• They require pure logical reasoning
• They can scale complexity precisely
• They reveal where models actually break
Smart experimental design if you ask me.
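To make "scale complexity precisely" concrete: with a puzzle like Tower of Hanoi, one knob (the number of disks) sets the difficulty, and every move a model proposes can be checked against the rules. Here's a rough sketch of that kind of checker in Python (my own illustration, not Apple's actual harness):

```python
def check_hanoi_solution(n, moves):
    """Verify a proposed Tower of Hanoi move list against the rules.

    Pegs are "A", "B", "C"; disks 1 (smallest) .. n start on "A".
    Returns (ok, reason) so a harness can log exactly where a model breaks.
    """
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for i, (src, dst) in enumerate(moves):
        if not pegs[src]:
            return False, f"move {i}: peg {src} is empty"
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, f"move {i}: disk {disk} placed on smaller disk {pegs[dst][-1]}"
        pegs[dst].append(pegs[src].pop())
    if pegs["C"] == list(range(n, 0, -1)):
        return True, "solved"
    return False, "legal moves, but goal state not reached"

# Raise n to scale complexity; feed the model's answer in as `moves`.
print(check_hanoi_solution(2, [("A", "B"), ("A", "C"), ("B", "C")]))  # (True, 'solved')
```

No benchmark leakage, no partial credit: either the move sequence is legal and reaches the goal, or you see exactly where it falls apart.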
What do you think?
Is Apple just "coping" because they've been outpaced in AI developments over the past two years?
Or is Apple correct?
Comment below and I'll respond to all.
If you found this thread valuable:
1. Follow me @RubenHssd for more threads on what's happening in AI and its implications.
I used 3 prompts on ChatGPT to mimic a $600/hour consultant, achieving in 4 hours what takes teams weeks.
Here's how to make ChatGPT your pocket-sized McKinsey consultant:
(my exact prompt + examples)
There are 3 main tasks that a consultant does:
1. Research
2. Making slide decks
3. Reading reports
I'll show you how to automate all 3 in a way that makes one person do the work of an entire team.
Task 1: Research
Go to ChatGPT, use o3 + Deep Research.
Type this prompt:
"As a Big Four consultant, execute a DMAIC-driven Deep Research
workflow in ChatGPT—Define, Measure, Analyze, Improve, and
Control data on [topic]—structuring findings in MECE-compliant
categories into a concise executive summary."
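If you'd rather script this than paste it into the ChatGPT UI, here's a minimal sketch using the OpenAI Python SDK. Assumptions on my part: you have the `openai` package installed, an OPENAI_API_KEY set in your environment, and access to a reasoning model ("o3-mini" below is a placeholder). Deep Research itself is a ChatGPT feature, so this only automates sending the prompt, not the web browsing.

```python
# Minimal sketch: send the consultant-style research prompt through the OpenAI API.
# "o3-mini" is a placeholder model name; swap in whatever reasoning model you have access to.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

topic = "the market for on-device AI inference"  # hypothetical example topic

prompt = (
    "As a Big Four consultant, execute a DMAIC-driven research workflow "
    "(Define, Measure, Analyze, Improve, and Control data on "
    f"{topic}), structuring the findings into MECE-compliant categories "
    "and a concise executive summary."
)

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

Same prompt, same structure; the API route just makes it repeatable across topics.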