New research paper shows how LLMs can "think" internally before outputting a single token!
Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space.
TONS of benefits from this approach.
Let me break down this fascinating paper...
The key insight:
Humans often think before verbalizing their thoughts.
Traditional LLMs think by generating tokens (Chain of Thought), but this new approach lets models reason in their continuous latent space first.
So what is it?
The researchers built a 3.5B parameter model with a recurrent architecture that can "think" repeatedly in latent space before generating any output.
The more thinking iterations, the better the performance!
How it works:
The model has 3 parts (rough sketch after this list):
• Prelude: embeds the input tokens into latent space
• Recurrent Core: iterates on a latent state, doing the actual "thinking"
• Coda: un-embeds the final latent state back into tokens
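To make that concrete, here's a toy version of the three-part structure. All sizes, layer counts, and module names here are illustrative assumptions, not the authors' actual 3.5B implementation:

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Toy sketch of the prelude / recurrent core / coda design.
    Every size and module choice here is an assumption for illustration,
    not the paper's real architecture."""

    def __init__(self, vocab_size=32000, d_model=512, n_layers=2):
        super().__init__()
        def block():
            return nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.prelude = nn.Sequential(*[block() for _ in range(n_layers)])
        # The core sees the current latent state concatenated with the
        # embedded input, so each iteration stays conditioned on the prompt.
        self.adapter = nn.Linear(2 * d_model, d_model)
        self.core = nn.Sequential(*[block() for _ in range(n_layers)])
        self.coda = nn.Sequential(*[block() for _ in range(n_layers)])
        self.unembed = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, num_iterations=8):
        e = self.prelude(self.embed(tokens))  # 1) embed input into latent space
        s = torch.randn_like(e)               # 2) random initial "thought" state
        for _ in range(num_iterations):       # 3) think: iterate in latent space
            s = self.core(self.adapter(torch.cat([s, e], dim=-1)))
        return self.unembed(self.coda(s))     # 4) convert thoughts back to tokens
```

Note the key property: spending more compute at inference is just a larger num_iterations. The weights never change.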
The cool part?
The model can dynamically adjust how much "thinking" it needs based on the task (see the sketch after this list):
• Math problems → More iterations
• Simple tasks → Fewer iterations
Just like humans do!
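Here's one way that adaptive compute could look at inference time, reusing the hypothetical class above. The relative-change test and the threshold are my simplifying assumptions, standing in for the convergence-based exit criteria the paper discusses:

```python
import torch

def think_adaptively(model, tokens, max_iters=64, tol=0.05):
    """Iterate the recurrent core until the latent state stops changing,
    up to a budget. Simplified assumption: one exit decision for the whole
    sequence, using relative change in the latent state."""
    e = model.prelude(model.embed(tokens))
    s = torch.randn_like(e)
    used = max_iters
    for step in range(max_iters):
        s_next = model.core(model.adapter(torch.cat([s, e], dim=-1)))
        # Stop early once successive states are nearly identical:
        # easy inputs converge in a few steps, hard ones keep iterating.
        if torch.norm(s_next - s) / torch.norm(s) < tol:
            used = step + 1
            s = s_next
            break
        s = s_next
    return model.unembed(model.coda(s)), used
```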
Key advantages over traditional Chain of Thought:
• No special training data needed
• Works with smaller context windows
• Can capture reasoning that's hard to put into words
Weird finding:
The researchers found fascinating patterns in how the model thinks: it develops "orbits" and "sliders" in latent space to represent different types of reasoning!
Is it good?
Performance scales impressively with compute: at 32 recurrent iterations, it matches models twice its size on many tasks.
While still a proof of concept, the results suggest this could be a powerful new direction for language models, combining internal latent reasoning with traditional token generation.