Matthew Berman
Feb 13, 2025 · 13 tweets · 5 min read
New research paper shows how LLMs can "think" internally before outputting a single token!

Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space.

TONS of benefits from this approach.

Let me break down this fascinating paper...

The key insight:

Human thinking often happens before we verbalize thoughts.

Traditional LLMs think by generating tokens (Chain of Thought), but this new approach lets models reason in their continuous latent space first.

So what is it?

The researchers built a 3.5B-parameter model with a recurrent architecture that can "think" repeatedly in latent space before generating any output.

The more thinking iterations, the better the performance!

How it works:

The model has 3 parts:
• Prelude: Embeds input into latent space
• Recurrent Core: Does the actual thinking
• Coda: Converts thoughts back to tokens

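To make the three-part structure concrete, here is a toy sketch in Python. It is not the paper's code: tiny dense maps stand in for the transformer layers of each part, and all names, sizes, and the update rule tanh(W_s·s + W_e·e) are hypothetical. What it does show is the shape of the computation in this sketch: the prelude runs once, the same core weights are applied `num_iterations` times to a latent state that starts from random noise (assumed Gaussian here) while the embedded input is fed back in at every step, and the coda runs once at the end.

```python
import math
import random

DIM = 4  # hypothetical latent width; a real model is far wider


def make_matrix(rng, rows, cols, scale=0.3):
    """Random dense matrix (list of rows), a toy stand-in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyRecurrentDepthModel:
    def __init__(self, vocab_size=10, seed=0):
        self.rng = random.Random(seed)
        self.embed = make_matrix(self.rng, vocab_size, DIM)    # prelude weights
        self.w_state = make_matrix(self.rng, DIM, DIM)         # core: latent state
        self.w_input = make_matrix(self.rng, DIM, DIM)         # core: fed-back input
        self.unembed = make_matrix(self.rng, vocab_size, DIM)  # coda weights

    def prelude(self, token_id):
        # Prelude: embed the input token into latent space (runs once).
        return self.embed[token_id]

    def core(self, e, s):
        # Recurrent core: the SAME weights are reused on every iteration.
        return [math.tanh(a + b)
                for a, b in zip(matvec(self.w_state, s), matvec(self.w_input, e))]

    def coda(self, s):
        # Coda: map the final latent state back to token logits (runs once).
        return matvec(self.unembed, s)

    def forward(self, token_id, num_iterations):
        e = self.prelude(token_id)
        s = [self.rng.gauss(0.0, 1.0) for _ in range(DIM)]  # random initial state
        for _ in range(num_iterations):  # more iterations = more "thinking"
            s = self.core(e, s)
        return self.coda(s)


model = ToyRecurrentDepthModel(vocab_size=10, seed=0)
logits = model.forward(token_id=3, num_iterations=8)
print(len(logits))  # 10 logits, one per vocabulary entry
```

Note that changing `num_iterations` changes how much compute is spent per token but not how many weights the model has.
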
The cool part?

The model can dynamically adjust how much "thinking" it needs based on the task:

• Math problems → More iterations
• Simple tasks → Fewer iterations

Just like humans do!

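Adaptive compute needs a rule for when to stop thinking. This sketch assumes one simple rule: keep iterating until the KL divergence between the output distributions of two successive iterations drops below a threshold. The `step` and `readout` functions, the threshold, and the iteration cap are all hypothetical placeholders; the toy "core" just moves the state halfway toward a fixed point on each step, so a start far from that point plays the role of a harder problem.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def kl_divergence(p, q):
    # KL(p || q) for discrete distributions with strictly positive q
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def think_adaptively(step, readout, state, threshold=1e-4, max_iterations=64):
    """Iterate `step` until the output distribution stops changing.

    Hypothetical exit rule: stop once KL(current || previous) < threshold.
    Returns (final_state, iterations_used).
    """
    prev = softmax(readout(state))
    for i in range(1, max_iterations + 1):
        state = step(state)
        curr = softmax(readout(state))
        if kl_divergence(curr, prev) < threshold:
            return state, i
        prev = curr
    return state, max_iterations


# Toy "core": each step moves the state halfway toward a fixed point.
TARGET = [2.0, -1.0, 0.5]


def toy_step(s):
    return [x + 0.5 * (t - x) for x, t in zip(s, TARGET)]


def identity_readout(s):
    return s  # toy "coda": treat the state itself as logits


_, easy_iters = think_adaptively(toy_step, identity_readout, [1.9, -0.9, 0.4])
_, hard_iters = think_adaptively(toy_step, identity_readout, [-5.0, 6.0, 9.0])
print(easy_iters, hard_iters)  # the "hard" start needs more iterations
```

The same loop spends few iterations when the answer settles quickly and more when it does not, without any per-task setting.
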
Key advantages over traditional Chain of Thought:

• No special training data needed
• Works with smaller context windows
• Can capture reasoning that's hard to put into words

Weird finding:

The researchers found fascinating patterns in how the model thinks: it develops "orbits" and "sliders" in latent space to represent different types of reasoning!

Is it good?

Performance scales impressively with compute: at 32 recurrent iterations, it matches models 2x its size on many tasks.

And it can go even further!

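A back-of-the-envelope way to see why extra iterations buy extra compute without extra parameters: the prelude and coda run once, while the core's layers are reused on every iteration, so per-token compute grows roughly linearly with the iteration count and the parameter count stays fixed. The layer counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def effective_depth(prelude_layers, core_layers, coda_layers, iterations):
    """Layers a token passes through when the recurrent core is unrolled."""
    return prelude_layers + core_layers * iterations + coda_layers


def distinct_layers(prelude_layers, core_layers, coda_layers):
    """Layers with their own weights; this is what sets the parameter count."""
    return prelude_layers + core_layers + coda_layers


# Hypothetical layer counts, chosen only for illustration.
PRELUDE, CORE, CODA = 2, 4, 2

for r in (1, 4, 8, 16, 32):
    print(r, effective_depth(PRELUDE, CORE, CODA, r), distinct_layers(PRELUDE, CORE, CODA))
```

With these numbers, 32 iterations push each token through 2 + 4 × 32 + 2 = 132 layers while only 8 layers' worth of weights exist.
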
This architecture also enables some neat tricks:

• Zero-shot adaptive compute
• KV cache sharing
• Continuous chain-of-thought

All without special training!

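KV cache sharing is easiest to picture as a fixed-budget cache. In a naive setup, every recurrence step at every token position would store its own key/value entry, so the cache would grow with the number of iterations. The sketch below assumes a simple sharing policy: each position keeps at most `budget` slots, and iteration `i` reads and writes slot `i % budget`, overwriting older entries. The class name and the policy are hypothetical illustrations, not the paper's implementation.

```python
class SharedRecurrentKVCache:
    """Toy KV cache holding at most `budget` recurrence slots per token position."""

    def __init__(self, budget):
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self.budget = budget
        self.entries = {}  # (position, slot) -> (key, value)

    def slot_for(self, iteration):
        # Hypothetical sharing policy: iterations map onto slots round-robin.
        return iteration % self.budget

    def write(self, position, iteration, key, value):
        # A later iteration overwrites the entry an earlier one left in the same slot.
        self.entries[(position, self.slot_for(iteration))] = (key, value)

    def read(self, position, iteration):
        # Returns None if nothing has been written to that slot yet.
        return self.entries.get((position, self.slot_for(iteration)))

    def __len__(self):
        return len(self.entries)


cache = SharedRecurrentKVCache(budget=4)
for pos in range(3):          # 3 token positions
    for it in range(32):      # 32 recurrence steps per token
        cache.write(pos, it, f"k{pos}.{it}", f"v{pos}.{it}")

print(len(cache))             # 12 entries (3 positions x 4 slots) instead of 96
print(cache.read(0, 31))      # ('k0.31', 'v0.31'): the latest write to slot 3
```

The trade-off is memory for fidelity: a smaller `budget` keeps the cache size independent of how long the model thinks, at the cost of later iterations seeing overwritten entries.
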
The implications are huge:

This could be a missing piece in getting LLMs to truly reason rather than just manipulate language.

It's especially interesting given Yann LeCun's critiques that LLMs can't truly reason.

This latent approach might bridge that gap.

While still a proof of concept, the results suggest this could be a powerful new direction for language models: combining internal reasoning with traditional token generation.


