New research paper shows how LLMs can "think" internally before outputting a single token!
Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space.
TONS of benefits from this approach.
Let me break down this fascinating paper...
The key insight:
Much of human thinking happens before we verbalize it.
Traditional LLMs "think" by generating tokens (Chain of Thought); this new approach lets a model reason in its continuous latent space first.
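To make the contrast concrete, here's a toy sketch. The model interface (embed / core / decode) is an illustrative assumption, not the paper's actual code:

```python
# Toy contrast between the two reasoning styles. The model interface
# (embed / core / decode) is an illustrative assumption, not the paper's code.
import torch

def chain_of_thought(model, tokens, num_reasoning_tokens=64):
    # CoT: reasoning is externalized as visible tokens, emitted one at a time.
    for _ in range(num_reasoning_tokens):
        logits = model(tokens)                       # full forward pass
        next_token = logits[:, -1:].argmax(dim=-1)   # emit a reasoning token
        tokens = torch.cat([tokens, next_token], dim=-1)
    return tokens

def latent_reasoning(model, tokens, num_iterations=64):
    # Latent reasoning: the hidden state is refined silently, and only the
    # final state is decoded into output tokens.
    h = model.embed(tokens)
    for _ in range(num_iterations):
        h = model.core(h)            # "think" without emitting anything
    return model.decode(h)
```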
So what is it?
The researchers built a 3.5B parameter model with a recurrent architecture that can "think" repeatedly in latent space before generating any output.
The more thinking iterations, the better the performance!
How it works:
The model has 3 parts (toy sketch after this list):
• Prelude: Embeds input into latent space
• Recurrent Core: Does the actual thinking
• Coda: Converts thoughts back to tokens
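Here's a minimal runnable toy of that three-part layout, assuming a transformer-style core. The layer sizes, the random initial state, and the way the embedded input is re-injected every step are my assumptions, not the paper's exact implementation:

```python
# Toy prelude / recurrent core / coda model. Sizes, the random-state init, and
# the input re-injection scheme are assumptions for illustration only.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256):
        super().__init__()
        self.prelude = nn.Embedding(vocab_size, d_model)          # tokens -> latent
        self.core = nn.TransformerEncoderLayer(d_model, nhead=4,  # block applied
                                               batch_first=True)  # repeatedly
        self.inject = nn.Linear(2 * d_model, d_model)             # mix state with input
        self.coda = nn.Linear(d_model, vocab_size)                # latent -> logits

    def forward(self, tokens, num_iterations=8):
        e = self.prelude(tokens)                    # embed the prompt once
        s = torch.randn_like(e)                     # start from a random latent state
        for _ in range(num_iterations):             # "think" in latent space
            s = self.core(self.inject(torch.cat([s, e], dim=-1)))
        return self.coda(s)                         # verbalize only at the end

model = RecurrentDepthLM()
tokens = torch.randint(0, 1000, (1, 16))
logits_shallow = model(tokens, num_iterations=4)    # less thinking
logits_deep = model(tokens, num_iterations=32)      # more thinking, same weights
```

Because the same core weights are reused every iteration, "thinking longer" costs extra compute at inference time but zero extra parameters.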
The cool part?
The model can dynamically adjust how much "thinking" it needs based on the task:
• Math problems → More iterations
• Simple tasks → Fewer iterations
Just like humans do!
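One simple way to get that behavior, reusing the toy model above: keep iterating until the latent state stops changing. The convergence test and threshold here are illustrative assumptions, not the paper's exact exit rule:

```python
# Adaptive-depth sketch: stop "thinking" once an extra iteration barely moves
# the latent state. Easy inputs converge fast, hard ones keep iterating.
import torch

def think_adaptively(model, tokens, max_iterations=64, tol=1e-3):
    e = model.prelude(tokens)
    s = torch.randn_like(e)
    for step in range(max_iterations):
        s_next = model.core(model.inject(torch.cat([s, e], dim=-1)))
        if (s_next - s).norm() / s.norm() < tol:   # state has (nearly) converged
            s = s_next
            break
        s = s_next
    return model.coda(s), step + 1                 # logits plus iterations used
```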
Key advantages over traditional Chain of Thought:
• No special training data needed
• Works with smaller context windows
• Can capture reasoning that's hard to put into words
Weird finding:
The researchers found fascinating patterns in how the model thinks - it develops "orbits" and "sliders" in latent space to represent different types of reasoning!
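Here's how one might probe for that kind of structure, reusing the toy model above: record the latent state at each iteration for one token position, project the trajectory to 2D with PCA, and check whether it loops (an "orbit") or drifts along a line (a "slider"). The analysis choices are my assumptions, not the paper's exact method:

```python
# Sketch: collect the per-iteration latent states for one position and project
# the trajectory with PCA to eyeball orbits vs. sliders.
import torch
from sklearn.decomposition import PCA

def latent_trajectory(model, tokens, num_iterations=64, position=-1):
    e = model.prelude(tokens)
    s = torch.randn_like(e)
    states = []
    for _ in range(num_iterations):
        s = model.core(model.inject(torch.cat([s, e], dim=-1)))
        states.append(s[0, position].detach())     # latent state of one position
    return torch.stack(states)                     # (num_iterations, d_model)

traj = latent_trajectory(model, tokens)
coords = PCA(n_components=2).fit_transform(traj.numpy())  # plot to see loops or drift
```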
Is it good?
Performance scales impressively with compute - at 32 recurrent iterations, it matches models 2x its size on many tasks.
While still a proof of concept, the results suggest this could be a powerful new direction for language models - combining internal reasoning with traditional token generation.
.@OpenAI dropped a new research paper testing whether AI agents can replicate cutting-edge AI research papers from scratch.
This is one step closer to the Intelligence Explosion: AI that can discover new science and improve itself.
Here’s what they learned: 🧵
Introducing PaperBench.
A new framework designed to test this very capability!
It gives AI agents access to recent ML research papers (20 from ICML 2024) and asks them to reproduce the results.
How does it work?
Agents get the raw paper PDF plus tools like web access and a coding environment, and must write code that replicates the paper's key findings, a task that takes human experts days.
They have 12 hours and no prior knowledge of the paper.
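For a rough sense of the setup, here's a purely illustrative harness loop; every name in it (PaperAgent, grade_submission, etc.) is hypothetical and not taken from the actual PaperBench codebase:

```python
# Hypothetical sketch of a paper-replication harness; names and structure are
# illustrative, not the real PaperBench API.
import time
from typing import Protocol

TIME_LIMIT_SECONDS = 12 * 60 * 60          # the 12-hour budget mentioned above

class PaperAgent(Protocol):
    def step(self, paper_pdf: str, workspace: str) -> None: ...  # read, plan, write/run code
    def is_done(self) -> bool: ...

def grade_submission(workspace: str) -> float:
    """Placeholder for rubric-based grading of the submitted repo."""
    return 0.0

def evaluate_paper(agent: PaperAgent, paper_pdf: str, workspace: str) -> float:
    deadline = time.time() + TIME_LIMIT_SECONDS
    # The agent sees only the raw paper plus its tools (web access, code
    # execution) and must build a repo in `workspace` that reproduces the results.
    while time.time() < deadline and not agent.is_done():
        agent.step(paper_pdf, workspace)
    return grade_submission(workspace)      # score against the paper's key findings
```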