Latest Twitter Threads by @MatthewBerman on Thread Reader App

Apr 8 • 9 tweets • 4 min read

Is Chain-of-Thought (CoT) reasoning in LLMs just...for show?

@AnthropicAI’s new research paper shows that not only do AI models not use CoT like we thought, they might not use it at all for reasoning.

In fact, they might be lying to us in their CoT.

What you need to know: 🧵

“Thinking models” use CoT to explore and reason about solutions before outputting their answer.

This CoT has shown to increase a model’s reasoning ability and gives us insight into how the model is thinking.

Anthropic's research asks: Is CoT faithful?

Apr 3 • 8 tweets • 3 min read

.@OpenAI dropped a new research paper showing AI agents are now capable of replicating cutting-edge AI research papers from scratch.

This is one step closer to the Intelligence Explosion: AI that can discover new science and improve itself.

Here’s what they learned: 🧵

Introducing PaperBench.

A new framework designed to test this very capability!

It gives AI agents access to recent ML research papers (20 from ICML 2024) and asks them to reproduce the results.

Apr 1 • 10 tweets • 4 min read

We knew very little about how LLMs actually work...until now.

@AnthropicAI just dropped the most insane research paper, detailing some of the ways AI "thinks."

And it's completely different than we thought.

Here are their wild findings: 🧵

Finding 1: Universal Language of Thought?

Claude doesn't seem to have separate "brains" for different languages: French, Chinese, English etc.

Instead, it uses a shared "language" representation of the world.

Concepts like "small" or "antonym" activate regardless of the input language!

Mar 30 • 13 tweets • 6 min read

I've spent 200+ hours Vibe Coding games and apps.

It's insane what you can build with just your voice and AI...ANYONE can do it.

Here's everything I learned about Vibe Coding: 🧵

Which tool for vibe coding? 🤔

🔹 AI Editors: @Windsurf_AI & Cursor are top picks! (Built on familiar VS Code, AI-native features).

🔹 VS Code Extensions: Like Cline, if you want AI inside your current VS Code setup.

🔹 Online IDEs: @Replit is fully browser-based & great for quick deployment.

🔹 Claude/ChatGPT/Gemini can run simple HTML/JS in-browser via "canvas" - good for basic tasks & learning!

Mar 26 • 11 tweets • 3 min read

Gemini 2.5 Pro is insane at coding.

It's far better than anything else I've tested.

Below are ONE SHOT demos 🧵

A Rubik's cube generator AND solver. I've tried this with Claude 3.7 thinking, DeepSeek etc and never came close to this.

Mar 7 • 10 tweets • 4 min read

AI has changed my life.

I'm now 100x more productive than I ever was.

How do I use it? Which tools do I use?

Here are my actual use cases for AI: 👇 1/ Search

In fact, I probably use it 50x per day.

For search, I'm mostly going to @perplexity_ai. But I also use @grok and @ChatGPTapp every so often.

Here are some actual searches I've done recently:

Mar 7 • 10 tweets • 4 min read

Major AI breakthrough: Diffusion Large Language Models are here!

They're 10x faster and 10x cheaper than traditional LLMs.

Here's everything you need to know:

Traditional LLMs generate tokens sequentially—each token must wait for the previous one.

Diffusion LLMs generate the entire output simultaneously and then iteratively refine it, similar to text-to-image diffusion models.

Feb 19 • 6 tweets • 3 min read

Introducing Google's AI Co-Scientist 🧪

Google's new AI collaborator accelerating breakthroughs in biomedicine and beyond.

A thread on how this multi-agent Gemini 2.0 system is reshaping scientific discovery. 👇

1/ From CRISPR to AI: Meet the virtual collaborator mirroring the scientific method

Google’s AI Co-Scientist combines Gemini 2.0’s reasoning with a "scientific method" blueprint—using specialized agents (Generation, Reflection, Ranking) to iteratively propose, refine, and validate hypotheses.

Think of it as a Nobel-caliber brainstorming partner.

Feb 16 • 11 tweets • 4 min read

OpenAI just dropped a paper that reveals the blueprint for creating the best AI coder in the world.

But here’s the kicker: this strategy isn’t just for coding—it’s the clearest path to AGI and beyond.

Let’s break it down 🧵👇

1/ OpenAI’s latest research shows that reinforcement learning + test-time compute is the key to building superintelligent AI.

Sam Altman himself said OpenAI’s model went from ranking 175th to 50th in competitive coding—and expects #1 by year-end.

Feb 13 • 13 tweets • 5 min read

New research paper shows how LLMs can "think" internally before outputting a single token!

Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space.

TONS of benefits from this approach.

Let me break down this fascinating paper...

The key insight:

Human thinking often happens before we verbalize thoughts.

Traditional LLMs think by generating tokens (Chain of Thought), but this new approach lets models reason in their continuous latent space first.

Jan 21 • 10 tweets • 3 min read

DeepSeek R1 has been out for 24 hours.

The AI industry's reaction has been...strong!

Here's a collection of the most telling reactions: 🧵

Dr. Jim Fan, Sr. Research Manager at NVIDIA, points out how odd it is that a non-US company is leading the Open Source AI charge, given that was the original mission of OpenAI.

https://x.com/DrJimFan/status/1881353126210687089

Jan 18 • 11 tweets • 4 min read

Test Time Compute is bigger than anyone realizes.

It's the most important breakthrough in AI since Transformers.

Let me explain...🧵

What is Test Time Compute?

Think of it like this: Instead of AI giving instant answers, it now "thinks" longer - just like humans do when solving complex problems.

Jan 16 • 11 tweets • 5 min read

1/ SakanaAI just dropped their latest research: Transformer²

It's a self-adaptive architecture that allows AI to evolve at inference time.

Model weights are no longer "static"

Let’s break it down: 🧵

2/ Traditional Transformers are static post-training.

Once trained, they can’t learn or adapt without expensive fine-tuning or additional methods like retrieval-augmented generation (RAG).

Transformer² changes this entirely.

Jan 15 • 11 tweets • 5 min read

1/ Google Research unveils new paper: "Titans: Learning to Memorize at Test Time"

It introduces human-like memory structures to overcome the limits of Transformers, with one "SURPRISING" feature.

Here's why this is huge for AI. 🧵👇

2/ The Problem:

Transformers, the backbone of most AI today, struggle with long-term memory due to quadratic memory complexity.

Basically, there's a big penalty for long context windows!

Titans aims to solve this with massive scalability.

Jan 13 • 11 tweets • 4 min read

1/9 BREAKING

Biden Admin drops major AI chip rules today!

The 200+ page "AI Diffusion" framework completely reshapes global AI tech trade.

Key goal: Keep advanced AI development running on "American rails"

But not everyone is happy... 🧵

2/9 THE ALLIES LIST

18 countries get VIP treatment with ZERO restrictions - including UK, Canada, Japan, Germany, South Korea & Taiwan.

These trusted partners can freely access US AI tech.

Small orders (up to 1,700 GPUs) worldwide won't need special permission.

Jan 8 • 12 tweets • 2 min read

What will society look like after AGI is achieved?

I found a great prediction on LessWrong by L Rudolf L (link below).

Capital will matter MORE after AGI.

A thread on the future of wealth, power & human agency 🧵 1/ Most think money won't matter post-AGI.

But here's why that's wrong: AI will make capital (factories, data centers, money) MORE powerful while making human labor LESS valuable.

Jan 3 • 11 tweets • 4 min read

Want to know how OpenAI's o1 and o3 models work?

Chinese researchers figured it out!

They dropped a research paper explaining EVERYTHING.

Here's what you need to know: 🧵

1/ The paper analyzes four key components needed to achieve o1/o3-level performance:

Policy initialization: Train model with human-like reasoning behaviors

Reward design: Design feedback signals to guide model improvement

Search capabilities: Explore multiple solutions through tree/sequential strategies

Learning methods: Update model using search-generated data and rewards

Dec 28, 2024 • 12 tweets • 4 min read

Anthropic just dropped an incredible guide on "How To Build Effective Agents"

2025 will be the year of AGENTS 🤖

Here's everything you need to know: 🧵

Simple > Complex

When building LLM agents, the most successful implementations use basic composable patterns.

My take: agentic frameworks are great for not needing to reinvent the wheel while building agent patterns.

Dec 27, 2024 • 12 tweets • 4 min read

1/ 🚨 Big news: OpenAI makes it clear they are evolving into a for-profit

They are moving to a more closed and for-profit model while doubling down on AGI safety and scalability.

Is this the right balance of ethics and ambition, or is it a departure from their ideals?

Let’s unpack. 🧵

2/ 🚀 Mission: AGI for All

OpenAI’s mission remains the same: to ensure AGI benefits all of humanity.

But with AGI development accelerating, they claim changes are critical to stay competitive and address complex challenges.

Dec 24, 2024 • 14 tweets • 4 min read

o3 was announced less than a week ago and the AI industry was stunned.

I've collected some of the reactions from the biggest names in AI: 🧵

Balaji on how incredible the 25% score on Frontier Math really is.

https://x.com/balajis/status/1870206318872801766?s=46

Dec 20, 2024 • 8 tweets • 3 min read

.@OpenAI just dropped o3 and o3-mini!

This is AGI (not clickbait)

o3 is the best AI ever created, and its performance is WILD.

Here's everything you need to know: 🧵

O3 is their most advanced model yet.

Excelling at coding, mathematics, and even PhD-level science tasks.

O3 Mini offers incredible cost-performance optimization, perfect for diverse use cases.

Share this page!

Enter URL or ID to Unroll