Matthew Berman
Feb 16, 2025 · 11 tweets · 4 min read
OpenAI just dropped a paper that reveals the blueprint for creating the best AI coder in the world.

But here’s the kicker: this strategy isn’t just for coding—it’s the clearest path to AGI and beyond.

Let’s break it down 🧵👇
1/ OpenAI’s latest research shows that reinforcement learning + test-time compute is the key to building superintelligent AI.

Sam Altman himself said OpenAI’s model went from ranking 175th to 50th in competitive coding—and expects #1 by year-end.
2/ The paper, “Competitive Programming with Large Reasoning Models,” compares different AI coding strategies.

At first, models relied on human-engineered inference strategies—but the biggest leap came when humans were removed from the loop entirely.
3/ Enter DeepSeek-R1, a model that reportedly cost only ~$5M to train.

Its breakthrough? Reinforcement learning with verifiable rewards.

This method, also used in AlphaGo, lets the model learn from trial & error and scale intelligence indefinitely.
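In miniature, "RL with verifiable rewards" just means the training signal comes from an automatic check rather than a human judgment. A toy sketch of that loop (a bandit-style search over made-up candidate answers — nothing like R1's actual training setup):

```python
import random

def reward(answer: int) -> int:
    # Verifiable reward: 1 iff the answer passes an automatic check
    # (e.g., a unit test or exact-match grader). No human in the loop.
    return 1 if answer == 42 else 0

def train(steps: int = 2000, seed: int = 0) -> int:
    rng = random.Random(seed)
    actions = list(range(40, 45))        # candidate answers to try
    weights = {a: 1.0 for a in actions}  # preference for each action
    for _ in range(steps):
        # Sample an action proportionally to its current weight.
        total = sum(weights.values())
        r = rng.random() * total
        for a in actions:
            r -= weights[a]
            if r <= 0:
                break
        weights[a] += reward(a)  # reinforce only verified successes
    return max(weights, key=weights.get)  # the learned best action

print(train())  # → 42
```

Trial and error plus a mechanical grader is the whole trick: the model that samples 42 gets reinforced, and everything else decays in relative weight.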
4/ Think about it this way:

AlphaGo Zero became the best Go player in the world without human guidance.

It just kept playing itself until it mastered the game.

Now, OpenAI is applying the same principle to coding—and soon, to all STEM fields.
5/ What does this mean?

Every domain with verifiable rewards (math, coding, science) can be mastered by AI just by letting it play against itself.

AI is removing human limitations—and that’s how we get to AGI.
6/ Here’s the data from the coding competition:

• GPT-4o: 808 Elo (decent)
• OpenAI o1: 1,673 Elo (better)
• OpenAI o3: 2,724 Elo (SUPERHUMAN) 🏆

That’s the 99.8th percentile of competitive coders, with no human-crafted strategies.
7/ Tesla did this with Full Self-Driving.

They used to rely on a hybrid model (human rules + AI).

But when they switched to end-to-end AI, performance skyrocketed.

AI just needs more compute—not more human intervention.

8/ The takeaway?

Sam Altman was right when he said AGI is just a matter of scaling up.

Reinforcement learning + test-time compute is the formula for intelligence—and OpenAI is already proving it.
9/ We’re witnessing the birth of AI superintelligence in real time.

It won’t stop at coding. The same techniques will make AI the best mathematician, scientist, and engineer in history.

The race to AGI is on.
Here's the paper: arxiv.org/pdf/2502.06807

And my full video breakdown: youtube.com/watch?v=VnaKWi…
