Is Chain-of-Thought (CoT) reasoning in LLMs just...for show?
@AnthropicAI’s new research paper shows that not only do AI models not use CoT like we thought, they might not use it at all for reasoning.
In fact, they might be lying to us in their CoT.
What you need to know: 🧵
“Thinking models” use CoT to explore and reason about solutions before outputting their answer.
This CoT has shown to increase a model’s reasoning ability and gives us insight into how the model is thinking.
Anthropic's research asks: Is CoT faithful?
Apr 3 • 8 tweets • 3 min read
.@OpenAI dropped a new research paper showing AI agents are now capable of replicating cutting-edge AI research papers from scratch.
This is one step closer to the Intelligence Explosion: AI that can discover new science and improve itself.
Here’s what they learned: 🧵
Introducing PaperBench.
A new framework designed to test this very capability!
It gives AI agents access to recent ML research papers (20 from ICML 2024) and asks them to reproduce the results.
Apr 1 • 10 tweets • 4 min read
We knew very little about how LLMs actually work...until now.
@AnthropicAI just dropped the most insane research paper, detailing some of the ways AI "thinks."
And it's completely different than we thought.
Here are their wild findings: 🧵
Finding 1: Universal Language of Thought?
Claude doesn't seem to have separate "brains" for different languages: French, Chinese, English etc.
Instead, it uses a shared "language" representation of the world.
Concepts like "small" or "antonym" activate regardless of the input language!
Mar 30 • 13 tweets • 6 min read
I've spent 200+ hours Vibe Coding games and apps.
It's insane what you can build with just your voice and AI...ANYONE can do it.
Here's everything I learned about Vibe Coding: 🧵
Which tool for vibe coding? 🤔
🔹 AI Editors: @Windsurf_AI & Cursor are top picks! (Built on familiar VS Code, AI-native features).
🔹 VS Code Extensions: Like Cline, if you want AI inside your current VS Code setup.
🔹 Online IDEs: @Replit is fully browser-based & great for quick deployment.
🔹 Claude/ChatGPT/Gemini can run simple HTML/JS in-browser via "canvas" - good for basic tasks & learning!
Mar 26 • 11 tweets • 3 min read
Gemini 2.5 Pro is insane at coding.
It's far better than anything else I've tested.
Below are ONE SHOT demos 🧵
A Rubik's cube generator AND solver. I've tried this with Claude 3.7 thinking, DeepSeek etc and never came close to this.
Mar 7 • 10 tweets • 4 min read
AI has changed my life.
I'm now 100x more productive than I ever was.
How do I use it? Which tools do I use?
Here are my actual use cases for AI: 👇
1/ Search
In fact, I probably use it 50x per day.
For search, I'm mostly going to @perplexity_ai. But I also use @grok and @ChatGPTapp every so often.
Here are some actual searches I've done recently:
Mar 7 • 10 tweets • 4 min read
Major AI breakthrough: Diffusion Large Language Models are here!
They're 10x faster and 10x cheaper than traditional LLMs.
Here's everything you need to know:
Traditional LLMs generate tokens sequentially—each token must wait for the previous one.
Diffusion LLMs generate the entire output simultaneously and then iteratively refine it, similar to text-to-image diffusion models.
Feb 19 • 6 tweets • 3 min read
Introducing Google's AI Co-Scientist 🧪
Google's new AI collaborator accelerating breakthroughs in biomedicine and beyond.
A thread on how this multi-agent Gemini 2.0 system is reshaping scientific discovery. 👇
1/ From CRISPR to AI: Meet the virtual collaborator mirroring the scientific method
Google’s AI Co-Scientist combines Gemini 2.0’s reasoning with a "scientific method" blueprint—using specialized agents (Generation, Reflection, Ranking) to iteratively propose, refine, and validate hypotheses.
Think of it as a Nobel-caliber brainstorming partner.
Feb 16 • 11 tweets • 4 min read
OpenAI just dropped a paper that reveals the blueprint for creating the best AI coder in the world.
But here’s the kicker: this strategy isn’t just for coding—it’s the clearest path to AGI and beyond.
Let’s break it down 🧵👇 1/ OpenAI’s latest research shows that reinforcement learning + test-time compute is the key to building superintelligent AI.
Sam Altman himself said OpenAI’s model went from ranking 175th to 50th in competitive coding—and expects #1 by year-end.
Feb 13 • 13 tweets • 5 min read
New research paper shows how LLMs can "think" internally before outputting a single token!
Unlike Chain of Thought, this "latent reasoning" happens in the model's hidden space.
TONS of benefits from this approach.
Let me break down this fascinating paper...
The key insight:
Human thinking often happens before we verbalize thoughts.
Traditional LLMs think by generating tokens (Chain of Thought), but this new approach lets models reason in their continuous latent space first.
Jan 21 • 10 tweets • 3 min read
DeepSeek R1 has been out for 24 hours.
The AI industry's reaction has been...strong!
Here's a collection of the most telling reactions: 🧵
Dr. Jim Fan, Sr. Research Manager at NVIDIA, points out how odd it is that a non-US company is leading the Open Source AI charge, given that was the original mission of OpenAI.