Ask Perplexity
Nov 28 · 11 tweets · 6 min read
🐋 The Whale is back!!

DeepSeek just dropped an IMO gold-medalist model.

On ProofBench-Advanced—where models must write complete mathematical proofs, not just final answers—GPT-5 scores 20%. Gemini Deep Think IMO Gold hits 65.7%. DeepSeek Math V2 (Heavy) scores 61.9%.

That's second place—but Gemini isn't open source.

This is the best open math model in the world. And DeepSeek released the weights. Apache 2.0.

Here's what they discovered:
1/ Why Normal LLMs Break on Real Math

Most large language models are great at sounding smart, but:
- They’re rewarded for the final answer, not the reasoning.
- If they accidentally land on the right number with bad logic, they still get full credit.
- Over time they become “confident liars”: fluent, persuasive, and sometimes wrong.

That’s fatal for real math, where the proof is the product.

To fix this, DeepSeek Math V2 changes what the model gets rewarded for: not just being right, but being rigorously right.
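The shift in reward signal can be sketched as two toy reward functions. This is a hedged illustration — the names and values are ours, not DeepSeek's training code:

```python
# Toy contrast between the two reward schemes described above.
# Function names and behavior are illustrative, not DeepSeek's code.

def answer_only_reward(final_answer, gold_answer):
    # Rewards lucky guesses: broken logic + right number still earns 1.0.
    return 1.0 if final_answer == gold_answer else 0.0

def rigor_reward(verifier_score):
    # DeepSeek Math V2's scheme: pay out the verifier's judgment of the
    # reasoning itself (1.0 rigorous, 0.5 sloppy, 0.0 fatally flawed).
    return verifier_score

# A proof with the right answer but fatally flawed logic:
print(answer_only_reward(42, 42))  # -> 1.0 (full credit anyway)
print(rigor_reward(0.0))           # -> 0.0 (the proof is the product)
```

Under the first scheme a "confident liar" is optimal; under the second it scores zero.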
2/ The Core Idea: Generator + Verifier

Instead of one model doing everything, DeepSeek splits the job:
1. Generator – the “mathematician”
- Produces a full, step-by-step proof.

2. Verifier – the “internal auditor”
- Checks the proof for logical soundness.
- Ignores the final answer. It only cares about the reasoning.

This creates an internal feedback loop:
One model proposes, the other critiques.
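A minimal sketch of the split, with stubs standing in for the two model roles — illustrative only, nothing here is DeepSeek's actual API:

```python
# Two stub "models" showing the generator/verifier division of labor.
# Real DeepSeek Math V2 uses two LLM roles; these stubs only show the flow.

def generator(problem: str) -> str:
    """The 'mathematician': produce a full, step-by-step proof."""
    # A real generator is an LLM call; this stub just returns a draft.
    return f"Proof of {problem}: step 1 ... step n. QED"

def verifier(proof: str) -> float:
    """The 'internal auditor': score logical soundness, not the answer.
    Returns 1.0 (rigorous), 0.5 (right idea, sloppy), 0.0 (fatal flaw)."""
    # A real verifier is a second LLM pass; this stub flags elided steps.
    return 0.5 if "..." in proof else 1.0

print(verifier(generator("AM-GM")))  # -> 0.5: right idea, gaps in the steps
```

The key design point: the verifier never sees a "gold answer" to compare against — it only judges the reasoning in front of it.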
3/ The Secret Sauce: 1.0/0.5/0.0

The verifier doesn't just say yes or no. It scores on three levels:

1.0 = Rigorous, watertight
0.5 = Right idea, sloppy execution
0.0 = Fatal flaws

That 0.5 is the breakthrough.

It's the referee saying: "You solved it, but this wouldn't pass peer review."
When the generator sees 0.5, it re-reads its own proof, finds the weak steps, tightens the argument.

The model learns to debug its reasoning, not just guess better.
4/ Putnam, IMO, and ProofBench

- Putnam 2024 – ~118/120
- IMO-Gold level performance
- On a “basic” proof dataset, V2 solves nearly the entire set
- On an “advanced” dataset with long, tricky proofs, it still performs strongly, while many other large models collapse in accuracy

Models without this internal verifier do okay on short, easy proofs…
…and then fall off a cliff on long, complex ones.

DeepSeek’s architecture shows that built-in self-checking is the difference between “good at math questions” and “actually good at proofs.”
5/ How They Trained It

The big risk: if the generator gets smarter while the verifier stays weak, the generator learns to game the verifier.

Three-phase solution:

Phase 1 – Human Cold Start. Contest problems graded by expert mathematicians. Anchors the verifier to real standards.

Phase 2 – Meta-Verification. The verifier can start hallucinating errors—seeing problems that don't exist. Solution: a second model checks whether critiques are legitimate or noise.

Phase 3 – Scaled Compute. For the hardest problems, human labeling is too slow. Run many verification passes, use majority vote as training signal.

Humans set the rules. Compute scales them.
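The Phase 3 idea — repeated verification passes, majority verdict as training signal — fits in a few lines. The scores follow the 1.0/0.5/0.0 rubric; everything else is illustrative:

```python
from collections import Counter

# Sketch of scaled verification: run several passes on the same proof
# and use the majority verdict as the training signal.

def majority_verdict(scores):
    """scores: verifier outputs from repeated passes, each in {1.0, 0.5, 0.0}."""
    verdict, _ = Counter(scores).most_common(1)[0]
    return verdict

# Five noisy passes over one proof; the majority wins:
print(majority_verdict([1.0, 0.5, 1.0, 1.0, 0.0]))  # -> 1.0
```

One noisy pass is unreliable; a vote across many passes averages the noise out, which is exactly the property that lets compute replace human graders on the hardest problems.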
6/ Big Model, Big Hardware

DeepSeek Math V2 is a Mixture-of-Experts (MoE) model with about 685B parameters.
- Only a few “experts” are active per token, so each step is cheaper than a dense 685B model
- But all those parameters still have to live in GPU memory

The code is open. The bottleneck is compute.
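A back-of-envelope estimate makes the memory point concrete. This ignores KV cache, activations, and runtime overhead, and assumes 80 GB accelerators for illustration:

```python
import math

# Rough weight-memory estimate for a 685B-parameter checkpoint.
# Illustrative only: real deployments also need KV cache + activations.

PARAMS = 685e9       # total parameters (MoE: ALL must sit in GPU memory)
GPU_MEM_GB = 80      # e.g. one 80 GB accelerator

def weight_footprint(bytes_per_param):
    """Return (weight_gb, min_gpus) for a given numeric precision."""
    weight_gb = PARAMS * bytes_per_param / 1e9
    return weight_gb, math.ceil(weight_gb / GPU_MEM_GB)

print(weight_footprint(1))  # FP8  -> (685.0, 9)
print(weight_footprint(2))  # BF16 -> (1370.0, 18)
```

Even at 8-bit precision, just holding the weights takes roughly nine 80 GB GPUs — sparse activation saves compute per step, not memory.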
7/ How You Actually Use It: Agent Mode

In practice, you don’t just send one prompt and get a perfect proof.
Instead, you run it in agent mode, something like:

1. Ask it to solve a problem.
2. It generates a proof and a self-verification score.
3. If the score is 0.5, you feed its own critique back in:
- “Refine this proof based on the issues you identified.”

4. Repeat this refinement loop a few times (e.g., up to 8 rounds).
5. Stop when it produces a 1.0 proof or you’re satisfied.

You're managing a feedback loop, not passively waiting for output.
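That loop can be sketched in a few lines; `call_generator` and `call_verifier` are hypothetical stand-ins for real model calls, not a published API:

```python
# Agent-mode refinement loop from the steps above, with stub model calls.

def call_generator(problem, critique=None):
    # Stand-in for a model call; given a critique, it refines the proof.
    return "draft proof" if critique is None else "refined proof"

def call_verifier(proof):
    # Stand-in: returns a 1.0/0.5/0.0 rubric score plus a critique string.
    if proof == "refined proof":
        return 1.0, ""
    return 0.5, "step 3 is unjustified"

def agent_mode(problem, max_rounds=8):
    proof = call_generator(problem)
    score, critique = call_verifier(proof)
    for _ in range(max_rounds):
        if score == 1.0:
            break  # watertight -- stop refining
        # Feed the model's own critique back in and retry:
        proof = call_generator(problem, critique=critique)
        score, critique = call_verifier(proof)
    return proof, score

print(agent_mode("AM-GM"))  # -> ('refined proof', 1.0)
```

The `max_rounds` cap matters in practice: each round costs a full generate-plus-verify pass, so you trade compute for rigor.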
8/ Limitations

Creativity. Great at formal reasoning and polishing proofs. Still struggles with problems needing genuinely novel insight.

Cost. Those record-setting scores rely on many proof attempts and verification runs. Real-world use means cheaper settings, slightly lower performance.

Residual Errors. The verifier is still a neural net. It can be fooled. Error rate is lower, not zero.

This is a big leap toward reliable reasoning—not "perfect AI mathematician."
9/ From Chatbots to Reasoners

DeepSeek Math V2 represents more than just a math milestone.

The pattern here will spread:
- Split generation and verification
- Train on proof quality, not just right answers
- Add self-critique loops and meta-verifiers

This is the template for any domain where being wrong is expensive—code, science, law, anything that needs to survive peer review.

