@AskPerplexity

Nov 28, 2025 • 11 tweets • 6 min read

🐋 The Whale is back!!

DeepSeek just dropped an IMO gold-medalist model.

On IMO-ProofBench Advanced, where models must write complete, rigorous proofs of olympiad-level problems, GPT-5 scores 20%. Gemini Deep Think IMO Gold hits 65.7%. DeepSeek Math V2 (Heavy) scores 61.9%.

That's second place—but Gemini isn't open source.

That makes it the strongest open model on this benchmark: DeepSeek released the weights under Apache 2.0.

Here's what they discovered:

1/ Why Normal LLMs Break on Real Math

Most large language models are great at sounding smart, but:
- They’re rewarded for the final answer, not the reasoning.
- If they accidentally land on the right number with bad logic, they still get full credit.
- Over time they become “confident liars”: fluent, persuasive, and sometimes wrong.

That’s fatal for real math, where the proof is the product.

To fix this, DeepSeek Math V2 changes what the model gets rewarded for: not just being right, but being rigorously right.

2/ The Core Idea: Generator + Verifier

Instead of one model doing everything, DeepSeek splits the job:
1. Generator – the “mathematician”
- Produces a full, step-by-step proof.

2. Verifier – the “internal auditor”
- Checks the proof for logical soundness.
- Ignores the final answer. It only cares about the reasoning.

This creates an internal feedback loop:
One model proposes, the other critiques.

3/ The Secret Sauce: 1.0/0.5/0.0

The verifier doesn't just say yes or no. It scores on three levels:

1.0 = Rigorous, watertight
0.5 = Right idea, sloppy execution
0.0 = Fatal flaws

That 0.5 is the breakthrough.

It's the referee saying: "You solved it, but this wouldn't pass peer review."
When the generator sees 0.5, it re-reads its own proof, finds the weak steps, and tightens the argument.

The model learns to debug its reasoning, not just guess better.
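To make the contrast with answer-only rewards concrete, here is a toy Python sketch. It is not DeepSeek's training code: `Attempt`, `outcome_reward`, and `proof_reward` are hypothetical names, and the verifier score is simply passed in rather than produced by a model. It only shows how the same "right number, bad logic" attempt is scored under each scheme, using the thread's 1.0/0.5/0.0 rubric.

```python
from dataclasses import dataclass

# Rubric levels described in the thread.
RIGOROUS, SLOPPY, FLAWED = 1.0, 0.5, 0.0


@dataclass
class Attempt:
    final_answer: str      # the answer the proof arrives at
    verifier_score: float  # hypothetical verifier grade of the reasoning


def outcome_reward(attempt: Attempt, correct_answer: str) -> float:
    """Answer-only reward: full credit for the right answer, however it was reached."""
    return 1.0 if attempt.final_answer == correct_answer else 0.0


def proof_reward(attempt: Attempt) -> float:
    """Proof-quality reward: ignore the final answer and use the verifier's grade."""
    if attempt.verifier_score not in (RIGOROUS, SLOPPY, FLAWED):
        raise ValueError(f"unexpected verifier score: {attempt.verifier_score}")
    return attempt.verifier_score


# A "confident liar": the right number reached through broken logic.
lucky_guess = Attempt(final_answer="42", verifier_score=FLAWED)
print(outcome_reward(lucky_guess, "42"))  # 1.0
print(proof_reward(lucky_guess))          # 0.0
```

Under the answer-only reward the lucky guess earns full credit; under the proof-quality reward it earns nothing, which is the incentive shift the thread describes.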

4/ Putnam, IMO, and ProofBench

- Putnam 2024 – 118/120
- Gold-medal-level performance on IMO 2025
- On the “basic” proof set, V2 solves nearly every problem
- On the “advanced” set with long, tricky proofs, it still performs strongly, while many other large models collapse in accuracy

Models without this internal verifier do okay on short, easy proofs…
…and then fall off a cliff on long, complex ones.

DeepSeek’s architecture shows that built-in self-checking is the difference between “good at math questions” and “actually good at proofs.”

5/ How They Trained It

The big risk: if the generator gets smarter while the verifier stays weak, the generator learns to game the verifier.

Three-phase solution:

Phase 1 – Human Cold Start. Contest problems graded by expert mathematicians. Anchors the verifier to real standards.

Phase 2 – Meta-Verification. The verifier can start hallucinating errors—seeing problems that don't exist. Solution: a second model checks whether critiques are legitimate or noise.

Phase 3 – Scaled Compute. For the hardest problems, human labeling is too slow, so they run many verification passes and use the majority vote as the training signal.

Humans set the rules. Compute scales them.
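Phase 3's voting step can be sketched as a small aggregation function. This assumes each verification pass returns one of the three rubric scores; the stricter tie-break is our own choice, not something the thread specifies, and DeepSeek's actual labeling rule may differ.

```python
from collections import Counter


def majority_label(scores: list[float]) -> float:
    """Return the most common verifier score across independent passes.

    Ties are broken toward the lower (stricter) score, an assumption made
    here so that disagreement between passes never inflates a label.
    """
    if not scores:
        raise ValueError("need at least one verification pass")
    counts = Counter(scores)
    top = max(counts.values())
    return min(score for score, count in counts.items() if count == top)


print(majority_label([1.0, 1.0, 0.5, 1.0]))  # 1.0
print(majority_label([1.0, 0.5, 0.5, 1.0]))  # 0.5 (tie, so the stricter score wins)
```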

6/ Big Model, Big Hardware

DeepSeek Math V2 is a Mixture-of-Experts (MoE) model with about 685B parameters.
- Only some “experts” are active per token, so each step is cheaper than in a dense 685B model
- But all those parameters still have to live in GPU memory

The weights are open. The bottleneck is hardware.
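A back-of-the-envelope sketch of why hardware is the bottleneck: even though only a fraction of experts run per token, every weight has to be resident. This assumes 1 byte per parameter (FP8) or 2 bytes (BF16) and a hypothetical 80 GB GPU, and it ignores KV cache, activations, and framework overhead, so real deployments need more.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 685e9  # total parameter count quoted in the thread
GPU_GB = 80     # hypothetical per-GPU memory, chosen for illustration

for label, nbytes in [("FP8", 1), ("BF16", 2)]:
    gb = weight_memory_gb(PARAMS, nbytes)
    # Lower bound: weights only, no KV cache, activations, or overhead.
    min_gpus = math.ceil(gb / GPU_GB)
    print(f"{label}: {gb:,.0f} GB of weights -> at least {min_gpus} GPUs")
```

With these assumptions it reports 685 GB (at least 9 GPUs) for FP8 and 1,370 GB (at least 18 GPUs) for BF16, before any serving overhead.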

7/ How You Actually Use It: Agent Mode

In practice, you don’t just send one prompt and get a perfect proof.
Instead, you run it in agent mode, something like:

1. Ask it to solve a problem.
2. It generates a proof and a self-verification score.
3. If the score is 0.5, you feed its own critique back in:
- “Refine this proof based on the issues you identified.”

4. Repeat this refinement loop a few times (e.g., up to 8 rounds).
5. Stop when it produces a 1.0 proof or you’re satisfied.

You're managing a feedback loop, not passively waiting for output.
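The loop above can be sketched in Python. `generate` is a hypothetical callable you would implement against whatever inference stack serves the model; it is not a DeepSeek API. The sketch assumes each call returns a proof, the model's self-critique, and a score on the 1.0/0.5/0.0 scale, and it keeps the best-scoring proof in case no round reaches 1.0.

```python
from typing import Callable, Tuple

# Hypothetical signature: generate(problem, feedback) -> (proof, critique, score).
GenerateFn = Callable[[str, str], Tuple[str, str, float]]


def refine_until_rigorous(problem: str, generate: GenerateFn, max_rounds: int = 8):
    """Generate -> self-verify -> refine, stopping early on a 1.0 proof.

    Returns (proof, score, rounds_used). If no round reaches 1.0, returns
    the first proof that achieved the best score seen.
    """
    feedback = ""
    best_proof, best_score = "", -1.0
    for round_no in range(1, max_rounds + 1):
        proof, critique, score = generate(problem, feedback)
        if score > best_score:
            best_proof, best_score = proof, score
        if score == 1.0:
            return proof, score, round_no
        # Feed the model's own critique back in for the next round.
        feedback = f"Refine this proof based on the issues you identified:\n{critique}"
    return best_proof, best_score, max_rounds


def make_fake_generate(rounds_needed: int) -> GenerateFn:
    """Toy stand-in for the model: scores 0.5 until call number `rounds_needed`."""
    calls = 0

    def fake(problem: str, feedback: str) -> Tuple[str, str, float]:
        nonlocal calls
        calls += 1
        score = 1.0 if calls >= rounds_needed else 0.5
        return f"proof v{calls}", "step 3 skips a case", score

    return fake


print(refine_until_rigorous("toy problem", make_fake_generate(3)))
# ('proof v3', 1.0, 3)
```

This sketch refines on any score below 1.0, including 0.0; the thread only describes the 0.5 case, so treating a fatally flawed proof as something to restart rather than refine would be an equally reasonable choice.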

8/ Limitations

Creativity. Great at formal reasoning and polishing proofs. Still struggles with problems needing genuinely novel insight.

Cost. Those record-setting scores rely on many proof attempts and verification runs. Real-world use means cheaper settings and somewhat lower performance.

Residual Errors. The verifier is still a neural net. It can be fooled. Error rate is lower, not zero.

This is a big leap toward reliable reasoning—not "perfect AI mathematician."

9/ From Chatbots to Reasoners

DeepSeek Math V2 represents more than just a math milestone.

The pattern here will spread:
- Split generation and verification
- Train on proof quality, not just right answers
- Add self-critique loops and meta-verifiers

This is the template for any domain where being wrong is expensive—code, science, law, anything that needs to survive peer review.

Paper: github.com/deepseek-ai/De…
Model: huggingface.co/deepseek-ai/De…

More from @AskPerplexity

@AskPerplexity

Dec 26, 2025

Silver is quietly becoming a problem

Price just broke its all-time high, a record that had stood for 14 years. Up 158% this year.

Electrification. AI datacenters. Grid. Defense. All need silver.

China is tightening export controls, and stockpiles are still depleting.

Here's what's actually happening:

1/ China is restricting exports

Starting 2026, silver exports require government licenses. Only large, state-approved firms qualify.

In practice: more paperwork, more gating, more “approved players.”

That can act like supply loss when timing matters.

Some reports suggest that institutional positioning may be shifting and that governments may be stockpiling.

2/ The market was already short

Silver has run structural deficits for 4 straight years.

• Cumulative gap: 678 million ounces
• That's 10 months of global mine production
• 2025 is on track for year 5

Where'd the metal come from? Stockpiles built over 30 years.
Those stockpiles are nearly gone.
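A quick consistency check on these figures, using only the thread's own numbers: if 678 million ounces equals about 10 months of mine output, annual mine production works out to roughly 814 million ounces. This is derived arithmetic, not an independent production figure.

```python
CUMULATIVE_DEFICIT_MOZ = 678  # million ounces over 4 years of deficits (from the thread)
MONTHS_OF_MINE_OUTPUT = 10    # "10 months of global mine production" (from the thread)
DEFICIT_YEARS = 4

# If 678 Moz equals 10 months of output, one year of output is 678 / (10 / 12).
implied_annual_mine_moz = CUMULATIVE_DEFICIT_MOZ / (MONTHS_OF_MINE_OUTPUT / 12)
avg_yearly_deficit_moz = CUMULATIVE_DEFICIT_MOZ / DEFICIT_YEARS

print(f"Implied annual mine output: ~{implied_annual_mine_moz:.0f} Moz")
print(f"Average deficit per year: ~{avg_yearly_deficit_moz:.1f} Moz")
```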

@AskPerplexity

Dec 3, 2025

Journalists keep saying AI is "draining aquifers" and "boiling oceans."

One problem: they're citing a 2023 estimate that's now off by ~100×.

Google just measured it. A median Gemini text query uses:
- 5 drops of water
- 9 seconds of TV worth of electricity
- 0.03g of CO₂

Per-prompt energy has dropped 33× in one year.

So why does the myth persist?
Outdated research, good headlines—and a real issue buried underneath.

The actual concerns are local, not global.

Here's what's actually happening:

1/ Where the water actually goes

AI doesn't "drink" water inside the model. Data centers use water to move heat:

Heat from chips → cooling towers → water evaporates

Water use varies by location, cooling design, and power source. A data center in wet Oregon on hydro ≠ one in drought-stricken Arizona on natural gas.

It's an infrastructure question, not a "prompt is evil" question.

2/ Per-prompt impact: small and falling

Google's Gemini data (May 2025):

- 5 drops of water (0.26 mL)
- 9 seconds of TV (0.24 Wh)
- 0.03g CO₂

Efficiency is improving fast: Google reports 33× lower energy and 44× lower carbon per prompt compared to one year ago.

The direction is clear: more usage, less water per useful token.
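The per-prompt figures can be sanity-checked with a few lines of arithmetic. Everything except the 1-billion-prompt volume comes from the thread; that volume is a hypothetical round number chosen only to show scaling, not a Google usage figure.

```python
# Per-prompt figures for a median Gemini text prompt, as quoted in the thread.
WATER_ML = 0.26
DROPS = 5
ENERGY_WH = 0.24
TV_SECONDS = 9

ml_per_drop = WATER_ML / DROPS                      # implied size of one "drop"
implied_tv_watts = ENERGY_WH / (TV_SECONDS / 3600)  # Wh divided by hours gives W

print(f"Implied drop size: {ml_per_drop:.3f} mL")
print(f"Implied TV power draw: {implied_tv_watts:.0f} W")

# Hypothetical volume, chosen only to show how per-prompt figures scale.
PROMPTS = 1_000_000_000
print(f"Energy for {PROMPTS:,} prompts: {ENERGY_WH * PROMPTS / 1e6:,.0f} MWh")  # Wh -> MWh
print(f"Water for {PROMPTS:,} prompts: {WATER_ML * PROMPTS / 1e6:,.0f} m^3")    # mL -> m^3
```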

@AskPerplexity

Nov 27, 2025

Batteries Just Became AI Infrastructure

Battery storage is already scaling—159 GW deployed globally, 926 GW projected by 2033.

Renewables needed it first. Now AI needs it too.

Tesla is deploying Megapacks at data centers. China is deploying 30 GW this year, integrating storage directly into AI buildout.

Why? Data centers can’t scale without solving three problems:
- 7-year interconnection queues
- power quality GPUs demand
- backup without diesel permits

Batteries solve all three ↓

Why AI Data Centers Need Batteries

Interconnection is broken. Utility connection takes 7+ years. Batteries bypass it. Skip the queue.

GPUs break traditional power. Training loads swing 90% at 30 Hz. Batteries smooth it in 30 milliseconds.

Diesel doesn’t scale. Permitting is hard. For 20-hour backup, batteries are cost-competitive.

The math: ~1% of data center capex.

The Scale

Global capacity: 159 GW by end-2024. Up 85% from 86 GW in 2023. Projected: 926 GW by 2033.

Cost curve: $115/kWh in 2024, down 84% from $723/kWh in 2013. Still falling.

Economics flipped. Solar plus 4-hour storage runs ~$76/MWh. New gas peakers cost $80-120/MWh.

Storage wins in sunbelt markets now.
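The growth and cost figures above imply annual rates that are easy to compute. This sketch uses only the thread's numbers and the standard compound-annual-growth-rate formula; the outputs are implied averages, not forecasts.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Figures from the thread.
capacity_growth = cagr(159, 926, 2033 - 2024)  # GW, end-2024 -> 2033 projection
cost_change = cagr(723, 115, 2024 - 2013)      # $/kWh, 2013 -> 2024
last_year_growth = cagr(86, 159, 1)            # GW, 2023 -> 2024

print(f"Implied capacity growth to 2033: {capacity_growth:.1%} per year")
print(f"Implied cost change 2013-2024: {cost_change:.1%} per year")
print(f"Capacity growth 2023-2024: {last_year_growth:.1%}")
```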

@AskPerplexity

Nov 25, 2025

The universe isn’t just expanding — it’s speeding up

The universe is 13.8 billion years old, and astronomers expected gravity to gradually slow its expansion. Instead, when they looked deep into space, they found the opposite: the expansion is accelerating.

Whatever drives that acceleration makes up ~70% of the universe's total energy content.

We call it dark energy.

We can measure it. We can see its effects. So what is it, really?

How we figured this out

Cepheid stars: the distance trick

Henrietta Leavitt discovered that certain stars (Cepheid variables) brighten and dim with a regular period, and that the period reveals their true brightness. Comparing that true brightness with how bright they appear lets us measure distances to faraway galaxies.

Redshift: galaxies on the move

Vesto Slipher used spectra of galaxies to show many had their light stretched to longer, redder wavelengths.
Redder → moving away faster.

Hubble & the expanding universe

Edwin Hubble and Milton Humason combined Cepheid distances with redshift and found a pattern:

>The farther a galaxy is, the faster it’s receding.

That’s the Hubble–Lemaître law: clear evidence that the universe is expanding.

The shock: expansion is accelerating

In the 1990s, two teams studied Type Ia supernovae, stellar explosions so consistent in brightness that they act like “standard candles.”

By comparing how bright they should be to how bright they look, you can get distance.

By measuring redshift, you get how fast they’re moving away.

The surprise:

• The supernovae were dimmer and farther away than expected.

• That only made sense if, over billions of years, the universe’s expansion had sped up instead of slowing down.

This cosmic acceleration is what we now attribute to dark energy.
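The two measurements above (distance from a standard candle's brightness, speed from redshift) can be combined in a short sketch. It uses the standard distance modulus m - M = 5 log10(d / 10 pc) and the low-redshift approximation v ≈ cz; the supernova's magnitude and redshift are hypothetical, and a peak absolute magnitude of about -19.3 for Type Ia supernovae is an assumed typical value. Detecting acceleration itself requires comparing high-redshift supernovae against cosmological models, which this sketch does not attempt.

```python
C_KM_S = 299_792.458  # speed of light in km/s
M_TYPE_IA = -19.3     # assumed typical peak absolute magnitude of a Type Ia supernova


def distance_mpc(apparent_mag: float, absolute_mag: float = M_TYPE_IA) -> float:
    """Standard-candle distance from the distance modulus m - M = 5*log10(d / 10 pc)."""
    d_parsec = 10 ** ((apparent_mag - absolute_mag + 5) / 5)
    return d_parsec / 1e6  # parsecs -> megaparsecs


def recession_velocity_km_s(z: float) -> float:
    """Low-redshift approximation v = c*z; it breaks down as z approaches 1."""
    return C_KM_S * z


def hubble_constant_estimate(apparent_mag: float, z: float) -> float:
    """Hubble-Lemaitre law v = H0 * d, solved for H0 in km/s/Mpc."""
    return recession_velocity_km_s(z) / distance_mpc(apparent_mag)


# Hypothetical nearby supernova: peaks at apparent magnitude 16.0, redshift 0.03.
d = distance_mpc(16.0)
h0 = hubble_constant_estimate(16.0, 0.03)
print(f"distance ~ {d:.0f} Mpc, H0 ~ {h0:.0f} km/s/Mpc")
```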

@AskPerplexity

Nov 24, 2025

🚨The White House just launched the Genesis Mission — a Manhattan Project for AI

The Department of Energy will build a national AI platform on top of U.S. supercomputers and federal science data, train scientific foundation models, and run AI agents + robotic labs to automate experiments in biotech, critical materials, nuclear fission/fusion, space, quantum, and semiconductors.

Let’s unpack what this order actually builds, and how it could rewire the AI, energy, and science landscape over the next decade:

1/ At the core is a new American Science and Security Platform.

DOE is ordered to turn the national lab system into an integrated stack that provides:
• HPC for large-scale model training, simulation, inference
• Domain foundation models across physics, materials, bio, energy
• AI agents to explore design spaces, evaluate experiments, automate workflows
• Robotic/automated labs + production tools for AI-directed experiments and manufacturing

In short: a national-scale AI scientist, with AI-driven lab tech treated as infrastructure.

2/ The targets are very explicit and very strategic.

Within 60 days, DOE has to propose at least 20 “national challenges” in:

• advanced manufacturing
• biotechnology
• critical materials
• nuclear fission & fusion
• quantum information science
• semiconductors & microelectronics

This is about energy dominance, supply chains, and defense.

@AskPerplexity

Nov 24, 2025

Nvidia is the central bank of AI compute.

It pulls in nearly $60B per quarter — almost all from a handful of hyperscalers who plan their AI roadmaps around Jensen's release cycle.

But three shifts are happening at once:
• Google is committing up to one million TPUs to Anthropic starting 2026 — the first credible alternative at frontier scale.
• Racks are already pushing hundreds of kilowatts, with megawatt systems on the horizon.
• Nvidia has $26B in commitments to rent back its own GPUs from cloud partners — up from $12.6B last quarter.

The real constraint isn't chips anymore — it's power and memory.

Over the next 3–5 years, this creates a fractured landscape: Nvidia GPUs as the default utility, Google TPUs as a real second ecosystem, and hyperscalers racing to escape the Nvidia tax.

Let’s walk through how that actually plays out:

1/ Nvidia now: dominant, concentrated, and structurally exposed

Nvidia's latest quarter (fiscal Q3 2026) is extreme:
• $57B in revenue, +62% YoY
• $51.2B from data center alone

But it’s dangerously concentrated:
• 4 customers = 61% of sales (up from 56% last quarter).

And Nvidia is renting back its own chips:
• $26B in off-balance-sheet commitments to pay hyperscalers for GPUs they can’t fully rent out, up from $12.6B the prior quarter.

That creates a circular-demand loop:
• sell chips to clouds → invest in AI customers → rent those same chips back when there’s slack.

Not a crisis. But a structural dependency that didn’t exist two years ago.

2/ TPUs: no longer just for Google

Google's 7th-gen TPU (Ironwood) is the first built for inference over training.

Why that matters: the bottleneck is shifting. Training a frontier model is a one-time cost. Serving it to billions of users is the recurring expense that actually scales.

The specs reflect this:
• Pods scale to 9,216 accelerators
• 1.77 PB of HBM3E memory per pod
• 9.6 Tb/s inter-chip interconnect per chip, with optical circuit switching across the pod

That memory pool and interconnect matter more than peak FLOPs. Large inference workloads are memory-bandwidth bound. Ironwood is designed around that reality.

Google's framing: "The hardest part is now serving AI to billions of users."
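The 1.77 PB figure follows from per-chip capacity. This sketch assumes 192 GB of HBM3E per Ironwood chip, Google's published per-chip figure, and uses decimal units (1 PB = 10^6 GB).

```python
HBM_PER_CHIP_GB = 192  # assumed Ironwood HBM3E capacity per chip
CHIPS_PER_POD = 9_216  # from the thread

pod_hbm_gb = HBM_PER_CHIP_GB * CHIPS_PER_POD
pod_hbm_pb = pod_hbm_gb / 1e6  # GB -> PB (decimal)

print(f"{pod_hbm_gb:,} GB per pod = {pod_hbm_pb:.2f} PB")
```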
