The DeepSeek discourse is simultaneously under-crediting and over-crediting them for what they achieved. So, some quick thoughts:
1. Re the distillation claims: DeepSeek-Coder-V2 [1] was released in June 2024, and they already had RL on verifiable rewards working back then with great success (a rough sketch of what a verifiable reward looks like is below). They were the only team outside of Gemini and OpenAI that I knew of that was RL-pilled.
Six months on from that, I place a very low prior on this team simply distilling from o1, given how much information about o1 was in the OpenAI blog post itself. They may have used o1 CoTs as examples to seed their human CoT data, but again, the RL team there is talented enough to have done it on their own.
[1] arxiv.org/abs/2406.11931
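For context on what "RL on verifiable rewards" means here: the reward comes from a programmatic check rather than a learned reward model. A minimal illustrative sketch in Python, not DeepSeek's actual setup:

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the model's final \\boxed{} answer matches the
    reference exactly, else 0.0. No learned reward model involved."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    predicted = match.group(1).strip() if match else ""
    return 1.0 if predicted == ground_truth.strip() else 0.0

# Hypothetical usage: checks like exact-match math answers, passing unit
# tests, or compiling code are what make the reward "verifiable".
print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
```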
2. The $5.5M figure is entirely believable for "one final run", and a massive feat. It is, however, only surprising if you compare it with Llama 3's training costs, which, as Wenfeng mentions here, were a couple of generations behind. Obviously, the total R&D costs are much higher.
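For a sense of where the ~$5.5M number comes from, here's the back-of-envelope using the figures in the DeepSeek-V3 technical report (~2.79M H800 GPU-hours, priced at an assumed $2 per GPU-hour rental rate):

```python
# Back-of-envelope for the "one final run" cost, using the GPU-hour count
# reported in the DeepSeek-V3 technical report and its assumed $2/GPU-hour
# rental price. Excludes R&D, ablations, and prior experimental runs.
gpu_hours = 2.788e6        # pre-training + context extension + post-training
price_per_gpu_hour = 2.0   # assumed H800 rental price (USD)
print(f"${gpu_hours * price_per_gpu_hour / 1e6:.2f}M")  # -> $5.58M
```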