Joan Cabezas
wip ai researcher | @ycombinator alum | built @omidotme, sometimes do vc.
Oct 10 · 8 tweets · 3 min read
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL?

We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild:

Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns.
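(The thread doesn't say which RL algorithm or framework was used. Below is a minimal sketch of what a run like this could look like, assuming GRPO via Hugging Face's trl with a simple exact-match reward on GSM8K final answers; the model checkpoint, hyperparameters, and reward shaping are all illustrative assumptions, not the authors' recipe.)

```python
# Hypothetical sketch: RL fine-tuning a small Qwen model on GSM8K with GRPO (trl).
# Everything below is an assumption about the setup, not the thread's actual config.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def extract_answer(text: str) -> str | None:
    # GSM8K gold answers end with "#### <number>"; grab the last number in a string.
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

# GRPOTrainer expects a "prompt" column; keep the gold final answer for the reward.
ds = load_dataset("openai/gsm8k", "main", split="train")
ds = ds.map(lambda x: {"prompt": x["question"], "gold": extract_answer(x["answer"])})

def correctness_reward(completions, gold, **kwargs):
    # 1.0 if the completion's final number matches the gold answer, else 0.0.
    return [float(extract_answer(c) == g) for c, g in zip(completions, gold)]

args = GRPOConfig(output_dir="qwen-gsm8k-grpo", num_generations=8, max_completion_length=512)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",          # smallest model in the thread's sweep
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=ds,
)
trainer.train()
```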

The scaling law? Not what you'd expect.

[Image]

The plot shows something crazy:

• 0.6B: +61.6% accuracy from 0.003% more compute
• 1.7B: +25.6% gain (still massive)
• 14B: +6.3% gain (still good, but...)

RL compute is a ROUNDING ERROR compared to pretraining, yet unlocks step-function improvements in small models.
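(A back-of-the-envelope check on the "rounding error" claim, assuming the standard ~6 × params × tokens FLOPs estimate for both phases; the 36T pretraining tokens and ~1B RL tokens below are placeholder assumptions, not numbers from the thread.)

```python
# Rough FLOPs comparison between pretraining and RL fine-tuning.
# Assumption: cost ~= 6 * params * tokens for both phases (standard
# transformer training approximation). Token counts are placeholders.

def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

params = 0.6e9             # Qwen 0.6B
pretrain_tokens = 36e12    # placeholder pretraining token budget
rl_tokens = 1e9            # placeholder tokens processed during RL rollouts/updates

ratio = training_flops(params, rl_tokens) / training_flops(params, pretrain_tokens)
print(f"RL compute is ~{ratio:.4%} of pretraining compute")
# With these placeholders, RL lands around 0.003% of pretraining FLOPs,
# the same order of magnitude as the thread's 0.6B data point.
```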