Joan Cabezas
wip ai researcher | @ycombinator alum | built @omidotme, sometimes do vc.
Oct 10 · 8 tweets · 3 min read
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL?

We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild:

Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns.
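(The thread doesn't say which RL algorithm or framework was used. Below is a minimal sketch of what a run like this could look like, assuming GRPO via Hugging Face's trl with a simple exact-match reward on GSM8K final answers; the model checkpoint, hyperparameters, and reward shaping are all illustrative assumptions, not the authors' recipe.)

```python
# Hypothetical sketch: RL fine-tuning a small Qwen model on GSM8K with GRPO (trl).
# Everything below is an assumption about the setup, not the thread's actual config.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def extract_answer(text: str) -> str | None:
    # GSM8K gold answers end with "#### <number>"; grab the last number in a string.
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

# GRPOTrainer expects a "prompt" column; keep the gold final answer for the reward.
ds = load_dataset("openai/gsm8k", "main", split="train")
ds = ds.map(lambda x: {"prompt": x["question"], "gold": extract_answer(x["answer"])})

def correctness_reward(completions, gold, **kwargs):
    # 1.0 if the completion's final number matches the gold answer, else 0.0.
    return [float(extract_answer(c) == g) for c, g in zip(completions, gold)]

args = GRPOConfig(output_dir="qwen-gsm8k-grpo", num_generations=8, max_completion_length=512)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",          # smallest model in the thread's sweep
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=ds,
)
trainer.train()
```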

The scaling law? Not what you'd expect.

[Image]

The plot shows something crazy:

• 0.6B: +61.6% accuracy from 0.003% more compute
• 1.7B: +25.6% gain (still massive)
• 14B: +6.3% gain (still good, but...)

RL compute is a ROUNDING ERROR compared to pretraining, yet unlocks step-function improvements in small models.
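(A back-of-the-envelope check on the "rounding error" claim, assuming the standard ~6 × params × tokens FLOPs estimate for both phases; the 36T pretraining tokens and ~1B RL tokens below are placeholder assumptions, not numbers from the thread.)

```python
# Rough FLOPs comparison between pretraining and RL fine-tuning.
# Assumption: cost ~= 6 * params * tokens for both phases (standard
# transformer training approximation). Token counts are placeholders.

def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

params = 0.6e9             # Qwen 0.6B
pretrain_tokens = 36e12    # placeholder pretraining token budget
rl_tokens = 1e9            # placeholder tokens processed during RL rollouts/updates

ratio = training_flops(params, rl_tokens) / training_flops(params, pretrain_tokens)
print(f"RL compute is ~{ratio:.4%} of pretraining compute")
# With these placeholders, RL lands around 0.003% of pretraining FLOPs,
# the same order of magnitude as the thread's 0.6B data point.
```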