Joan Cabezas
@josancamon19
wip ai researcher | @ycombinator alum | built @omidotme, sometimes do vc.
Oct 10 • 8 tweets • 3 min read
🧵 As AI labs race to scale RL, one question matters: when should you stop pre-training and start RL?
We trained 5 Qwen models (0.6B→14B) with RL on GSM8K and found something wild:
Small models see EMERGENCE-LIKE jumps. Large models see diminishing returns.
The scaling law? Not what you'd expect.
The plot shows something crazy:
• 0.6B: +61.6% accuracy from 0.003% more compute
• 1.7B: +25.6% gain (still massive)
• 14B: +6.3% gain (still good, but...)
RL compute is a ROUNDING ERROR compared to pretraining, yet unlocks step-function improvements in small models.
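The three reported gains trace a clear diminishing-returns curve in model size. As a minimal sketch (an illustrative log-linear least-squares fit of gain against model size, not the scaling law the authors actually fit):

```python
import math

# Reported RL gains on GSM8K by model size in billions of params (from the thread)
gains = {0.6: 61.6, 1.7: 25.6, 14.0: 6.3}

# Fit gain ≈ a + b * ln(size_B) by ordinary least squares —
# a toy diminishing-returns trend, not the authors' methodology.
xs = [math.log(size) for size in gains]
ys = list(gains.values())
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

print(f"gain ≈ {a:.1f} {b:+.1f} * ln(size_B)")
```

A negative slope here just restates the headline: the bigger the model, the less RL adds on top of pretraining.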