Ph.D. student @WisconsinCS. Working on foundation models and breaking past scaling laws. Previously CMU @mldcmu, UCSD @ucsd_cse, FCC @fresnocity. 🤔🤨🧐 e/hmm
Apr 6 • 11 tweets • 4 min read
That new LFM2.5-350M is super overtrained, right? And everyone was shocked by how far they pushed it?
As it turns out, we have a brand new scaling law for that! 🧵
[1/n]
Introducing Train-to-Test (T²) scaling! We found that test-time scaling via repeated sampling means that radical overtraining like this - training a smaller model for way longer - is actually compute optimal!
We find that knowledge and reasoning exhibit different scaling behaviors!
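Here's a rough sketch of the accounting behind that claim (my illustration, not the paper's code): it uses the standard approximations of ~6·N·D FLOPs for training and ~2·N FLOPs per generated token at inference, and the model sizes, sample counts, and query counts are made-up assumptions, not numbers from the paper.

```python
# A minimal sketch (not the paper's code) of train-vs-test compute accounting.
# Assumptions: ~6*N*D FLOPs for training, ~2*N FLOPs per generated token at
# inference; all sizes and counts below are illustrative.

def train_flops(n_params: float, n_tokens: float) -> float:
    # ~6 FLOPs per parameter per training token
    return 6.0 * n_params * n_tokens

def test_flops(n_params: float, gen_tokens: float, k_samples: int) -> float:
    # Repeated sampling: k forward passes, ~2 FLOPs per parameter per token
    return 2.0 * n_params * gen_tokens * k_samples

def total_flops(n_params, n_tokens, gen_tokens, k_samples, n_queries):
    # Train once, then answer n_queries prompts with best-of-k sampling
    return train_flops(n_params, n_tokens) + n_queries * test_flops(
        n_params, gen_tokens, k_samples
    )

# Same training budget two ways: a 1B model at ~20 tokens/param vs a
# 350M model trained on ~3x more tokens ("overtrained").
budget = train_flops(1e9, 20e9)
configs = {
    "1B, 20 tok/param": dict(n_params=1e9, n_tokens=20e9),
    "350M, overtrained": dict(n_params=350e6, n_tokens=budget / (6.0 * 350e6)),
}

for name, cfg in configs.items():
    c = total_flops(**cfg, gen_tokens=512, k_samples=16, n_queries=1e6)
    print(f"{name}: total FLOPs ~ {c:.3e}")
```

At the same training budget, the 350M model's forward passes are ~3x cheaper, so once you're paying for lots of best-of-k samples at test time, the overtrained small model comes out ahead on total FLOPs.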
Super excited to finally tell you all about our paper on the compute optimal scaling of skills:
arxiv.org/pdf/2503.10061
(First some context) Scaling laws can tell you how to use your compute budget.
In compute optimal scaling, you're given a compute budget and have to decide how to spend it, typically by trading off model size against the amount of training data.
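For intuition, here's a minimal sketch of that classic allocation, assuming the common Chinchilla-style rules of thumb (C ≈ 6·N·D train FLOPs and roughly 20 training tokens per parameter - constants from fitted scaling laws, not from our paper):

```python
import math

# A minimal sketch of the classic compute-optimal allocation, assuming the
# common Chinchilla-style rules of thumb: train FLOPs C ~ 6*N*D and a
# compute-optimal ratio of roughly 20 tokens per parameter.
# These constants are rules of thumb from fitted scaling laws.

def allocate(compute_budget_flops: float, tokens_per_param: float = 20.0):
    """Split a FLOP budget C between model size N and training tokens D.

    With C = 6*N*D and D = r*N:  N = sqrt(C / (6*r)),  D = r*N.
    """
    n_params = math.sqrt(compute_budget_flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params

n, d = allocate(1e21)  # e.g. a 10^21-FLOP budget
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # ~2.9e9 params, ~5.8e10 tokens
```

Under those constants, a 10^21 FLOP budget lands around 2.9B params and 58B tokens. Test-time scaling changes this calculus - which is what T² is about.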