Nicholas Roberts
Ph.D. student @WisconsinCS. Working on foundation models and breaking past scaling laws. Previously CMU @mldcmu, UCSD @ucsd_cse, FCC @fresnocity. 🤔🤨🧠 e/hmm
Apr 6 • 11 tweets • 4 min read
That new LFM2.5-350M is super overtrained, right? And everyone was shocked about how far they pushed it?
As it turns out, we have a brand new scaling law for that! 🧵

[1/n] Introducing Train-to-Test (T²) scaling! We found that test-time scaling via repeated sampling means that radical overtraining like this (training a smaller model for way longer) is actually compute optimal! 🙀



[2/n] arxiv.org/abs/2604.01411
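To make the repeated-sampling intuition concrete, here is a toy sketch; it is not the T² scaling law from the paper. It assumes each sample solves a problem independently with probability `p`, so coverage after `k` samples is 1 - (1 - p)^k, and it uses the common rough approximation of about 2N FLOPs per generated token for an N-parameter model. The helper names and the example budget are hypothetical.

```python
# Toy sketch with hypothetical helpers (not the paper's T^2 law):
# how a fixed inference FLOP budget turns into repeated samples,
# and how coverage grows with the number of samples.


def coverage(p: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct.

    Assumes every sample succeeds independently with probability p.
    """
    return 1.0 - (1.0 - p) ** k


def samples_affordable(budget_flops: float, n_params: float, tokens_per_sample: int) -> int:
    """How many samples an n_params model can draw within budget_flops.

    Uses the rough approximation of 2 * n_params FLOPs per generated token.
    """
    flops_per_sample = 2.0 * n_params * tokens_per_sample
    return int(budget_flops // flops_per_sample)


if __name__ == "__main__":
    budget = 1e15  # hypothetical inference budget in FLOPs
    for name, n_params in (("350M", 350e6), ("7B", 7e9)):
        k = samples_affordable(budget, n_params, tokens_per_sample=1000)
        print(f"{name}: {k} samples")  # 350M: 1428 samples, 7B: 71 samples
```

The smaller model gets about 20x more attempts under the same budget, but a larger model usually has a higher per-sample `p`. The sketch only shows the mechanics; it does not reproduce the thread's result that, once this test-time term is counted, smaller and longer-trained models come out compute optimal.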
Mar 21, 2025 • 12 tweets • 4 min read
📉📉 NEW SCALING LAW PHENOMENON 📉📉

We find that knowledge and reasoning exhibit different scaling behaviors!

Super excited to finally tell you all about our paper on the compute-optimal scaling of skills:


[1/n] arxiv.org/pdf/2503.10061

(First, some context) Scaling laws can tell you how to use your compute budget.

In compute-optimal scaling, you are given a compute budget and need to decide how to spend it, often by balancing model size against the amount of training data.

[2/n]
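One standard way to make this budget split concrete is a Chinchilla-style sketch, offered as an illustration rather than the method of the paper above. Training compute is approximated as C ≈ 6ND FLOPs for N parameters and D training tokens, and loss is modeled as L(N, D) = E + A/N^α + B/D^β. Minimizing L subject to 6ND = C has a closed form. The constants below are placeholders, not fitted values, and the function names are hypothetical.

```python
# Chinchilla-style sketch with placeholder constants (not fitted values):
# split a training compute budget C ~= 6*N*D between model size N and tokens D.


def loss(n: float, d: float, E: float, A: float, B: float, alpha: float, beta: float) -> float:
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n**alpha + B / d**beta


def compute_optimal(C: float, A: float, B: float, alpha: float, beta: float) -> tuple[float, float]:
    """Closed-form (N, D) minimizing A/N^alpha + B/D^beta subject to 6*N*D = C.

    Setting the derivative along the constraint to zero gives
    N_opt = G * (C/6)**(beta/(alpha+beta)), with G = (alpha*A/(beta*B))**(1/(alpha+beta)).
    """
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = G * (C / 6.0) ** (beta / (alpha + beta))
    d_opt = (C / 6.0) / n_opt  # the rest of the budget goes to training tokens
    return n_opt, d_opt


if __name__ == "__main__":
    # Placeholder constants for illustration only.
    E, A, B, alpha, beta = 1.7, 400.0, 400.0, 0.34, 0.28
    for C in (1e21, 1e23):
        n, d = compute_optimal(C, A, B, alpha, beta)
        print(f"C={C:.0e}: N~{n:.2e} params, D~{d:.2e} tokens, "
              f"tokens/param~{d / n:.1f}, loss~{loss(n, d, E, A, B, alpha, beta):.3f}")
```

Under this model both optimal quantities grow as power laws in the budget, N ∝ C^{β/(α+β)} and D ∝ C^{α/(α+β)}, so the exponents alone decide whether extra compute should go mostly to parameters or mostly to data.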