Latest Twitter Threads by @industriaalist on Thread Reader App

Jun 4 • 8 tweets • 4 min read

1/ Now that we're running out of data, how do you optimally scale multi-epoch pretraining to hundreds of epochs?

Our first paper from Q! q0 trains a population of models instead of a single model that saturates fast, reaching a dramatically lower loss at *every* epoch budget.

w/ @bishmdl76 @akshayvegesna @ShmuelBerman

2/ Paper: arxiv.org/abs/2606.03938

q0 is built on one intuition, motivated by Solomonoff induction: instead of training one perfect model, train a population of diverse models and aggregate predictions. Everything in the algorithm follows from this one goal of efficiently training a population. It comes down to three core primitives:
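
A rough illustration of the "train a population and aggregate predictions" idea, not the q0 algorithm itself: the sketch below uniformly averages next-token distributions from a few models and compares the mixture's loss with the average loss of the individual members. The probability values, the uniform weighting, and the helper names `average_predictions` and `nll` are all assumptions made up for this example; the thread does not say how q0 aggregates.

```python
import math

def average_predictions(population_probs):
    """Uniformly average the predictive distributions of several models.

    Uniform weighting is an assumption for illustration only.
    """
    n_models = len(population_probs)
    vocab_size = len(population_probs[0])
    return [
        sum(probs[i] for probs in population_probs) / n_models
        for i in range(vocab_size)
    ]

def nll(probs, target):
    """Negative log-likelihood of the target token under one distribution."""
    return -math.log(probs[target])

# Hypothetical next-token distributions from three models (3-token vocabulary).
population = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.1, 0.3],
]
target = 0  # index of the token that actually came next

mixture = average_predictions(population)
mixture_loss = nll(mixture, target)
mean_member_loss = sum(nll(p, target) for p in population) / len(population)
```

Because the negative log is convex, the loss of the averaged prediction can never exceed the average of the members' losses (Jensen's inequality), which is one reason a diverse population can help even when no single member is very good.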

Mar 9 • 5 tweets • 4 min read

1/ We released NanoGPT Slowrun 10 days ago. Already at 8x data efficiency and improving fast, so we're doubling down.

Announcing Slowrun Research and Slowrun Cluster: our open research effort to collaborate with researchers with crazy ideas, and a serious cluster to back it.

2/ Why? Compute scales. Data doesn't. Current scaling laws require both to grow proportionally, and that's a big problem. We need fundamentally new learning algorithms for the setting of limited data and practically infinite compute.

Slowrun is already surfacing new data-efficient methods, but we want to aim for at least 100x data efficiency this year and that will take a lot more exploration.
