🚨 I've a paper out today: Scaling Scaling Laws with Board Games! 🚨

arxiv.org/abs/2104.03113

Principal result: by studying a sequence of small ML problems, I could predict the outcome of experiments on orders-of-magnitude larger ones 🤯
I worked on Hex. Hex is a board game with all the strategic depth of Go, but a much simpler rule set. Crucially, Hex on small boards is easy, and Hex on big boards is hard!
I wrote a fast, all-GPU version of AlphaZero and used it to train ~200 different neural nets across a range of board sizes. Plotted together, the best-performing nets at each level of compute form a steady trend: the *compute frontier*.
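Concretely, the frontier is just the running best performance over compute. A minimal sketch, with placeholder numbers and column names of my own rather than the paper's data:

```python
import pandas as pd

# Placeholder results: one row per trained net, with the compute it used
# and the Elo it reached on a single board size. Not the paper's numbers.
runs = pd.DataFrame({
    "compute": [1e12, 3e12, 1e13, 3e13, 1e14],
    "elo":     [-400, -150,   50,  180,  260],
})

# Compute frontier: the best Elo achieved at or below each compute budget.
frontier = runs.sort_values("compute")
frontier["best_elo"] = frontier["elo"].cummax()
print(frontier)
```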
All of these frontiers can be explained by a simple family of curves (roughly sketched below), which say:

* If you've 2x the compute of your opponent, you've a 2/3 chance of winning

* Adding +1 to the board size makes perfect play 7x harder

And this holds across ~6 orders of magnitude!
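One way to read the 2x-compute claim (my parameterization, not necessarily the paper's exact fitted curve): if doubling your compute always doubles your odds of winning, the win probability is just the ratio of compute budgets.

```python
def win_prob(my_compute: float, opp_compute: float) -> float:
    """Win probability if odds scale linearly with the compute ratio:
    2x compute -> 2/3, 4x -> 4/5, and so on. One curve consistent with
    the claim above, not the paper's exact fitted form."""
    return my_compute / (my_compute + opp_compute)

print(win_prob(2.0, 1.0))  # ~0.667: the 2x -> 2/3 rule of thumb
print(win_prob(4.0, 1.0))  # ~0.800: what the same curve implies for a 4x advantage
```

Under that reading, relative strength depends only on the ratio of compute budgets, which is one way the same curve can hold across six orders of magnitude.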
This similarity across scales means that curves fit on small, cheap board sizes are excellent predictors of the compute frontiers at bigger board sizes.

In fact, the prediction error appears to decay exponentially as you add more small board sizes to the fit.
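Here's a toy version of that extrapolation, with made-up numbers of mine standing in for the real frontiers: fit a trend on the small boards alone, then check it against the big ones.

```python
import numpy as np

# Made-up stand-ins for log10(compute needed to reach a fixed strength) at each
# board size; the ~0.85/size slope just mimics the '7x harder per size' claim.
board_sizes = np.array([3, 4, 5, 6, 7, 8, 9])
log_compute = np.array([10.1, 10.9, 11.8, 12.6, 13.5, 14.3, 15.2])

small = board_sizes <= 6                    # fit only on the cheap, small boards
slope, intercept = np.polyfit(board_sizes[small], log_compute[small], 1)

pred = slope * board_sizes + intercept      # extrapolate to the big boards
print(np.abs(pred - log_compute)[~small])   # held-out error at sizes 7, 8, 9
```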
So what's the takeaway from all this? There's no generality here yet, but it's a proof of concept, a proof that you can study the small and make claims about the large.
If you're a resource-bound researcher and you want to study big models, I think the scaling laws paradigm is something to dive into.
Bonus plot: while doing all the above, I was prodded into looking at how train-time and test-time compute trade off. Quite strikingly, at every level of performance you can knock 15x off test-time compute by adding 10x train-time compute, and the agent comes out just as strong!
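Back-of-the-envelope on that trade-off (my arithmetic from the two numbers above, not a figure lifted from the paper): on log-log axes of test compute against train compute at fixed performance, swapping 10x train for 15x test is a slope of about -1.18.

```python
import math

# At fixed playing strength: +10x train-time compute buys a ~15x cut in test-time compute.
slope = -math.log10(15) / math.log10(10)
print(round(slope, 2))  # -1.18: each extra decade of training saves a bit more than a decade of search
```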
And a final acknowledgement: I'm officially an independent researcher, but the level of support I got from the ML and games community was incredible.
Only a subset of the folk I'd like to thank hang out on Twitter, but @paulfchristiano @janleike @sharky6000 @JacobHHilton @ClemensWinter all went out of their way to help me out with this, and it's dramatically better for their help.
The other group I need to thank are the folks on the RL Discord. I posted a draft Monday evening and got a dozen researchers offering advice on how to improve it. I only managed to incorporate about half of that advice in time, but that half made a huge difference.

discord.gg/xhfNqQv
