Going outside CMTC's zones of comfort/expertise, we bring up the issue of generic scale invariance (i.e., power-law scaling) in AI Large Language Models (LLMs). This is an approximate empirical finding based primarily on two landmark papers from OpenAI 2020 and Google AI 2022...
These scaling laws are foundational in all aspects of AI development, and basically assert (simplistically) that the more data you train on and/or the more parameters in the model, the better the performance, with the claim that in some precise technical sense the improvement scales as a power law!
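To make that concrete, the kind of relation these papers report can be written schematically (our notation, not a quote from either paper) as

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},

where L is the test loss, N the number of model parameters, D the amount of training data, and N_c, D_c, \alpha_N, \alpha_D are fitted constants, with the exponents empirically quite small (of order 0.05-0.1, as discussed below).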
Note that this is not obvious at all. It could have been that the performance stalls at some point with increasing data or parameters, or it could even have become worse with overfitting and overtraining, but apparently that does not happen, leading to the current AI explosion
This power law seems to arise only when the training data as well as the number of parameters are huge; it is almost as if there is a 'dynamical phase transition' where the system suddenly becomes generically scale invariant, manifesting power laws
Now, if theoretical physicists love one thing more than anything else, it is a power law, because of its scale invariance-- it indicates criticality, where all scales contribute, which is unusual. It happens at phase transitions, one of the most studied subjects in all of science
The scale invariance is also at the heart of one of the most used field-theoretic techniques, the 'Renormalization Group'. Basically, physicists have a lovefest with power laws, and it is therefore no surprise that many theoretical physicists are fascinated by LLM power laws
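To see why a power law means scale invariance (a one-line reminder, not from the thread itself): if f(x) = A x^{-\alpha}, then rescaling the variable gives

f(\lambda x) = \lambda^{-\alpha} f(x),

so changing the scale of x merely multiplies f by a constant and no characteristic scale is singled out; by contrast, an exponential like e^{-x/\xi} picks out the special scale \xi.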
In fact, the first author of the 2020 OpenAI paper 'Scaling Laws for Neural Language Models' used to be a card-carrying theoretical physicist before joining AI. We know of several theoretical physicists, including two from CMTC, who have become AI researchers inspired by LLM scaling
Understanding LLM scaling is of great importance, not just because of the intellectual appeal of generic scale invariance, but because this is in some sense the key to the success of modern AI. Is there really an underlying dynamical phase transition leading to scale invariance?
Of course, the elephant in the room, the key question is: IS THERE ACTUAL SCALING IN LLMs? There is no question that with enhanced training the performance improves, but this does not necessarily imply true scaling, i.e., a power law that goes on forever. The scaling may be just effective
For example, it is possible that with further training the performance deteriorates at some point-- imagine finding that out after you have spent $100 billion producing a gigantic model with a humongous amount of training data! How compelling is the evidence for scaling?
The problem is highly complex, with many different aspects and many different scaling dependences, and the experiments to observe LLM scaling are hugely expensive. Establishing true scaling at the lambda point of He4 took many decades of work, culminating in gravity-free measurements in space
CMTC looked at the LLM scaling data with very limited expertise, and concluded that the evidence for true scale invariance is weak, but effective scaling most certainly applies over limited ranges of the applicable scaling variables
But the scaling exponent is rather small, ~0.05-0.1, meaning that a 100-fold increase in the training data would enhance performance by only roughly 25%-60%. Also, the exponent seems to be decreasing with increasing size, and there are large error bars.
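A quick back-of-the-envelope check of that arithmetic (a minimal sketch, assuming the test loss improves as a pure power law in the amount of training data D):

# Minimal sanity check, assuming test loss L ~ D**(-alpha) in training-data size D
# (a pure power law, which the thread argues is at best only 'effective').
for alpha in (0.05, 0.10):
    gain = 100 ** alpha  # factor by which the loss improves for a 100x increase in D
    print(f"alpha = {alpha:.2f}: 100x more data -> loss better by a factor of "
          f"{gain:.2f} (~{(gain - 1) * 100:.0f}%)")
# Prints roughly 1.26 (~26%) for alpha = 0.05 and 1.58 (~58%) for alpha = 0.10.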
It is therefore quite important to build simple physics-type ("spherical cow") models to study LLM scaling strictly from a physics perspective. If there is generic scale invariance in LLMs, then the simplicity of the model is irrelevant, since the details would not matter!
By contrast, it is crucial to understand the corrections to scaling up to many orders, and also the precise finite-size corrections. All the AI money being invested in building more powerful LLM models is basically trying to exploit these 'higher-order corrections' to make money!
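As an illustration of what 'corrections to scaling' means here (a schematic form in standard critical-phenomena notation, not taken from the LLM literature):

L(N) \simeq L_\infty + A\, N^{-\alpha} \left(1 + b_1 N^{-\omega_1} + b_2 N^{-\omega_2} + \dots \right),

where the leading power law dominates only asymptotically, the subleading terms (together with any finite-size effects) control the behavior at the finite, accessible values of N, and L_\infty is an irreducible loss floor.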
Since CMTC expertise on the subject is limited, as our leading AI experts have left CMTC to join the AI industry (where the large data and the huge computing power to crunch the data are available; serious LLM work is not feasible in universities), we quote an AI expert:
"The level of accuracy on establishing scaling exponents is not very high, and I would say it really is effective scaling -- the exponent can certainly change slowly over the regime of interest. A limitation on the empirical accuracy has been the costliness of experiments --
many of the larger LLM experiments are only done a few times. There are also a lot of other algorithmic knobs whose scaling protocol one must prescribe: unlike physics where typically it is only the system size that is scaled, here one has to prescribe how to change training time,
learning rate, batching of data, etc with system scale as well. On the theoretical side, I would say that there are both models which exhibit multiple distinct scaling trends over some regime as well as models which have one dominant power-law scaling asymptotically."
Exciting!!
What does the @satyanadella MSFT quantum computing claim imply? (1) There is a breakthrough single-shot measurement of the nanowire parity, which, coupled with tunneling measurements, strongly hints at the existence of Majorana zero modes (MZMs); (2) unpublished data presented....
at an MSFT Station Q meeting in Santa Barbara in front of 150 invited scientists on Feb 18 provide evidence for coupling and switching between MZMs in two different wires, which could lead to a topological qubit as it strongly suggests a protected 2-level system in the nanowire
If everything pans out the way MSFT sees it, there is a good possibility that this could be the beginning of engineering multi-qubit topological systems, but a lot more work is obviously necessary; building a quantum computer is not supposed to be easy or fast--
Let us contrast Al and Cu-- both well-known, extensively used metals with high electrical (and thermal) conductivity. Al becomes an SC below T_c ~ 1.2K and Cu does not show SC even at the lowest attainable temperatures. Why this big difference?
We start by discussing their resistivity (in micro-ohm.cm) at room temperature: 2 (Cu); 3 (Al). Resistivity increases linearly with T for T>50K, but the Al resistivity increases somewhat faster than that of Cu (Fe is also shown, with much higher resistivity) https://t.co/l19sSiUjVZ
As far as conducting behavior goes, Cu>Al>>Fe, but Cu and Fe are not SCs, while Al is! So, a better conductor does not lead to a better superconductor. In fact, YBCO, with T_c ~ 90K, has a room-temperature resistivity 500 times that of Cu and Al, and is a 'bad' metal (but a great SC)!
LK99 was the 9th top item in the Jul 30-Aug 5 period on Wikipedia. All other items in the top 50 are (as expected) entertainment/sports/celebrity related. This is so remarkable that CMTC is truly speechless; the phenomenon cannot be explained away as.... en.wikipedia.org/wiki/Wikipedia…
just 'irrational exuberance'. Obviously, millions of people who had no idea what an SC is became very excited about the possibility of an ambient room-temperature SC, and since SC is a central research topic for CMTC, we are happy that in the future, it will be easy for us...
to explain what we do (no normal person outside physics has any idea what 'condensed matter theory' is), as we can just proudly say that we work on superconductivity. We have no idea why the subject became so popular; perhaps 'levitation' videos and 'live streaming' helped...
A tutorial on why high-T_c superconductivity is so difficult to achieve.
SC requires a coherent quantum condensation of electrons breaking a subtle symmetry, so that the electrons in the solid spontaneously develop an energy gap in their spectrum. This can only happen if there..
is a strong interaction overcoming the kinetic energy of motion. Since all solids are made only of electrons and the background lattice of ions, the two sources of interaction are the electron-electron and the electron-lattice interaction-- the latter called the electron-phonon interaction, ...
each characterized by an interaction strength, called a coupling constant. Most (if not all) SCs (Al, Pb, Nb, Hg...) are caused by el-ph interaction, but in principle, el-el interaction may lead to SC. To get high T_c, all one needs is to tune the appropriate SC coupling to a..
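For reference, the standard weak-coupling (BCS) estimate (not part of the thread itself, but it shows why everything hinges on that coupling) is

k_B T_c \approx 1.13\, \hbar\omega_D\, e^{-1/\lambda}, \qquad \lambda = N(0)V,

where \omega_D is a typical phonon (Debye) frequency, N(0) the electronic density of states at the Fermi level, and V the effective attractive interaction; the exponential makes T_c extremely sensitive to the dimensionless coupling \lambda.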
More #LK99 good/bad news. The good news is that more results have been reported. The bad news is exactly the same as before. NTU, an excellent university which has sent its students as PhD students and postdocs to CMTC over the years, reports insulating resistivity increasing with decreasing T and...
some diamagnetism (they are continuing their LK99 experiments), and NPL, a top govt lab in India, reports no SC and some diamagnetism in its latest arXiv posting. It is now increasingly difficult to give the benefit of the doubt to the OP by Lee et al.
An important preprint tonight, from ICQM, a top research center in China (which has several CMTC alumni on its faculty), finds no SC, but a small amount of ferromagnetism (not diamagnetism) in tiny flakes of LK99 samples. No SC at all in all 3 reports: arxiv.org/abs/2308.03110
Since CMTC has produced many papers on flatbands and on superconductivity (rarely together), we are delighted to see the flatbands of LK99 being discussed everywhere; this can only be good. In case you are still confused, flatbands in the simplest terms imply a 'very heavy electron mass'
Why is heavy mass important? Because large mass means slow motion, and slow motion means low kinetic energy, and this means that the effects of interactions among electrons become very important, making the system 'strongly correlated', something we theorists love because the...
problem becomes extremely complicated, as we must solve for 10^23 electrons strongly interacting together, and theoretical physicists love difficult problems-- band-structure calculations producing flatbands immediately become suspect, since band theory neglects interactions
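To make the 'heavy mass' point concrete (our schematic version, not from the thread itself): the band effective mass is set by the band curvature,

\frac{1}{m^*} = \frac{1}{\hbar^2}\, \frac{d^2 E(k)}{dk^2},

so a flat band (vanishing curvature) means m^* \to \infty. The ratio of the Coulomb interaction to the kinetic energy at a typical inter-electron spacing a then scales as

\frac{E_{\rm int}}{E_{\rm kin}} \sim \frac{e^2/\epsilon a}{\hbar^2/m^* a^2} \propto m^*,

which is why a flat band automatically pushes the electrons into the strongly correlated regime where band theory cannot be trusted.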