Going outside CMTC's zones of comfort/expertise, we bring up the issue of generic scale invariance (i.e., power-law scaling) in AI Large Language Models (LLMs). This is an approximate empirical finding based primarily on two landmark papers from OpenAI 2020 and Google AI 2022...
These scaling laws are foundational in all aspects of AI development, and basically assert (simplistically) that the more data you train on and/or the more parameters in the model, the better the performance, with the claim that in some precise technical sense the improvement scales as a power law!
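To make that concrete, the kind of relation these papers report can be written schematically (our notation, not a quote from either paper) as

L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D},

where L is the test loss, N the number of model parameters, D the amount of training data, and N_c, D_c, \alpha_N, \alpha_D are fitted constants, with the exponents empirically quite small (of order 0.05-0.1, as discussed below).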
Note that this is not obvious at all. It could have been that the performance stalls at some point with increasing data or parameters, or it could even have become worse with overfitting and overtraining, but apparently that does not happen, leading to the current AI explosion
This power law seems to arise only when the training data as well as the number of parameters are huge; it is almost as if there is a 'dynamical phase transition' where the system suddenly becomes generically scale invariant, manifesting power laws
Now, if theoretical physicists love one thing more than anything else, it is a power law, because of its scale invariance-- it indicates criticality, where all scales contribute, which is unusual. It happens at phase transitions, one of the most studied subjects in all of science
The scale invariance is also at the heart of one of the most used field-theoretic techniques, the 'Renormalization Group'. Basically, physicists have a lovefest with power laws, and it is therefore no surprise that many theoretical physicists are fascinated by LLM power laws
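To see why a power law means scale invariance (a one-line reminder, not from the thread itself): if f(x) = A x^{-\alpha}, then rescaling the variable gives

f(\lambda x) = \lambda^{-\alpha} f(x),

so changing the scale of x merely multiplies f by a constant and no characteristic scale is singled out; by contrast, an exponential like e^{-x/\xi} picks out the special scale \xi.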
In fact, the first author of the 2020 OpenAI paper 'Scaling Laws for Neural Language Models' used to be a card-carrying theoretical physicist before joining AI. We know of several theoretical physicists, including two from CMTC, who have become AI researchers inspired by LLM scaling
Understanding LLM scaling is of great importance, not just because of the intellectual appeal of generic scale invariance, but because this is in some sense the key to the success of modern AI. Is there really an underlying dynamical phase transition leading to scale invariance?
Of course, the elephant in the room, the key question is: IS THERE ACTUAL SCALING IN LLMs? There is no question that with enhanced training the performance improves, but this does not necessarily imply true scaling, i.e., a power law that goes on forever. The scaling may be just effective
For example, it is possible that with further training the performance deteriorates at some point-- imagine finding that out after you have spent $100 billion producing a gigantic model with a humongous amount of training data! How compelling is the evidence for scaling?
The problem is highly complex, with many different aspects and many different scaling dependences, and the experiments to observe LLM scaling are hugely expensive. Establishing true scaling at the lambda point of He4 took many decades of work, culminating in gravity-free measurements in space
CMTC looked at the LLM scaling data with very limited expertise, and concluded that the evidence for true scale invariance is weak, but effective scaling most certainly applies over limited ranges of the applicable scaling variables
But the scaling exponent is rather small, ~0.05-0.1, meaning that a 100-fold increase in the training data would enhance performance by only roughly 25%-60%. Also, the exponent seems to be decreasing with increasing size, and there are large error bars.
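A quick back-of-the-envelope check of that arithmetic (a minimal sketch, assuming the test loss improves as a pure power law in the amount of training data D):

# Minimal sanity check, assuming test loss L ~ D**(-alpha) in training-data size D
# (a pure power law, which the thread argues is at best only 'effective').
for alpha in (0.05, 0.10):
    gain = 100 ** alpha  # factor by which the loss improves for a 100x increase in D
    print(f"alpha = {alpha:.2f}: 100x more data -> loss better by a factor of "
          f"{gain:.2f} (~{(gain - 1) * 100:.0f}%)")
# Prints roughly 1.26 (~26%) for alpha = 0.05 and 1.58 (~58%) for alpha = 0.10.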
It is therefore quite important to build simple physics-type ("spherical cow") models to study LLM scaling strictly from a physics perspective. If there is generic scale invariance in LLMs, then the simplicity of the model is irrelevant, since the details would not matter!
By contrast, it is crucial to understand the corrections to scaling up to many orders, and also the precise finite-size corrections. All the AI money being invested in building more powerful LLM models is basically trying to exploit these 'higher-order corrections' to make money!
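As an illustration of what 'corrections to scaling' means here (a schematic form in standard critical-phenomena notation, not taken from the LLM literature):

L(N) \simeq L_\infty + A\, N^{-\alpha} \left(1 + b_1 N^{-\omega_1} + b_2 N^{-\omega_2} + \dots \right),

where the leading power law dominates only asymptotically, the subleading terms (together with any finite-size effects) control the behavior at the finite, accessible values of N, and L_\infty is an irreducible loss floor.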
Since CMTC expertise on the subject is limited, as our leading AI experts have left CMTC to join the AI industry (where the large data and the huge computing power to crunch the data are available; serious LLM work is not feasible in universities), we quote an AI expert:
"The level of accuracy on establishing scaling exponents is not very high, and I would say it really is effective scaling -- the exponent can certainly change slowly over the regime of interest. A limitation on the empirical accuracy has been the costliness of experiments --
many of the larger LLM experiments are only done a few times. There are also a lot of other algorithmic knobs whose scaling protocol one must prescribe: unlike physics where typically it is only the system size that is scaled, here one has to prescribe how to change training time,
learning rate, batching of data, etc with system scale as well. On the theoretical side, I would say that there are both models which exhibit multiple distinct scaling trends over some regime as well as models which have one dominant power-law scaling asymptotically."
Exciting!!
What does the @satyanadella MSFT quantum computing claim imply? (1) There is a breakthrough single-shot measurement of the nanowire parity, which, coupled with tunneling measurements, strongly hints at the existence of Majorana zero modes (MZMs); (2) unpublished data presented....
at an MSFT Station Q meeting in Santa Barbara in front of 150 invited scientists on Feb 18 provide evidence for coupling and switching between MZMs in two different wires, which could lead to a topological qubit as it strongly suggests a protected 2-level system in the nanowire
If everything pans out the way MSFT sees it, there is a good possibility that this could be the beginning of engineering multi-qubit topological systems, but a lot more work is obviously necessary; building a quantum computer is not supposed to be easy or fast--
Let us contrast Al and Cu-- both well-known, extensively used metals with high electrical (and thermal) conductivity. Al becomes an SC below T_c ~ 1.2K and Cu does not show SC even at the lowest attainable temperatures. Why this big difference?
We start by discussing their resistivity (in micro-ohm.cm) at room temperature: 2 (Cu); 3 (Al). Resistivity increases linearly with T for T>50K, but the Al resistivity increases somewhat faster than that of Cu (Fe is also shown, with much higher resistivity) https://t.co/l19sSiUjVZ
As far as conducting behavior goes, Cu>Al>>Fe, but Cu and Fe are not SCs, while Al is! So, a better conductor does not lead to a better superconductor. In fact, YBCO, with T_c ~ 90K, has a room-temperature resistivity 500 times that of Cu and Al, and is a 'bad' metal (but a great SC)!
LK99 was the 9th top item in the Jul 30-Aug 5 period on Wikipedia. All other items in the top 50 are (as expected) entertainment/sports/celebrity related. This is so remarkable that CMTC is truly speechless; the phenomenon cannot be explained away as.... en.wikipedia.org/wiki/Wikipedia…
just 'irrational exuberance'. Obviously, millions of people who had no idea what an SC is became very excited about the possibility of an ambient room-temperature SC, and since SC is a central research topic for CMTC, we are happy that in the future, it will be easy for us...
to explain what we do (no normal person outside physics has any idea what 'condensed matter theory' is), as we can just proudly say that we work on superconductivity. We have no idea why the subject became so popular; perhaps 'levitation' videos and 'live streaming' helped...
A tutorial on why high-T_c superconductivity is so difficult to achieve.
SC requires a coherent quantum condensation of electrons breaking a subtle symmetry, so that the electrons in the solid spontaneously develop an energy gap in their spectrum. This can only happen if there..
is a strong interaction overcoming the kinetic energy of motion. Since all solids are made only of electrons and the background lattice of ions, the two sources of interaction are the electron-electron and the electron-lattice interaction-- the latter called the electron-phonon interaction, ...
each characterized by an interaction strength, called a coupling constant. Most (if not all) SCs (Al, Pb, Nb, Hg...) are caused by el-ph interaction, but in principle, el-el interaction may lead to SC. To get high T_c, all one needs is to tune the appropriate SC coupling to a..
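For reference, the standard weak-coupling (BCS) estimate (not part of the thread itself, but it shows why everything hinges on that coupling) is

k_B T_c \approx 1.13\, \hbar\omega_D\, e^{-1/\lambda}, \qquad \lambda = N(0)V,

where \omega_D is a typical phonon (Debye) frequency, N(0) the electronic density of states at the Fermi level, and V the effective attractive interaction; the exponential makes T_c extremely sensitive to the dimensionless coupling \lambda.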
More #LK99 good/bad news. The good news is that more results have been reported. The bad news is exactly the same as before. NTU, an excellent university which has sent its students as PhD students and postdocs to CMTC over the years, reports insulating resistivity increasing with decreasing T and...
some diamagnetism (they are continuing their LK99 experiments), and NPL, a top govt lab in India, reports no SC and some diamagnetism in its latest arXiv posting. It is now increasingly difficult to give the benefit of the doubt to the OP by Lee et al.
An important preprint tonight, from ICQM, a top research center in China (which has several CMTC alumni on its faculty), finds no SC, but a small amount of ferromagnetism (not diamagnetism) in tiny flakes of LK99 samples. No SC at all in all 3 reports: arxiv.org/abs/2308.03110
Since CMTC has produced many papers on flatbands and on superconductivity (rarely together), we are delighted to see the flatbands of LK99 being discussed everywhere; this can only be good. In case you are still confused, flatbands in the simplest terms imply a 'very heavy electron mass'
Why is heavy mass important? Because large mass means slow motion, and slow motion means low kinetic energy, and this means that the effects of interactions among electrons become very important, making the system 'strongly correlated', something we theorists love because the...
problem becomes extremely complicated, as we must solve for 10^23 electrons strongly interacting together, and theoretical physicists love difficult problems-- band-structure calculations producing flatbands immediately become suspect, since band theory neglects interactions
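To make the 'heavy mass' point concrete (our schematic version, not from the thread itself): the band effective mass is set by the band curvature,

\frac{1}{m^*} = \frac{1}{\hbar^2}\, \frac{d^2 E(k)}{dk^2},

so a flat band (vanishing curvature) means m^* \to \infty. The ratio of the Coulomb interaction to the kinetic energy at a typical inter-electron spacing a then scales as

\frac{E_{\rm int}}{E_{\rm kin}} \sim \frac{e^2/\epsilon a}{\hbar^2/m^* a^2} \propto m^*,

which is why a flat band automatically pushes the electrons into the strongly correlated regime where band theory cannot be trusted.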