#FridayPhysicsFun – This year’s Nobel Prize in Physics went to Manabe, Hasselmann (climate) and Parisi. Media will focus on the climate stuff, because it is easy to explain. But what did Parisi do? And why does it matter for machine learning? nobelprize.org/prizes/physics…
A lot of this is way beyond me: I am not good enough at statistical mechanics to explain or use this directly. But I can see the shadows cast on the landscape by these results, and they are awesome. nobelprize.org/uploads/2021/1…
The start is spin glasses: solids where some atoms have spins that interact with other spins, but in a disordered way (rather than the neat, all-parallel spins of ferromagnetic materials). The spins try to minimize their energy, but there will be frustration: not every interaction can be satisfied at the same time.
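To see what frustration means concretely, here is a minimal toy sketch (plain Python; the triangle of antiferromagnetic bonds is chosen purely for illustration): three spins that each want to be opposite to their neighbours can never all get their way, so even the best state leaves one bond unhappy, and several states tie for the lowest energy.

```python
from itertools import product

# Three Ising spins on a triangle with antiferromagnetic couplings (J = -1):
# every bond "wants" its two spins to be opposite, but around a loop of three
# spins at least one bond must be violated -- that is frustration.
J = -1.0
bonds = [(0, 1), (1, 2), (2, 0)]

def energy(spins):
    # E = -sum over bonds of J * s_i * s_j
    return -sum(J * spins[i] * spins[j] for i, j in bonds)

states = list(product([-1, +1], repeat=3))
energies = {s: energy(s) for s in states}
e_min = min(energies.values())
ground_states = [s for s, e in energies.items() if abs(e - e_min) < 1e-12]

print("lowest energy:", e_min)                         # -1.0, not -3.0: one bond always loses
print("number of degenerate ground states:", len(ground_states))   # 6
```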
It would be nice to be able to calculate the average properties of such materials that one can measure macroscopically (the observables), such as overall magnetization or heat capacity.
In standard statistical mechanics this is pretty easy: assume all states are possible, each with a probability declining exponentially with its energy; calculate the so-called partition function Z; and from Z you can calculate the observables. Ta-da! en.wikipedia.org/wiki/Boltzmann…
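A minimal sketch of that recipe on a toy system (exact enumeration in plain Python; the short ferromagnetic Ising chain and the parameter values are just illustrative assumptions):

```python
import math
from itertools import product

# Toy example: ferromagnetic 1D Ising chain of N spins with nearest-neighbour
# coupling J at inverse temperature beta. Small enough to enumerate exactly.
N, J, beta = 8, 1.0, 0.5

def energy(spins):
    return -J * sum(spins[i] * spins[i + 1] for i in range(N - 1))

states = list(product([-1, +1], repeat=N))

# Partition function Z = sum over all states of exp(-beta * E)
Z = sum(math.exp(-beta * energy(s)) for s in states)

# Observables are Boltzmann-weighted averages, e.g. mean energy and
# mean absolute magnetization per spin.
def average(f):
    return sum(f(s) * math.exp(-beta * energy(s)) for s in states) / Z

mean_E = average(energy)
mean_m = average(lambda s: abs(sum(s)) / N)
print(f"Z = {Z:.3f}, <E> = {mean_E:.3f}, <|m|> = {mean_m:.3f}")
```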
(OK, “easy” may involve some pretty tough integrals, sometimes dropping awesome Riemann zeta function values into the final expression. This is why the average number of bits of information per photon in blackbody radiation has the value below.) arxiv.org/abs/1511.01162
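If I remember the standard photon-gas result correctly, the value being alluded to is the entropy per photon of blackbody radiation (quoted here from memory, so treat it as a reconstruction):

```latex
% Entropy per photon of blackbody radiation (standard photon-gas result):
\[
  \frac{S}{N} \;=\; \frac{2\pi^4}{45\,\zeta(3)}\, k_B \;\approx\; 3.6\, k_B
  \;\approx\; 5.2 \ \text{bits per photon.}
\]
```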
This runs into trouble with spin glasses since there are many possible energy minima and different samples will have entirely different configurations of minima because of local disorder. They all look the same if you squint, but calculating the partition function becomes hard.
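To get a feel for the trouble, here is a sketch in the spirit of the Sherrington–Kirkpatrick model (the tiny system size and Gaussian random couplings are assumptions made purely for illustration): every fresh draw of the disorder produces its own landscape with its own set of local energy minima.

```python
import random
from itertools import product

# Fully connected Ising model with random Gaussian couplings (a tiny
# Sherrington-Kirkpatrick-style spin glass). Each "sample" draws new
# couplings, i.e. new quenched disorder, and gets a different landscape.
N = 8

def random_couplings(rng):
    return {(i, j): rng.gauss(0, 1) for i in range(N) for j in range(i + 1, N)}

def energy(spins, J):
    return -sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())

def count_local_minima(J):
    # A state is a local minimum if flipping any single spin raises the energy.
    count = 0
    for s in product([-1, +1], repeat=N):
        e = energy(s, J)
        if all(energy(s[:k] + (-s[k],) + s[k + 1:], J) > e for k in range(N)):
            count += 1
    return count

rng = random.Random(0)
for sample in range(3):
    J = random_couplings(rng)
    print(f"disorder sample {sample}: {count_local_minima(J)} local minima")
```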
This matters not just in spin glasses but in machine learning too: we want to calculate observables for the trained system (which minimizes error rather than physical energy) for random training data. It turns out that the mathematical problem is similar. boazbarak.org/Papers/replica…
The replica trick was the first step towards handling this. Basically you estimate the average of (the logarithm of) the partition function by imagining n identical copies ("replicas") of the system, averaging over the disorder to get a formula, and then letting n approach zero.
en.wikipedia.org/wiki/Replica_t…
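The identity behind the trick, in its standard textbook form:

```latex
% The replica identity: we need the disorder average of ln Z, which is hard,
% so we compute the disorder average of Z^n (n coupled copies, "replicas"),
% which is easier, and then continue n -> 0:
\[
  \ln Z \;=\; \lim_{n \to 0} \frac{Z^{n} - 1}{n},
  \qquad\Longrightarrow\qquad
  \overline{\ln Z} \;=\; \lim_{n \to 0} \frac{\overline{Z^{n}} - 1}{n},
\]
% where the overline denotes the average over the random couplings.
```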
Yes, this is pretty nonsensical! We are making an integer number of things approach zero continuously. There are several more things here that make mathematicians cringe (0×0 matrices!). It is a trick, and why it works is still not 100% clear.
Still, it works… sometimes. One way it can fail is “replica symmetry breaking”, where the system is not ergodic: averages over time are no longer the same as averages over the whole ensemble of states, and the different replicas end up behaving differently.
Ergodicity is one of those important assumptions that make statistical mechanics work, yet it fails for many complex systems (notable examples: economics and evolution). nature.com/articles/s4156…
One way of thinking about the problem is that it matters a lot whether you take the average performance of a single run of a simulation or the average performance over many runs of the simulation. When replica symmetry breaks, the replicas behave in different ways.
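A minimal sketch of that distinction (a small mean-field ferromagnet at low temperature stands in for the broken-ergodicity idea; the model and all parameters are illustrative assumptions, not the spin-glass case itself): each simulation run gets stuck near one magnetized state, so the time average within one run looks nothing like the average over many runs.

```python
import math
import random

# Metropolis dynamics for a small mean-field ferromagnet at low temperature.
# Each run falls into one of the two magnetized states (m near +1 or -1) and
# stays there, so the time average of m within a single run is very different
# from the average of m over many independent runs.
N, beta, steps = 20, 2.0, 5000

def energy(spins):
    m_total = sum(spins)
    return -m_total * m_total / (2 * N)      # E = -(sum of spins)^2 / (2N)

def run(rng):
    spins = [rng.choice([-1, 1]) for _ in range(N)]
    m_sum = 0.0
    for _ in range(steps):
        k = rng.randrange(N)
        old_e = energy(spins)
        spins[k] = -spins[k]                 # propose a single spin flip
        d_e = energy(spins) - old_e
        if d_e > 0 and rng.random() >= math.exp(-beta * d_e):
            spins[k] = -spins[k]             # reject: flip back
        m_sum += sum(spins) / N
    return m_sum / steps                     # time average of magnetization

rng = random.Random(1)
time_averages = [run(rng) for _ in range(10)]
print("per-run time averages of m:", [f"{m:+.2f}" for m in time_averages])
print("average over the 10 runs:  ", f"{sum(time_averages) / len(time_averages):+.2f}")
```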
This is where Parisi did important work. He found that one could get around it by assuming that there are many (in fact infinitely many) “order parameters” describing the system, capturing an intricate hierarchy of energy minima.
journals.aps.org/prl/abstract/1…
journals.aps.org/prl/abstract/1…
Indeed, most complex systems have a near-infinite variety of possible “ground states”. Converging to them will also take vastly different times. arxiv.org/abs/cond-mat/0…
This is where I encountered his work when I did my PhD on neural networks. The good old Hopfield network, a simple model of associative memory in the brain, works just like a spin glass system. The number of memories that can be stored is determined by Parisi’s theory.
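A minimal Hopfield sketch (plain Python with the Hebbian learning rule; the sizes are just illustrative, and the often-quoted capacity of roughly 0.14·N patterns is the figure that the spin-glass-style analysis delivers):

```python
import random

# Hopfield associative memory: store random patterns with the Hebbian rule,
# then recall from a corrupted cue by repeatedly updating spins. The network
# is mathematically a spin system with couplings built from the memories;
# the spin-glass analysis gives a storage capacity of roughly 0.14 * N patterns.
N, P = 100, 5
rng = random.Random(42)

patterns = [[rng.choice([-1, 1]) for _ in range(N)] for _ in range(P)]

# Hebbian weights: W[i][j] = (1/N) * sum over patterns of x_i * x_j
W = [[sum(p[i] * p[j] for p in patterns) / N if i != j else 0.0
      for j in range(N)] for i in range(N)]

def recall(state, sweeps=10):
    state = list(state)
    for _ in range(sweeps):
        for i in range(N):                      # asynchronous updates
            h = sum(W[i][j] * state[j] for j in range(N))
            state[i] = 1 if h >= 0 else -1
    return state

# Corrupt 20% of one stored pattern and see whether the network cleans it up.
cue = list(patterns[0])
for i in rng.sample(range(N), N // 5):
    cue[i] = -cue[i]
recovered = recall(cue)
overlap = sum(a * b for a, b in zip(recovered, patterns[0])) / N
print(f"overlap with the stored memory after recall: {overlap:.2f}")   # ~1.0 if recalled
```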
“Heating up” the network by adding noise or making the neurons respond less sharply merges nearby memory states hierarchically. This seems to be very much like how we abstract categories. And Parisi’s theory tells us about the structure. en.wikipedia.org/wiki/Ultrametr…
The methods developed in this field also apply to modern neural networks, allowing some mathematical grip on how they converge and behave. In particular, they allow calculating how the error changes as more training data is provided. journals.aps.org/prx/abstract/1…
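As a toy empirical stand-in for those calculations (the teacher–student perceptron setup and all the numbers here are assumptions for illustration; the real results are analytic formulas): train a simple classifier on growing amounts of random data from a fixed “teacher” rule and watch the generalization error fall as the amount of data per weight grows.

```python
import random

# Teacher-student perceptron: a fixed random "teacher" labels random inputs,
# a "student" perceptron is trained on P examples, and we estimate its
# generalization error on fresh data. Statistical-mechanics methods give this
# curve analytically as a function of alpha = P / N; here we just measure it.
N = 50
rng = random.Random(0)
teacher = [rng.gauss(0, 1) for _ in range(N)]

def label(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def train_student(P, epochs=50):
    data = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(P)]
    targets = [label(teacher, x) for x in data]
    w = [0.0] * N
    for _ in range(epochs):                     # classic perceptron updates
        for x, t in zip(data, targets):
            if label(w, x) != t:
                w = [wi + t * xi for wi, xi in zip(w, x)]
    return w

def generalization_error(w, trials=500):
    wrong = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(N)]
        wrong += label(w, x) != label(teacher, x)
    return wrong / trials

for P in [10, 50, 100, 200]:
    w = train_student(P)
    print(f"P = {P:4d} (alpha = {P / N:.1f}): error ~ {generalization_error(w):.2f}")
```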
They also show up when thinking about constraint satisfaction problems: as you have more variables and more constraints on them, how many solutions are there? Are there nearby solutions? When does the problem typically become insoluble?
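A small brute-force illustration of that question (deliberately tiny instances, just to show the trend; the well-known threshold near a clause-to-variable ratio of about 4.27 holds for large instances):

```python
import random
from itertools import product

# Random 3-SAT: n variables, m clauses of 3 random literals each. For tiny n
# we can brute-force satisfiability and watch the fraction of satisfiable
# instances drop as the clause/variable ratio alpha = m/n grows.
n = 12
rng = random.Random(0)

def random_instance(m):
    return [[(v, rng.choice([True, False]))
             for v in rng.sample(range(n), 3)] for _ in range(m)]

def satisfiable(clauses):
    for assignment in product([True, False], repeat=n):
        if all(any(assignment[v] == sign for v, sign in clause) for clause in clauses):
            return True
    return False

for alpha in [2.0, 3.0, 4.0, 5.0, 6.0]:
    m = int(alpha * n)
    sat = sum(satisfiable(random_instance(m)) for _ in range(10))
    print(f"alpha = {alpha:.1f}: {sat}/10 instances satisfiable")
```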
There are also links to questions about how computationally hard it is to find the minima of random polynomials:
windowsontheory.org/2020/10/23/ful…
simons.berkeley.edu/talks/breaking…
Complex systems will have inherent contradictions and tensions that can never be resolved perfectly. The random tensions also make them diverse. But this diversity forms universal patterns that are independent of whether we are talking about spins, neurons, lasers, or problems.