#FridayPhysicsFun – This year’s Nobel Prize in Physics went to Manabe, Hasselmann (climate) and Parisi. Media will focus on the climate stuff, because it is easy to explain. But what did Parisi do? And why does it matter for machine learning? nobelprize.org/prizes/physics…
A lot of this is way beyond me: I am not good enough at statistical mechanics to explain or use this directly. But I can see the shadows cast on the landscape by these results, and they are awesome. nobelprize.org/uploads/2021/1…
The start is spin glasses: solids where some atoms have spins that interact with other spins, but in a disordered way (rather than the neat, all-parallel spins of ferromagnetic materials). The spins try to minimize their energy, but there will be frustration: not every interaction can be satisfied at the same time.
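To see what frustration means concretely, here is a minimal toy sketch (plain Python; the triangle of antiferromagnetic bonds is chosen purely for illustration): three spins that each want to be opposite to their neighbours can never all get their way, so even the best state leaves one bond unhappy, and several states tie for the lowest energy.

```python
from itertools import product

# Three Ising spins on a triangle with antiferromagnetic couplings (J = -1):
# every bond "wants" its two spins to be opposite, but around a loop of three
# spins at least one bond must be violated -- that is frustration.
J = -1.0
bonds = [(0, 1), (1, 2), (2, 0)]

def energy(spins):
    # E = -sum over bonds of J * s_i * s_j
    return -sum(J * spins[i] * spins[j] for i, j in bonds)

states = list(product([-1, +1], repeat=3))
energies = {s: energy(s) for s in states}
e_min = min(energies.values())
ground_states = [s for s, e in energies.items() if abs(e - e_min) < 1e-12]

print("lowest energy:", e_min)                         # -1.0, not -3.0: one bond always loses
print("number of degenerate ground states:", len(ground_states))   # 6
```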
It would be nice to be able to calculate the average properties of such materials that one can measure macroscopically (the observables), such as overall magnetization or heat capacity.
In standard statistical mechanics this is pretty easy: assume all states are possible, each with a probability declining exponentially with its energy; calculate the so-called partition function Z; and from Z you can calculate the observables. Ta-da! en.wikipedia.org/wiki/Boltzmann…
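A minimal sketch of that recipe on a toy system (exact enumeration in plain Python; the short ferromagnetic Ising chain and the parameter values are just illustrative assumptions):

```python
import math
from itertools import product

# Toy example: ferromagnetic 1D Ising chain of N spins with nearest-neighbour
# coupling J at inverse temperature beta. Small enough to enumerate exactly.
N, J, beta = 8, 1.0, 0.5

def energy(spins):
    return -J * sum(spins[i] * spins[i + 1] for i in range(N - 1))

states = list(product([-1, +1], repeat=N))

# Partition function Z = sum over all states of exp(-beta * E)
Z = sum(math.exp(-beta * energy(s)) for s in states)

# Observables are Boltzmann-weighted averages, e.g. mean energy and
# mean absolute magnetization per spin.
def average(f):
    return sum(f(s) * math.exp(-beta * energy(s)) for s in states) / Z

mean_E = average(energy)
mean_m = average(lambda s: abs(sum(s)) / N)
print(f"Z = {Z:.3f}, <E> = {mean_E:.3f}, <|m|> = {mean_m:.3f}")
```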
(OK, “easy” may involve some pretty tough integrals, sometimes dropping awesome Riemann zeta function values into the final expression. This is why the average number of bits of information per photon in blackbody radiation has the value below.) arxiv.org/abs/1511.01162
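If I remember the standard photon-gas result correctly, the value being alluded to is the entropy per photon of blackbody radiation (quoted here from memory, so treat it as a reconstruction):

```latex
% Entropy per photon of blackbody radiation (standard photon-gas result):
\[
  \frac{S}{N} \;=\; \frac{2\pi^4}{45\,\zeta(3)}\, k_B \;\approx\; 3.6\, k_B
  \;\approx\; 5.2 \ \text{bits per photon.}
\]
```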
This runs into trouble with spin glasses since there are many possible energy minima and different samples will have entirely different configurations of minima because of local disorder. They all look the same if you squint, but calculating the partition function becomes hard.
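To get a feel for the trouble, here is a sketch in the spirit of the Sherrington–Kirkpatrick model (the tiny system size and Gaussian random couplings are assumptions made purely for illustration): every fresh draw of the disorder produces its own landscape with its own set of local energy minima.

```python
import random
from itertools import product

# Fully connected Ising model with random Gaussian couplings (a tiny
# Sherrington-Kirkpatrick-style spin glass). Each "sample" draws new
# couplings, i.e. new quenched disorder, and gets a different landscape.
N = 8

def random_couplings(rng):
    return {(i, j): rng.gauss(0, 1) for i in range(N) for j in range(i + 1, N)}

def energy(spins, J):
    return -sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())

def count_local_minima(J):
    # A state is a local minimum if flipping any single spin raises the energy.
    count = 0
    for s in product([-1, +1], repeat=N):
        e = energy(s, J)
        if all(energy(s[:k] + (-s[k],) + s[k + 1:], J) > e for k in range(N)):
            count += 1
    return count

rng = random.Random(0)
for sample in range(3):
    J = random_couplings(rng)
    print(f"disorder sample {sample}: {count_local_minima(J)} local minima")
```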
This matters not just in spin glasses but in machine learning too: we want to calculate observables for the trained system (which minimizes error rather than physical energy) for random training data. It turns out that the mathematical problem is similar. boazbarak.org/Papers/replica…
The replica trick was the first step towards handling this. Basically you estimate the average of (the logarithm of) the partition function by imagining n identical copies ("replicas") of the system, averaging over the disorder to get a formula, and then letting n approach zero.
en.wikipedia.org/wiki/Replica_t…
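The identity behind the trick, in its standard textbook form:

```latex
% The replica identity: we need the disorder average of ln Z, which is hard,
% so we compute the disorder average of Z^n (n coupled copies, "replicas"),
% which is easier, and then continue n -> 0:
\[
  \ln Z \;=\; \lim_{n \to 0} \frac{Z^{n} - 1}{n},
  \qquad\Longrightarrow\qquad
  \overline{\ln Z} \;=\; \lim_{n \to 0} \frac{\overline{Z^{n}} - 1}{n},
\]
% where the overline denotes the average over the random couplings.
```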
Yes, this is pretty nonsensical! We are making an integer number of things approach zero continuously. There are several more things here that make mathematicians cringe (0×0 matrices!). It is a trick, and why it works is still not 100% clear.
Still, it works… sometimes. One way it can fail is “replica symmetry breaking”, where the system is not ergodic: averages over time are no longer the same as averages over the whole ensemble of states, and the different replicas end up behaving differently.
Ergodicity is one of those important assumptions that make statistical mechanics work, yet it fails for many complex systems (notable examples: economics and evolution). nature.com/articles/s4156…
One way of thinking about the problem is that it matters a lot whether you take the average performance of a single run of a simulation or the average performance over many runs of the simulation. When replica symmetry breaks, the replicas behave in different ways.
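A minimal sketch of that distinction (a small mean-field ferromagnet at low temperature stands in for the broken-ergodicity idea; the model and all parameters are illustrative assumptions, not the spin-glass case itself): each simulation run gets stuck near one magnetized state, so the time average within one run looks nothing like the average over many runs.

```python
import math
import random

# Metropolis dynamics for a small mean-field ferromagnet at low temperature.
# Each run falls into one of the two magnetized states (m near +1 or -1) and
# stays there, so the time average of m within a single run is very different
# from the average of m over many independent runs.
N, beta, steps = 20, 2.0, 5000

def energy(spins):
    m_total = sum(spins)
    return -m_total * m_total / (2 * N)      # E = -(sum of spins)^2 / (2N)

def run(rng):
    spins = [rng.choice([-1, 1]) for _ in range(N)]
    m_sum = 0.0
    for _ in range(steps):
        k = rng.randrange(N)
        old_e = energy(spins)
        spins[k] = -spins[k]                 # propose a single spin flip
        d_e = energy(spins) - old_e
        if d_e > 0 and rng.random() >= math.exp(-beta * d_e):
            spins[k] = -spins[k]             # reject: flip back
        m_sum += sum(spins) / N
    return m_sum / steps                     # time average of magnetization

rng = random.Random(1)
time_averages = [run(rng) for _ in range(10)]
print("per-run time averages of m:", [f"{m:+.2f}" for m in time_averages])
print("average over the 10 runs:  ", f"{sum(time_averages) / len(time_averages):+.2f}")
```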
This is where Parisi did important work. He found that one could get around it by assuming that there are many (in fact infinitely many) “order parameters” describing the system, capturing an intricate hierarchy of energy minima.
journals.aps.org/prl/abstract/1…
journals.aps.org/prl/abstract/1…
Indeed, most complex systems have a near-infinite variety of possible “ground states”. Converging to them will also take vastly different times. arxiv.org/abs/cond-mat/0…
This is where I encountered his work when I did my PhD on neural networks. The good old Hopfield network, a simple model of associative memory in the brain, works just like a spin glass system. The number of memories that can be stored is determined by Parisi’s theory.
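A minimal Hopfield sketch (plain Python with the Hebbian learning rule; the sizes are just illustrative, and the often-quoted capacity of roughly 0.14·N patterns is the figure that the spin-glass-style analysis delivers):

```python
import random

# Hopfield associative memory: store random patterns with the Hebbian rule,
# then recall from a corrupted cue by repeatedly updating spins. The network
# is mathematically a spin system with couplings built from the memories;
# the spin-glass analysis gives a storage capacity of roughly 0.14 * N patterns.
N, P = 100, 5
rng = random.Random(42)

patterns = [[rng.choice([-1, 1]) for _ in range(N)] for _ in range(P)]

# Hebbian weights: W[i][j] = (1/N) * sum over patterns of x_i * x_j
W = [[sum(p[i] * p[j] for p in patterns) / N if i != j else 0.0
      for j in range(N)] for i in range(N)]

def recall(state, sweeps=10):
    state = list(state)
    for _ in range(sweeps):
        for i in range(N):                      # asynchronous updates
            h = sum(W[i][j] * state[j] for j in range(N))
            state[i] = 1 if h >= 0 else -1
    return state

# Corrupt 20% of one stored pattern and see whether the network cleans it up.
cue = list(patterns[0])
for i in rng.sample(range(N), N // 5):
    cue[i] = -cue[i]
recovered = recall(cue)
overlap = sum(a * b for a, b in zip(recovered, patterns[0])) / N
print(f"overlap with the stored memory after recall: {overlap:.2f}")   # ~1.0 if recalled
```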
“Heating up” the network by adding noise or making the neurons respond less sharply merges nearby memory states hierarchically. This seems to be very much like how we abstract categories. And Parisi’s theory tells us about the structure. en.wikipedia.org/wiki/Ultrametr…
The methods developed in this field also apply to modern neural networks, allowing some mathematical grip on how they converge and behave. In particular, they allow calculating how the error changes as more training data is provided. journals.aps.org/prx/abstract/1…
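As a toy empirical stand-in for those calculations (the teacher–student perceptron setup and all the numbers here are assumptions for illustration; the real results are analytic formulas): train a simple classifier on growing amounts of random data from a fixed “teacher” rule and watch the generalization error fall as the amount of data per weight grows.

```python
import random

# Teacher-student perceptron: a fixed random "teacher" labels random inputs,
# a "student" perceptron is trained on P examples, and we estimate its
# generalization error on fresh data. Statistical-mechanics methods give this
# curve analytically as a function of alpha = P / N; here we just measure it.
N = 50
rng = random.Random(0)
teacher = [rng.gauss(0, 1) for _ in range(N)]

def label(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def train_student(P, epochs=50):
    data = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(P)]
    targets = [label(teacher, x) for x in data]
    w = [0.0] * N
    for _ in range(epochs):                     # classic perceptron updates
        for x, t in zip(data, targets):
            if label(w, x) != t:
                w = [wi + t * xi for wi, xi in zip(w, x)]
    return w

def generalization_error(w, trials=500):
    wrong = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(N)]
        wrong += label(w, x) != label(teacher, x)
    return wrong / trials

for P in [10, 50, 100, 200]:
    w = train_student(P)
    print(f"P = {P:4d} (alpha = {P / N:.1f}): error ~ {generalization_error(w):.2f}")
```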
They also show up when thinking about constraint satisfaction problems: as you have more variables and more constraints on them, how many solutions are there? Are there nearby solutions? When does the problem typically become insoluble?
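A small brute-force illustration of that question (deliberately tiny instances, just to show the trend; the well-known threshold near a clause-to-variable ratio of about 4.27 holds for large instances):

```python
import random
from itertools import product

# Random 3-SAT: n variables, m clauses of 3 random literals each. For tiny n
# we can brute-force satisfiability and watch the fraction of satisfiable
# instances drop as the clause/variable ratio alpha = m/n grows.
n = 12
rng = random.Random(0)

def random_instance(m):
    return [[(v, rng.choice([True, False]))
             for v in rng.sample(range(n), 3)] for _ in range(m)]

def satisfiable(clauses):
    for assignment in product([True, False], repeat=n):
        if all(any(assignment[v] == sign for v, sign in clause) for clause in clauses):
            return True
    return False

for alpha in [2.0, 3.0, 4.0, 5.0, 6.0]:
    m = int(alpha * n)
    sat = sum(satisfiable(random_instance(m)) for _ in range(10))
    print(f"alpha = {alpha:.1f}: {sat}/10 instances satisfiable")
```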
There are also links to questions about how computationally hard it is to find the minima of random polynomials:
windowsontheory.org/2020/10/23/ful…
simons.berkeley.edu/talks/breaking…
Complex systems will have inherent contradictions and tensions that can never be resolved perfectly. The random tensions also make them diverse. But this diversity forms universal patterns that are independent of whether we are talking about spins, neurons, lasers, or problems.