Q. What does Noether’s theorem tell us about the “geometry of deep learning dynamics”?
A. We derive Noether’s Learning Dynamics and show:
"SGD + momentum + BatchNorm + weight decay" = "RMSProp" due to symmetry breaking!

w/ @KuninDaniel
#NeurIPS2021 Paper: bit.ly/3pAEYdk
1/
Geometry of data & representations has been central in the design of modern deep nets.
e.g., #GeometricDeepLearning arxiv.org/abs/2104.13478 by @mmbronstein, @joanbruna, @TacoCohen, @PetarV_93

What are the geometric design principles for “learning dynamics in parameter space”?
2/
We develop a Lagrangian mechanics of learning by modeling learning as the motion of a particle in high-dimensional parameter space. Just as in physical dynamics, we can describe the trajectory of discrete learning dynamics with continuous-time differential equations.
3/
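(Standard background, not specific to the paper: the discrete gradient-descent update and its vanishing-learning-rate limit, gradient flow.)

  \theta_{k+1} = \theta_k - \eta\,\nabla\mathcal{L}(\theta_k) \;\;\xrightarrow{\;\eta \to 0\;}\;\; \dot{\theta}(t) = -\nabla\mathcal{L}(\theta(t))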
However, there is still a gap between Newton's equation of motion (EOM) and gradient flow. Thus, we model the effects of a finite learning rate as "implicit acceleration", a complementary route to the "implicit gradient regularization" of @dgtbarrett, Benoit Dherin, @SamuelMLSmith, @sohamde_.
4/
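One way to see the implicit-acceleration picture (a minimal sketch via the usual modified-equation argument; the paper's exact normalization may differ): keep the O(η²) term when matching a continuous trajectory to one discrete step,

  \theta(t+\eta) \approx \theta(t) + \eta\,\dot{\theta} + \tfrac{\eta^2}{2}\,\ddot{\theta} = \theta(t) - \eta\,\nabla\mathcal{L}(\theta)
  \;\Rightarrow\; \tfrac{\eta}{2}\,\ddot{\theta} + \dot{\theta} = -\nabla\mathcal{L}(\theta),

so a finite learning rate shows up as a mass-like acceleration term on top of gradient flow.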
As a result, gradient descent becomes Lagrangian dynamics with a finite learning rate, where the learning rule corresponds to the kinetic energy and the loss function corresponds to the potential energy.
5/
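For intuition, here is one damped (Caldirola–Kanai-style) Lagrangian whose Euler–Lagrange equation reproduces the accelerated flow above; the coefficients are illustrative, not necessarily the paper's normalization:

  L(\theta, \dot{\theta}, t) = e^{2t/\eta}\left(\tfrac{\eta}{4}\,\|\dot{\theta}\|^2 - \mathcal{L}(\theta)\right),
  \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} = \frac{\partial L}{\partial \theta}
  \;\Rightarrow\; \tfrac{\eta}{2}\,\ddot{\theta} + \dot{\theta} + \nabla\mathcal{L}(\theta) = 0.

The kinetic term encodes the learning rule (Euclidean for plain gradient descent) and the potential is the loss.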
Symmetry properties of this Lagrangian govern the geometry of learning dynamics.
Indeed, modern deep learning architectures introduce an array of symmetries into the loss function, as we previously studied in arxiv.org/abs/2012.04728.
6/
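A concrete example of such a symmetry (standard, and discussed in the linked paper): weights w feeding into a BatchNorm layer make the loss scale-invariant, and under pure gradient flow Noether's theorem turns that symmetry into conservation of the weight norm,

  \mathcal{L}(\alpha w) = \mathcal{L}(w)\;\;\forall \alpha > 0
  \;\Rightarrow\; w^\top \nabla\mathcal{L}(w) = 0
  \;\Rightarrow\; \tfrac{d}{dt}\|w\|^2 = -2\,w^\top \nabla\mathcal{L}(w) = 0.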
By studying the symmetry properties of the kinetic energy, we define “kinetic symmetry breaking”, where the kinetic energy corresponding to the learning rule explicitly breaks the symmetry of the potential energy corresponding to the loss function.
7/
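An illustration with the same scale symmetry (a sketch, not the paper's general statement): the potential is invariant under w → αw, but a Euclidean kinetic term is not, because the velocity rescales too,

  \mathcal{L}(\alpha w) = \mathcal{L}(w), \qquad \tfrac{1}{2}\,\|\alpha\,\dot{w}\|^2 = \alpha^2\,\tfrac{1}{2}\,\|\dot{w}\|^2 \neq \tfrac{1}{2}\,\|\dot{w}\|^2,

so the learning rule explicitly breaks a symmetry of the loss, and the corresponding Noether charge is no longer conserved.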
We derive Noether’s Learning Dynamics (NLD), a unified equality that holds for any combination of symmetry and learning rule. NLD accounts for damping, the unique symmetries of the loss, and the non-Euclidean metric used in learning rules.
8/
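The general template behind such an equality (standard Noether bookkeeping with explicit breaking; NLD specializes it to damping and non-Euclidean kinetic terms): for a symmetry generator δθ, on solutions of the Euler–Lagrange equations the Noether charge evolves according to how much the Lagrangian fails to be invariant,

  Q = \frac{\partial L}{\partial \dot{\theta}} \cdot \delta\theta, \qquad \frac{dQ}{dt} = \delta L,

so δL = 0 recovers an exact conservation law, while damping and kinetic symmetry breaking make δL ≠ 0 and drive the charge's dynamics.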
We establish an exact analogy between two seemingly unrelated components of modern deep learning: normalization and adaptive optimization.
Benefits of this broken-symmetry-induced “implicit adaptive optimization” are all empirically confirmed!
9/
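A heuristic sketch of the analogy (the standard effective-learning-rate argument for scale-invariant weights, not the paper's full derivation): a 0-homogeneous loss has a (−1)-homogeneous gradient, so the update of the weight direction is effectively rescaled by 1/‖w‖², and weight decay shrinks ‖w‖ during training,

  \nabla\mathcal{L}(\alpha w) = \tfrac{1}{\alpha}\,\nabla\mathcal{L}(w) \;\Rightarrow\; \eta_{\text{eff}} \propto \frac{\eta}{\|w\|^2},

i.e. the weight norm plays the role of an adaptive denominator, much like the running gradient scale in RMSProp.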
Overall, understanding not only when symmetries exist, but also how they are broken, is essential for discovering geometric design principles in neural networks.

For more details see
"Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks": bit.ly/3pAEYdk
10/
