AI Pub · Sep 28 · 19 tweets · 7 min read
// Git Re-Basin, Explained (Part I) //

Two weeks ago, researchers discovered a way to "merge" ML models trained on different datasets at *no* cost to loss!

They also found that NN loss landscapes effectively contain a single basin.

Why, and how?

Read below:

1/19
The Git Re-Basin paper has two parts:

Part I is about symmetries of neural networks, and how to "align" the weights of two NNs with these symmetries.

Part II shows how to "merge" two models once their weights are aligned, and explores the limits and implications of merging.

2/19
The starting observation for Git Re-Basin is that neural nets have an *enormous* number of redundant symmetries.

Consider a neural net with a hidden layer consisting of two neurons, A and B.

3/19
If you "swap A with B",

I.e., swap the weights going in and out of A with those going in and out of B,

You get a different neural network - but one that computes the exact same function!

This network with two neurons in the hidden layer has 2! redundant symmetries.

4/19
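(A minimal NumPy sketch - a hypothetical toy network, not from the paper - showing that the swap leaves the function unchanged:)

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny MLP: 3 inputs -> 2 hidden neurons (A and B) -> 1 output.
W1 = rng.normal(size=(2, 3))   # weights into the hidden layer
b1 = rng.normal(size=2)
W2 = rng.normal(size=(1, 2))   # weights out of the hidden layer

def f(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)  # ReLU MLP

# "Swap A with B": permute the rows of W1 and b1, and the columns of W2.
P = np.array([[0.0, 1.0], [1.0, 0.0]])  # 2x2 permutation matrix
x = rng.normal(size=3)
assert np.allclose(f(x, W1, b1, W2), f(x, P @ W1, P @ b1, W2 @ P.T))
```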
More generally, an n-neuron hidden layer will exhibit n! symmetries.

So by permuting the weights, a given NN has an astronomical number of equivalent descriptions.

Even a shallow multilayer perceptron has far more of these symmetries than there are atoms in the universe: a single 512-neuron layer alone has 512! ≈ 10^1166 permutations, versus roughly 10^80 atoms.

5/19
Why does this matter?

If you train a neural net twice with different random seeds, you'll converge to two different sets of weights, W1 and W2.

If you look at W1 and W2 as lists of numbers, they'll look very different.

6/19
But what if they're "the same" weights, just permuted? What if they describe "the same" neural net?

And if they were "the same" weights, how could you tell?

That's Part I of Git Re-Basin!

7/19
In the paper, the authors introduce three methods to bring two NNs of the same architecture "into alignment" by permuting their weights.

These are:
1) Activation matching
2) Weight matching
3) Straight-through estimator

8/19
They find method 2, "weight matching", to be accurate enough for their purposes, and note it runs faster than the other methods by orders of magnitude - only a couple of seconds on modern hardware.

So I'll go over that one below - read the paper for the others!

9/19
Take two ML models of the same architecture with different weights, W_A and W_B.

We want to permute the weights of B until W_B is as close as possible to W_A in weight space.

After expanding out the squared L2 distance, minimizing it is equivalent to maximizing a sum of cosine-similarity terms, one per layer:
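(The tweet showed this objective as an image; a reconstruction from the paper's weight-matching objective, with the convention P_0 = P_L = I:)

$$\max_{P_1, \dots, P_{L-1}} \; \sum_{\ell=1}^{L} \left\langle W_\ell^{A},\; P_\ell \, W_\ell^{B} \, P_{\ell-1}^{\top} \right\rangle_F$$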

10/19
(The P^T terms show up since, for every permutation you do on *one* side of a hidden layer, you have to undo it on the *other* side).

Unfortunately, finding permutation matrices P_i that maximize the L-term sum above is an NP-hard problem.

11/19
But what if we proceed in a greedy fashion, permuting one layer at a time?

In that case, all but two terms in the sum are constant - and we can transform it into a "linear assignment problem" for which practical, polynomial-time algorithms exist.

12/19
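(A minimal single-layer sketch of that greedy step, assuming SciPy's linear_sum_assignment; the helper name is hypothetical, and the paper's full algorithm sweeps over all layers until no permutation improves the match:)

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_hidden_layer(Win_A, Win_B, Wout_A, Wout_B):
    """Permute one hidden layer of model B to best match model A.

    Win_*:  (n_hidden, n_in)  weights going INTO the hidden layer
    Wout_*: (n_out, n_hidden) weights going OUT of the hidden layer
    """
    # score[i, j] = similarity if B's unit j plays the role of A's unit i:
    # inner products of the incoming rows plus the outgoing columns.
    score = Win_A @ Win_B.T + Wout_A.T @ Wout_B

    # Linear assignment problem: find the permutation maximizing total score.
    _, perm = linear_sum_assignment(score, maximize=True)

    # Apply the permutation to B: reorder rows going in, columns going out.
    return Win_B[perm], Wout_B[:, perm]
```

(Each call only touches the two weight matrices adjacent to that layer - which is exactly why, when permuting one layer at a time, all the other terms in the sum stay constant.)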
So that's method 2, weight matching!

Greedily advance from the first layer to the last, permuting weights to solve the linear assignment problem specified by a sum of two matrix products.

The algorithm is OOMs faster than the others; it runs in seconds on modern hardware.

13/19
What do you get by running this process?

You get two ML models whose weights are "aligned".

Recall that two models can be functionally equivalent, but have very different weights due to symmetries in weight space.

Git Re-Basin undoes these symmetries to "align" models.

14/19
The real fun kicks in once the models are aligned in weight space: then you can perform operations on them.

That's "merging" the models, the main point of the Git Re-Basin paper.

Will cover that in a separate thread in two days!

15/19
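(A hedged preview: in the paper, "merging" the aligned models is just linear interpolation of the aligned weights - roughly W_merged = (1 - λ)·W_A + λ·π(W_B) - and the headline result is that loss stays low along the whole interpolation path. Details in Part II.)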
To recap Part I:

1) Wide NNs have an insane number of symmetries
2) Therefore ML models can converge to different but functionally equivalent solutions in weight space
3) The authors find a fast, greedy algorithm to "align" two ML models in weight space by permuting the neurons in each layer

16/19
A bit about AI Pub:

Last week we launched a talent network to get engineers hired at the best AI companies. 40 members now!

If you're a software engineer, ML engineer, or ML scientist with 2+ YOE, join here: aipub.pallet.com/talent/welcome…

How we select companies, below:

17/19
We also publish regular "explainer" and "paper-walkthrough" threads like the one you just read.

Here's one on scaling laws and DeepMind's famous Chinchilla paper from a couple weeks ago.

Until next time! 👋

18/19
Last of all: read the Git Re-Basin paper here!

Paper: arxiv.org/abs/2209.04836
Code: github.com/samuela/git-re…
Twitter thread by authors (below):

19/19
