They find method 2), "weight matching", to be accurate enough for their purposes, and note it runs orders of magnitude faster than the other methods - only a couple of seconds on modern hardware.
So I'll go over that one below - read the paper for the others!
9/19
Take two ML models with the same architecture but different weights, W_A and W_B.
We want to permute the weights of B so that W_B is as close as possible to W_A in weight space.
After expanding out terms, we get equivalence with maximizing a sum of cosine similarity terms:
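In symbols (a sketch reconstructed from the Git Re-Basin paper, since the tweet's image isn't shown here; the inner products below are the cosine-similarity terms up to normalization):

$$
\min_{\pi} \lVert \theta_A - \pi(\theta_B) \rVert^2
\;\Longleftrightarrow\;
\max_{P_1, \dots, P_{L-1}} \sum_{l=1}^{L} \big\langle W^{(l)}_A,\; P_l \, W^{(l)}_B \, P_{l-1}^{\top} \big\rangle,
\qquad P_0 = P_L = I.
$$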
10/19
(The P^T terms show up since, for every permutation you do on *one* side of a hidden layer, you have to undo it on the *other* side).
Unfortunately, finding permutation matrices P_i that maximize the L-term sum above is an NP-hard problem.
11/19
But what if we proceed in a greedy fashion, permuting one layer at a time?
In that case, all but two terms in the sum are constant - and we can transform it into a "linear assignment problem" for which practical, polynomial-time algorithms exist.
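The greedy step can be sketched in a few lines. This is a hypothetical NumPy/SciPy sketch for a plain MLP without biases, not the authors' reference implementation; `weight_matching`, `ws_a`, and `ws_b` are names I'm introducing for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def weight_matching(ws_a, ws_b, n_passes=5):
    """Greedily permute the hidden units of model B toward model A.

    ws_a, ws_b: lists of weight matrices [W_1, ..., W_L], where W_l
    has shape (d_l, d_{l-1}). Biases are omitted for brevity.
    """
    L = len(ws_a)
    # One permutation per hidden layer (the output side of ws_b[l]).
    perms = [np.arange(w.shape[0]) for w in ws_b[:-1]]
    for _ in range(n_passes):
        for l in range(L - 1):
            # Neighboring permutations, held fixed for this step.
            p_prev = perms[l - 1] if l > 0 else np.arange(ws_b[0].shape[1])
            p_next = perms[l + 1] if l + 1 < L - 1 else np.arange(ws_b[-1].shape[0])
            # sim[i, j]: benefit of matching A's unit i to B's unit j,
            # summed over the two weight matrices touching this layer
            # (all other terms in the objective are constant here).
            sim = ws_a[l] @ ws_b[l][:, p_prev].T
            sim += ws_a[l + 1].T @ ws_b[l + 1][p_next, :]
            # Linear assignment problem: polynomial time.
            _, col = linear_sum_assignment(sim, maximize=True)
            perms[l] = col
    # Apply each permutation to B's rows, undoing it on the next layer's columns.
    out = [w.copy() for w in ws_b]
    for l in range(L - 1):
        out[l] = out[l][perms[l], :]
        out[l + 1] = out[l + 1][:, perms[l]]
    return out
```

Note how each update only touches the two weight matrices adjacent to the layer being permuted - that is exactly why the rest of the sum stays constant and the step reduces to a linear assignment problem.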
12/19
So that's method 2, weight matching!
Greedily advance from the first layer to the last, permuting each layer's weights to solve the linear assignment problem specified by a sum of two matrix products.
Algorithm is OOMs faster than the others; runs in seconds on modern hardware.
13/19
What do you get by running this process?
You get two ML models whose weights are "aligned".
Recall that two models can be functionally equivalent, but have very different weights due to symmetries in weight space.
Git Re-Basin undoes these symmetries to "align" models.
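A toy illustration of such a symmetry (a hypothetical one-hidden-layer ReLU net, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(2, 8))
perm = rng.permutation(8)

# Permute the hidden units: rows of W1, and - to undo it on the
# other side - the matching columns of W2.
W1_p, W2_p = W1[perm, :], W2[:, perm]

x = rng.normal(size=3)
y = W2 @ np.maximum(W1 @ x, 0)        # original model
y_p = W2_p @ np.maximum(W1_p @ x, 0)  # permuted model
print(np.allclose(y, y_p))  # True: same function, different weights
```

Every hidden layer of width n admits n! such permutations, which is where the "insane number of symmetries" comes from.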
14/19
The real fun kicks in after the models are aligned in weight space, and you can perform operations on them.
That's "merging" the models, the main point of the Git Re-Basin paper.
Will cover that in a separate thread in two days!
15/19
To recap Part I:
1) Wide NNs have an insane number of symmetries
2) Therefore, ML models can converge to different but functionally equivalent solutions in weight space
3) The authors find a fast, greedy algorithm to "align" two ML models in weight space by permuting layers
16/19
A bit about AI Pub:
Last week we launched a talent network to get engineers hired at the best AI companies. 40 members now!
The latter, e.g. Copilot, is a product that only replaces the first two layers of the cake (foundation model + fine-tuning), while leaving the RL agent on the top - the human programmer - intact.
Replacing the whole cake is much harder than just replacing a lower layer.
2/10
Off-brand commentary for the AI Pub channel, but:
I'm coming to believe that AI will outmode humans in the economy via a process that resembles the "cake" being eaten up, from bottom to top.
It started with software in the second half of the 20th century, eating up...
Some notes on:
- Why we've rejected the vast majority of companies who've applied to hire from us,
- Our bar for hiring companies,
- Why finding great AI companies is hard
below:
1/10
(Before getting into everything, here is our talent network. We launched last week and have ~30 engineers on board.)
If you're a software engineer, ML engineer, or ML researcher with 2+ years of experience, apply to join here: aipub.pallet.com/talent/welcome…
2/10
1) Rejection + growing slowly
Since launching last week, we've only onboarded a few companies and rejected 10-15 who have reached out to hire from the network.
We're building a very high signal, low-noise place where great engineers connect with great companies.
- OpenAI releases Whisper (human-level speech-to-text)
- Adversarial, interactive deepfakes
- Twitter "gossip" on how OpenAI gathers GPT-4's trillions of training tokens
- The guts of Google's multibillion-parameter ad model
... and more:
1/15
OpenAI releases Whisper, a state-of-the-art speech-to-text model that by all accounts transcribes at a human level - if not better.
See Whisper perfectly transcribe a technical lecture on NLP in the photo. (Source: Andrej Karpathy)
- LLMs learn to use software and execute code 😅,
- Git Re-Basin: a technique to "merge" deep NN models,
- Meta spins off an independent PyTorch Foundation,
- Stalker tools, courtesy of computer vision + CCTV,
... and more:
1/13
Adept AI releases ACT-1, a model that can use arbitrary software tools. It can:
- Find you a house on Zillow
- Search for a fridge on craigslist and email the seller
- Use Excel and Salesforce like a professional
I found this personally terrifying 😅 ... see thread.
1) Loss landscapes of wide NNs effectively have only one basin
2) The authors provide the "Git Re-Basin" algorithm to "merge" models trained on different data, at *no cost* to loss, in weight space
Breakthrough in federated learning. See the thread!