They find method 2), "weight matching", to be accurate enough for their purposes, and note it runs orders of magnitude faster than the other methods - only a couple of seconds on modern hardware.
So I'll go over that one below - read the paper for the others!
9/19
Take two ML models with the same architecture but different weights, W_A and W_B.
We want to permute the weights of B so that W_B is as close as possible to W_A in weight space.
After expanding out terms, we get equivalence with maximizing a sum of cosine similarity terms:
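In symbols (a sketch reconstructed from the Git Re-Basin paper, since the tweet's image isn't shown here; the inner products below are the cosine-similarity terms up to normalization):

$$
\min_{\pi} \lVert \theta_A - \pi(\theta_B) \rVert^2
\;\Longleftrightarrow\;
\max_{P_1, \dots, P_{L-1}} \sum_{l=1}^{L} \big\langle W^{(l)}_A,\; P_l \, W^{(l)}_B \, P_{l-1}^{\top} \big\rangle,
\qquad P_0 = P_L = I.
$$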
10/19
(The P^T terms show up since, for every permutation you do on *one* side of a hidden layer, you have to undo it on the *other* side).
Unfortunately, finding permutation matrices P_i that maximize the L-term sum above is an NP-hard problem.
11/19
But what if we proceed in a greedy fashion, permuting one layer at a time?
In that case, all but two terms in the sum are constant - and we can transform it into a "linear assignment problem" for which practical, polynomial-time algorithms exist.
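The greedy step can be sketched in a few lines. This is a hypothetical NumPy/SciPy sketch for a plain MLP without biases, not the authors' reference implementation; `weight_matching`, `ws_a`, and `ws_b` are names I'm introducing for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def weight_matching(ws_a, ws_b, n_passes=5):
    """Greedily permute the hidden units of model B toward model A.

    ws_a, ws_b: lists of weight matrices [W_1, ..., W_L], where W_l
    has shape (d_l, d_{l-1}). Biases are omitted for brevity.
    """
    L = len(ws_a)
    # One permutation per hidden layer (the output side of ws_b[l]).
    perms = [np.arange(w.shape[0]) for w in ws_b[:-1]]
    for _ in range(n_passes):
        for l in range(L - 1):
            # Neighboring permutations, held fixed for this step.
            p_prev = perms[l - 1] if l > 0 else np.arange(ws_b[0].shape[1])
            p_next = perms[l + 1] if l + 1 < L - 1 else np.arange(ws_b[-1].shape[0])
            # sim[i, j]: benefit of matching A's unit i to B's unit j,
            # summed over the two weight matrices touching this layer
            # (all other terms in the objective are constant here).
            sim = ws_a[l] @ ws_b[l][:, p_prev].T
            sim += ws_a[l + 1].T @ ws_b[l + 1][p_next, :]
            # Linear assignment problem: polynomial time.
            _, col = linear_sum_assignment(sim, maximize=True)
            perms[l] = col
    # Apply each permutation to B's rows, undoing it on the next layer's columns.
    out = [w.copy() for w in ws_b]
    for l in range(L - 1):
        out[l] = out[l][perms[l], :]
        out[l + 1] = out[l + 1][:, perms[l]]
    return out
```

Note how each update only touches the two weight matrices adjacent to the layer being permuted - that is exactly why the rest of the sum stays constant and the step reduces to a linear assignment problem.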
12/19
So that's method 2, weight matching!
Greedily advance from the first layer to the last, permuting each layer's weights to solve the linear assignment problem specified by a sum of two matrix products.
Algorithm is OOMs faster than the others; runs in seconds on modern hardware.
13/19
What do you get by running this process?
You get two ML models whose weights are "aligned".
Recall that two models can be functionally equivalent, but have very different weights due to symmetries in weight space.
Git Re-Basin undoes these symmetries to "align" models.
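A toy illustration of such a symmetry (a hypothetical one-hidden-layer ReLU net, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(2, 8))
perm = rng.permutation(8)

# Permute the hidden units: rows of W1, and - to undo it on the
# other side - the matching columns of W2.
W1_p, W2_p = W1[perm, :], W2[:, perm]

x = rng.normal(size=3)
y = W2 @ np.maximum(W1 @ x, 0)        # original model
y_p = W2_p @ np.maximum(W1_p @ x, 0)  # permuted model
print(np.allclose(y, y_p))  # True: same function, different weights
```

Every hidden layer of width n admits n! such permutations, which is where the "insane number of symmetries" comes from.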
14/19
The real fun kicks in after the models are aligned in weight space, and you can perform operations on them.
That's "merging" the models, the main point of the Git Re-Basin paper.
Will cover that in a separate thread in two days!
15/19
To recap Part I:
1) Wide NNs have an insane number of symmetries
2) Therefore, ML models can converge to different but functionally equivalent solutions in weight space
3) The authors find a fast, greedy algorithm to "align" two ML models in weight space by permuting layers
16/19
A bit about AI Pub:
Last week we launched a talent network to get engineers hired at the best AI companies. 40 members now!
The latter, e.g. Copilot, is a product that only replaces the first two layers of the cake (foundation model + fine-tuning), while leaving the RL agent on the top - the human programmer - intact.
Replacing the whole cake is much harder than just replacing a lower layer.
2/10
Off-brand commentary for the AI Pub channel, but:
I'm coming to believe that AI will outmode humans in the economy via a process that resembles the "cake" being eaten up, from bottom to top.
It started with software in the second half of the 20th century, eating up...
Some notes on:
- Why we've rejected the vast majority of companies who've applied to hire from us,
- Our bar for hiring companies,
- Why finding great AI companies is hard
below:
1/10
(Before getting into everything, here is our talent network. We launched last week and have ~30 engineers on board.)
If you're a software engineer, ML engineer, or ML researcher with 2+ years of experience, apply to join here: aipub.pallet.com/talent/welcome…
2/10
1) Rejection + growing slowly
Since launching last week, we've only onboarded a few companies and rejected 10-15 who have reached out to hire from the network.
We're building a very high signal, low-noise place where great engineers connect with great companies.
- OpenAI releases Whisper (human-level speech-to-text)
- Adversarial, interactive deepfakes
- Twitter "gossip" on how OpenAI gathers GPT-4's trillions of training tokens
- The guts of Google's multibillion-parameter ad model
... and more:
1/15
OpenAI releases Whisper, a state-of-the-art speech-to-text model that by all accounts transcribes at a human level - if not better.
See Whisper perfectly transcribe a technical lecture on NLP in the photo. (Source: Andrej Karpathy)
- LLMs learn to use software and execute code 😅,
- Git Re-Basin: a technique to "merge" deep NN models,
- Meta spins off an independent PyTorch Foundation,
- Stalker tools, courtesy of computer vision + CCTV,
... and more:
1/13
Adept AI releases ACT-1, a model that can use arbitrary software tools. It can:
- Find you a house on Zillow
- Search for a fridge on craigslist and email the seller
- Use Excel and Salesforce like a professional
I found this personally terrifying 😅 ... see thread.
1) Loss landscapes of wide NNs effectively have only one basin
2) The authors provide the "Git Re-Basin" algorithm to "merge" models trained on different data, at *no cost* to loss, in weight space
Breakthrough in federated learning. See the thread!