They find 2), "weight matching", to be accurate enough for their purposes, and note it runs faster than the other methods by orders of magnitude - only a couple seconds on modern hardware.
So I'll go over that one below - read the paper for the others!
9/19
Take two ML models of the same architecture with different weights, W_A and W_B.
We want to permute the weights of B so that W_B is as close as possible to W_A in weight space.
After expanding out terms, minimizing that distance is equivalent to maximizing a sum of cosine-similarity (inner-product) terms:
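In symbols (a reconstruction of the paper's formulation: W_l denotes the l-th weight matrix, P_l the permutation of layer l's units, and P_0 = P_L = I since inputs and outputs are never permuted):

$$ \min_{\pi} \; \lVert \theta_A - \pi(\theta_B) \rVert^2 \;\;\Longleftrightarrow\;\; \max_{P_1, \ldots, P_{L-1}} \; \sum_{l=1}^{L} \left\langle W_l^A,\; P_l \, W_l^B \, P_{l-1}^{\top} \right\rangle_F $$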
10/19
(The P^T terms show up since, for every permutation you do on *one* side of a hidden layer, you have to undo it on the *other* side).
Unfortunately, finding permutation matrices P_i that maximize the L-term sum above is an NP-hard problem.
11/19
But what if we proceed in a greedy fashion, permuting one layer at a time?
In that case, all but two terms in the sum are constant - and we can transform it into a "linear assignment problem" for which practical, polynomial-time algorithms exist.
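A minimal sketch of one greedy step in numpy/scipy (hypothetical names; assumes a plain MLP with no biases, where WA[l] and WB[l] have shape (d_l, d_{l-1}) and P is the current list of permutation matrices, the last pinned to the identity):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_layer(WA, WB, P, l):
    """Re-solve the permutation for hidden layer l, holding the others fixed.

    Only two terms of the L-term sum involve P[l]; collecting them gives
    trace(P[l].T @ C) for the cost matrix C below, which is a linear
    assignment problem solvable in polynomial time.
    """
    P_prev = P[l - 1] if l > 0 else np.eye(WA[l].shape[1])  # inputs never permuted
    C = WA[l] @ P_prev @ WB[l].T + WA[l + 1].T @ P[l + 1] @ WB[l + 1]
    row, col = linear_sum_assignment(C, maximize=True)
    P_new = np.zeros_like(C)
    P_new[row, col] = 1.0  # permutation matrix maximizing trace(P.T @ C)
    return P_new
```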
12/19
So that's method 2, weight matching!
Greedily advance from the first layer to the last, at each step permuting weights to solve the linear assignment problem specified by a sum of two matrix products.
Algorithm is OOMs faster than the others; runs in seconds on modern hardware.
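A hypothetical driver for the step above (continuing the previous snippet, so match_layer and numpy are already in scope; note the paper actually sweeps layers in random order and repeats until convergence, so a fixed number of passes is a simplification):

```python
def weight_matching(WA, WB, num_passes=5, seed=0):
    """Align B's hidden units to A's.

    Returns permutations P such that P[l] @ WB[l] @ P[l-1].T is the
    permuted version of WB[l]; the last permutation stays the identity
    because outputs are never permuted.
    """
    rng = np.random.default_rng(seed)
    P = [np.eye(W.shape[0]) for W in WA]
    for _ in range(num_passes):
        for l in rng.permutation(len(WA) - 1):  # hidden layers only
            P[l] = match_layer(WA, WB, P, l)
    return P
```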
13/19
What do you get by running this process?
You get two ML models whose weights are "aligned".
Recall that two models can be functionally equivalent, but have very different weights due to symmetries in weight space.
Git Re-Basin undoes these symmetries to "align" models.
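A tiny self-contained illustration of one such symmetry, for a hypothetical 2-layer ReLU MLP: permuting the hidden units (and undoing the permutation on the next layer) changes the weights but not the function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)
P = np.eye(4)[rng.permutation(4)]       # random permutation of hidden units
relu = lambda z: np.maximum(z, 0)

y = W2 @ relu(W1 @ x)                   # original network
y_perm = (W2 @ P.T) @ relu(P @ W1 @ x)  # permute hidden layer, undo on the next
assert np.allclose(y, y_perm)           # same function, different weights
```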
14/19
The real fun kicks in after the models are aligned in weight space, and you can perform operations on them.
That's "merging" the models, the main point of the Git Re-Basin paper.
Will cover that in a separate thread in two days!
15/19
To recap Part I:
1) Wide NNs have an insane number of symmetries
2) Therefore, ML models can converge to different but functionally equivalent solutions in weight space
3) The authors find a fast, greedy algorithm to "align" two ML models in weight space by permuting layers
16/19
A bit about AI Pub:
Last week we launched a talent network to get engineers hired at the best AI companies. 40 members now!
I help ~25 AI startups recruit top-notch engineers via the AI Pub Talent Network, and I'm now helping some with their hiring processes.
ML and software engineers: you're invited to interview. Why would you *not* start the hiring process with a company?
1/2
Some reasons that come to mind:
- Not ready / not the right time to leave current role
- Hiring process is long / a PITA
- Cash or equity comp not transparent
- Comp not high enough
- Product, company, or team isn't compelling
Any others?
2/2
Three others that come to mind:
- Don’t want to relocate
- Company isn’t prestigious enough
- Don’t think they’ll pass the interview or get hired (e.g. “I’m not applying for a job at OpenAI b/c it’d be a waste of time”)
3/2
Harvey is an OpenAI-backed GPT-4 startup building AI knowledge workers.
They've signed deals with the largest law firms on earth, and are the fastest-growing LLM startup by revenue I know of.
Everything you need to know about Harvey:
1/10
Harvey's first product is a GPT-4 powered AI knowledge worker.
Harvey can:
- Generate long-form legal documents
  - With niche knowledge of the law
- Answer complex legal questions
  - Leveraging millions of documents
- Create firm-specific models
2/10
In the last two months, Harvey rolled out multi-million dollar contracts with the largest law firms in the world.
With early access to next-gen text models from OpenAI (😉), Harvey can:
- Answer complex legal questions
  - Leveraging millions of documents
- Generate unique work product
  - With knowledge of niche law
- Learn from lawyer feedback
- Create firm-specific models