Samuel Ainsworth is in NYC (@SamuelAinsworth)

Sep 13, 2022 • 13 tweets • 5 min read • Read on X

📜🚨📜🚨
NN loss landscapes are full of permutation symmetries, i.e., you can swap any 2 units in a hidden layer without changing the function the network computes. What does this mean for SGD? Is this practically useful?
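To make the symmetry concrete, here is a minimal sketch in plain Python. The tiny MLP, its weights, and the helper names `mlp` and `permute_hidden` are made up for illustration and do not come from the paper or its code. Swapping two hidden units means reordering the rows of the first weight matrix (and its bias) and the matching columns of the second; the output stays the same up to floating-point rounding.

```python
import math

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP: tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1, entries of b1, columns of W2."""
    W1p = [W1[i] for i in perm]
    b1p = [b1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, b1p, W2p

# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
b1 = [0.1, -0.2, 0.05]
W2 = [[1.0, -2.0, 0.5]]
b2 = [0.3]

perm = [1, 0, 2]  # swap hidden units 0 and 1
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = [0.4, -1.2]
original = mlp(x, W1, b1, W2, b2)
swapped = mlp(x, W1p, b1p, W2p, b2)
print(original, swapped)  # different weights, same function
```

The two weight settings sit at different points in weight space but have identical loss on every input, which is why a landscape with n hidden units in a layer contains n! copies of every solution for that layer alone.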

For the past 5 yrs these Qs have fascinated me. Today, I am ready to announce "Git Re-Basin"!

arxiv.org/abs/2209.04836

We show that NN loss landscapes contain effectively only a single basin(!) provided sufficient width. Even better, we develop practical algos to navigate these basins...

Say you train Model A.

Independently, your friend trains Model B, possibly on different data.

With Git Re-Basin, you can merge models A+B in weight space at _no cost to the loss_

Git Re-Basin applies to any NN arch & we provide the first-ever demonstration of zero-barrier linear mode connectivity between two independently trained (no pre-training!) ResNets.

Put simply: a ResNet loss landscape contains only a single basin & we have an algo to prove it
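For readers new to linear mode connectivity, "zero-barrier" can be illustrated with a toy sketch. Everything here is an assumption for illustration: the two-parameter `loss` is invented (it is symmetric under swapping its coordinates, standing in for a hidden-unit permutation), and the barrier is computed with one common convention, the highest loss on the straight line between two solutions minus the mean of the endpoint losses.

```python
def loss(theta):
    """Toy loss (hypothetical), unchanged by swapping the two coordinates.
    It has minima at (1, -1) and (-1, 1)."""
    a, b = theta
    return ((a - 1) ** 2 + (b + 1) ** 2) * ((a + 1) ** 2 + (b - 1) ** 2)

def interpolate(theta_a, theta_b, lam):
    """Point at fraction lam along the straight line from A to B."""
    return [(1 - lam) * pa + lam * pb for pa, pb in zip(theta_a, theta_b)]

def loss_barrier(theta_a, theta_b, steps=20):
    """Highest loss on the line from A to B, minus the mean endpoint loss."""
    path = [loss(interpolate(theta_a, theta_b, i / steps))
            for i in range(steps + 1)]
    return max(path) - 0.5 * (loss(theta_a) + loss(theta_b))

theta_a = [1.0, -1.0]
theta_b = [-1.0, 1.0]            # same solution as A, up to a coordinate swap
theta_b_swapped = theta_b[::-1]  # apply the permutation to B

naive_barrier = loss_barrier(theta_a, theta_b)
aligned_barrier = loss_barrier(theta_a, theta_b_swapped)
print(naive_barrier, aligned_barrier)
```

Interpolating naively crosses a ridge between the two minima; after permuting B, the path never leaves the minimum. In this toy case the right permutation is obvious; for real networks, finding it is the hard part.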

Phenomenon #1: "merge-ability" is an emergent property of SGD training -> merging at init doesn't work but a phase transition occurs such that it becomes possible over time

Phenomenon #2: Model width is intimately related to merge-ability: the wider the better. Not too burdensome of a constraint since we're all training in the wide/overparameterized regime anyways. Important nonetheless...

Also, not all archs are equally mergeable: VGGs seem to be harder than ResNets 🤷‍♂️ We hypothesize that merge-ability is an indicator of compatible data/arch fit.

Finally, my fav result: it's possible to train models on disjoint and biased datasets, then merge them together in weight space.

E.g., you have some data in the US, some in the EU. Can't mix data due to GDPR etc. Train separate models, merge weights -> generalize to the combined dataset!

So there ya go: it's possible to mix trained models like mixing potions, no pre-training or fine-tuning necessary.

That said, there are still loads of open questions left! I'm v curious to see where LMC and model patching work goes in the future 🚀

Also plenty of exciting possible applications to federated learning, distributed training, deep learning optimization, and so forth

Ok, that's enough for one thread... Check out algos, counterexamples, proofs, and more in the paper (arxiv.org/abs/2209.04836) and code (github.com/samuela/git-re…)

Joint work with Jonathan Hayase and @siddhss5. Inspired by work from @colinraffel, @rahiment, @jefrankle, @RAIVNLab folks, and many other beautiful people!

Shout out to @Mitchnw, @adityakusupati, @RamanujanVivek and others who came along for the ride!

Oh I forgot to add: our weight matching algo (sec 3.2) runs in ~10 seconds. So you won't be waiting around all day!
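As a rough illustration of the idea behind matching units by their weights, here is a toy sketch. It is not the implementation from the paper or the repo: it handles a single hidden layer, ignores biases, and brute-forces every permutation, which is only feasible for a handful of units. The function name `match_hidden_units` and all weights are hypothetical.

```python
from itertools import permutations

def match_hidden_units(W1_a, W2_a, W1_b, W2_b):
    """Return perm where B's hidden unit perm[i] is matched to A's unit i,
    chosen to maximize the summed inner products of matched weights."""
    n = len(W1_a)

    def similarity(perm):
        total = 0.0
        for i, j in enumerate(perm):
            # Incoming weights: row i of A's W1 vs row j of B's W1.
            total += sum(a * b for a, b in zip(W1_a[i], W1_b[j]))
            # Outgoing weights: column i of A's W2 vs column j of B's W2.
            total += sum(ra[i] * rb[j] for ra, rb in zip(W2_a, W2_b))
        return total

    return max(permutations(range(n)), key=similarity)

# Model A: 2 inputs, 3 hidden units, 1 output (made-up weights).
W1_a = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]
W2_a = [[1.0, -2.0, 0.5]]

# Model B: here just A with its hidden units shuffled.
shuffle = [2, 0, 1]  # B's unit k is a copy of A's unit shuffle[k]
W1_b = [W1_a[k] for k in shuffle]
W2_b = [[row[k] for k in shuffle] for row in W2_a]

perm = match_hidden_units(W1_a, W2_a, W1_b, W2_b)
W1_b_aligned = [W1_b[j] for j in perm]
W2_b_aligned = [[row[j] for j in perm] for row in W2_b]
print(perm)
```

Because B is an exact shuffled copy of A in this example, the alignment recovers A's weights exactly. Independently trained models would only match approximately, and a practical version needs a proper assignment solver rather than brute force, since the number of permutations grows factorially with layer width.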


Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Samuel Ainsworth is in NYC

Try unrolling a thread yourself!

More from @SamuelAinsworth

Samuel Ainsworth is in NYC

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!