Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Yann Dubois

@yanndubs

Nov 28, 2022 • 11 tweets • 10 min read • Read on X

Scrolly

@tatsu_hashimoto

#NeurIPS2022
What are ideal representations for self-sup. learning (SSL)?

🤓We give simple optimality conditions and use them to improve/understand/derive SSL methods!

🔥outperform baselines on ImageNet

arxiv.org/abs/2011.10566
w. @tatsu_hashimoto @StefanoErmon @percyliang
🧵

Goal: ideally representations should allow linear probes to perfectly predict any task that is invariant to augmentations in the most sample-efficient way

Q: Which of the following representation is optimal?

2/8

A: last one.

More generally we show that representations are optimal if and only if:
1. *Predictability*: linear probes can predict equivalence classes
2. *High dimension*: representation dim d=# equiv-1
3. *Invariance*: representation of equivalent examples collapse

3/8

Key: ideal SSL = supervised classification from high dim. space to equiv. classes using probing architecture

This leads to a unifying SSL framework (contrastive or not) with actionable insights eg how to
- choose projection heads
- choose dim.
- simplify non-contrast. SSL
4/8

**Dimension**

We just showed that the dimensionality of representation should ideally be number of equivalence classes => much larger than currently

Smartly increasing dimension has a huge impact on performance without increasing parameters!!

≥ 2% acc gains on ImageNet
5/8

**Projection heads**

Current SSL uses 2 siamese networks with MLP projection heads

We prove that one head should be linear

Intuition: representations should be pretrained as they will be used downstream.
linear probing => one linear projection

This gives ≥ 1% acc gains
6/8

**Non-contrastive SSL**

We show that most prior non-contrastive objectives are approximations of optimal SSL

We provide DISSL: a much simpler objective (no stop-gradients / no EMA / no Sinkhorn) that better approximates optimal SSL

DISSL outperforms SwAV/DINO
7/8

Other actionable insights in the paper eg:
- how to perform SSL for non-linear probes
- choosing augmentations

If you are at #NeurIPS2022 come to our poster Hall J #905 tomorrow 4-6pm

Code and pretrained ImageNet models: github.com/YannDubs/Invar…
8/8

@douwekiela

Many ideas come from prior work with great collaborators
-ideal supervised repr. arxiv.org/abs/2201.00057
-ideal robust repr. arxiv.org/abs/2201.00057
-invariance&compression arxiv.org/abs/2106.10800
@douwekiela @davidjschwab @rama_vedantam @YangjunR @cjmaddison Ben @karen_ullrich

@ananyaku

Grateful for all discussions/feedback on SSL and visualizations from:
@ananyaku @shengjia_zhao @rtaori13 @mo_tiwari @sangmichaelxie @niladrichat @ShibaniSan @baaadas @chenlin_meng @MayeeChen @AlexTamkin @YangjunR @malikrali @jhaochenz @RishiBommasani @kaylburns @manim_community

**Edit** correct link is arxiv.org/abs/2209.06235

That’s the problem when you have too many arxiv tabs open 😅

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @yanndubs

Yann Dubois

@yanndubs

Mar 20, 2024

AlpacaEval is now length-controlled (LC)!

✅ highest correlation with Chat Arena (0.98)
✅ no reannotation
✅ simple interpretation: win rate if model length = baseline length
✅ robust to length gamification

0.98 that’s essentially evaluation on Arena but in 3min and <$10.

Key: predict what the win rate would be if model length=baseline length

We:
1. fit GLM: model | length | instruction -> preference
2. predict preference conditioned on baseline length

Benefits:
✅easily extendible to other biases
✅nice math properties
✅no reannotation needed

Length-controlled AE is much more robust to verbosity gameability.

Below we show the metrics change when you prompt models to:
- “give as much detail as possible” (verbose) or
- “be as concise as possible [...]” (concise)

Read 8 tweets

Yann Dubois

@yanndubs

Jan 6, 2023

@AnthropicAI

I played with @AnthropicAI assistant (AA) and compared it to @OpenAI ChatGPT

TLDR: both are similar but AA is
+ Harder to jailbreak
+ Tries to be more helpful
+ Follows closer what we ask for
+ ~Better for writing in English
- Worst for code
- Worst in French
- Longer resp.
🧵

**Coding**
CGPT is better

Quantitative (leetcode hard in python/c/javascript):
- CGPT: 3/3
- AA: 1/3 only got python correct

Qualitative:
- CGPT: more reliable/concise/efficient
- AA: more comments + emphasizes explainability

both are wrong when asked for impossible algo
2/8

**Writing**
Both are similar but AA generally follows closer what it's asked for. But AA is less concise as it explains what it says and asks how it can help, which can be annoying as it takes more time to generate.

Here’s an example to write a short essay
3/8

Read 9 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Yann Dubois

Try unrolling a thread yourself!

More from @yanndubs

Yann Dubois

Yann Dubois

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!