Barlow Twins: a new super-simple self-supervised method to train joint-embedding architectures (aka Siamese nets) non-contrastively. arxiv.org/abs/2103.03230
1/N
Basic idea: maximize the normalized correlation between a variable in the left branch and the same var in the right branch, while making the normalized cross-correlation between one var in the left branch and all other vars in the right branch as close to zero as possible. 2/N
In short: the loss tries to make the normalized cross-correlation matrix between the embedding vectors coming out of the left branch and the right branch as close to the identity matrix as possible.
3/N
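For concreteness, here is a minimal PyTorch sketch of that objective: normalize each embedding dimension over the batch, compute the cross-correlation matrix between the two branches, pull its diagonal toward 1 and its off-diagonal toward 0. The off-diagonal weight `lambda_offdiag` and the normalization details are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    # z_a, z_b: (batch, dim) embeddings from the two branches.
    n = z_a.shape[0]
    # Normalize each embedding dimension over the batch (zero mean, unit std).
    z_a = (z_a - z_a.mean(dim=0)) / (z_a.std(dim=0) + 1e-6)
    z_b = (z_b - z_b.mean(dim=0)) / (z_b.std(dim=0) + 1e-6)
    # Cross-correlation matrix between the two branches, shape (dim, dim).
    c = (z_a.T @ z_b) / n
    # Diagonal terms should be 1 (the two branches agree on each variable)...
    on_diag = ((torch.diagonal(c) - 1) ** 2).sum()
    # ...and off-diagonal terms should be 0 (variables are decorrelated).
    off_diag = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()
    return on_diag + lambda_offdiag * off_diag
```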
The 2 branches are always fed with differently-distorted versions of the same image, and there is no need for dissimilar training pairs.
The objective makes the embedding vectors of the two branches as similar as possible, while maximizing their information content.
4/N
No contrastive samples, no huge batch size (optimal is 1024), no predictor, no moving-average weights, no vector quantization, no cutting of gradients in one of the branches.
5/N
Competitive results on ImageNet with a linear classifier head.
Great results on semi-supervised ImageNet in the low labeled-data regime and on transfer tasks.
6/N
Results on ImageNet with a linear classifier head. 7/N
Results with 1% and 10% of ImageNet labeled images. 8/N
Results on transfer tasks. 9/N
Arch is standard ResNet50 with 2048-D feature vec.
But unlike other methods, the embedding size (projector output) is larger. The perf keeps going up as the embedding dim grows (we stopped at 16384).
Probably because the feature variables are made independent, not just decorrelated.
10/N
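For illustration, a projector head in this spirit might look like the sketch below; the depth, hidden width, and use of batch norm here are assumptions chosen for the example, not necessarily the exact configuration from the paper.

```python
import torch.nn as nn

def make_projector(in_dim=2048, hidden_dim=8192, embed_dim=8192):
    # Maps the 2048-D ResNet-50 features to a high-dimensional embedding
    # on which the cross-correlation loss is computed.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim, bias=False),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, hidden_dim, bias=False),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, embed_dim),
    )
```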
Why Barlow? Horace Barlow was a pioneer of visual neuroscience who proposed the idea that the brain tries to minimize redundancy in representations.
By Jure Zbontar, Li Jing, Ishan Misra, yours truly, and Stéphane Deny.
All from FAIR.
To appear at ICML 2021
11/N
Don't you just hate slicing what would be a decent-size post into threaded thin tweets?
12/N
No, really. Don't you hate reading those long thread slices?
If you do, you could just read my Facebook post: facebook.com/yann.lecun/pos… 13/N N=13
There were two patents on ConvNets: one for ConvNets with strided convolution, and one for ConvNets with separate pooling layers.
They were filed in 1989 and 1990 and allowed in 1990 and 1991. 1/N
We started working with a development group that built OCR systems from it. Shortly thereafter, AT&T acquired NCR, which was building check imagers/sorters for banks. Images were sent to humans for transcription of the amount. Obviously, they wanted to automate that.
2/N
A complete check reading system was eventually built that was reliable enough to be deployed.
Commercial deployment in banks started in 1995.
The system could read about half the checks (machine printed or handwritten) and sent the other half to human operators.
3/N
Very nice work from Google on deep-RL-based optimization for chip layout.
Simulated annealing and its heirs are finally dethroned after 40 years.
This uses graph NN and deConvNets, among other things.
I did not imagine back in the 90s that (de)ConvNets could be used for this.
This is the kind of problem where gradient-free optimization must be applied, because the objectives are not differentiable with respect to the relevant variables. [Continued...]
In this application, RL is used as a particular type of gradient-free optimization to produce a *sequence* of moves.
It uses deep models to learn good heuristics as to what action to take in every situation.
This is exactly the type of setting in which RL shines.
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning.
By Adrien Bardes, Jean Ponce, and yours truly. arxiv.org/abs/2105.04906
Insanely simple and effective method for self-supervised training of joint-embedding architectures (e.g. Siamese nets).
1/N
TL;DR: Joint-embedding archis (JEA) are composed of 2 trainable models Gx(x) and Gy(y), trained with pairs of "compatible" inputs (x,y).
For ex: x and y are distorted versions of the same image, or successive sequences of video frames.
The main difficulty is to prevent collapse.
2/N
VICReg is a loss for JEAs with 3 terms:
1. Variance: hinge loss to maintain the std-dev of each component of Gx(x) & Gy(y) above a margin
2. Invariance: ||Gx(x)-Gy(y)||^2
3. Covariance: sum of the squares of the off-diag terms of the covariance matrices of Gx(x) and Gy(y).
3/N
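Here is a minimal PyTorch sketch of those three terms; the loss weights (`lam`, `mu`, `nu`), the margin, and the epsilon are illustrative placeholders rather than values taken from the paper.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_x, z_y, lam=25.0, mu=25.0, nu=1.0, margin=1.0, eps=1e-4):
    # z_x, z_y: (batch, dim) outputs of the two branches Gx(x) and Gy(y).
    n, d = z_x.shape

    # Invariance: the two embeddings of compatible inputs should match.
    invariance = F.mse_loss(z_x, z_y)

    # Variance: hinge loss keeping the std-dev of each component above `margin`.
    std_x = torch.sqrt(z_x.var(dim=0) + eps)
    std_y = torch.sqrt(z_y.var(dim=0) + eps)
    variance = torch.relu(margin - std_x).mean() + torch.relu(margin - std_y).mean()

    # Covariance: penalize off-diagonal terms of each branch's covariance matrix.
    z_x_c = z_x - z_x.mean(dim=0)
    z_y_c = z_y - z_y.mean(dim=0)
    cov_x = (z_x_c.T @ z_x_c) / (n - 1)
    cov_y = (z_y_c.T @ z_y_c) / (n - 1)

    def off_diag_sq(c):
        return (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()

    covariance = (off_diag_sq(cov_x) + off_diag_sq(cov_y)) / d

    return lam * invariance + mu * variance + nu * covariance
```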
@mcCronjaeger @BloombergME The list is much too long for a Twitter thread.
I'll leave that for FB's comm people to do.
@mcCronjaeger @BloombergME More importantly, the whole premise of the article is wrong.
The SAIL / Responsible AI group's role *never* was to deal with hate speech and misinformation.
That's in the hands of other groups with *hundreds* of people in them.
In fact, "integrity" involves over 30,000 people...
@mcCronjaeger @BloombergME So the central theme of the article, that RespAI wasn't given the necessary resources to do its job, is patently false.
Second, AI is heavily used for content moderation: filtering hate speech, polarizing content, violence, bullying, etc...
Right:
Each colored point-cloud is a country
Each point (x,y) is 1 hour of electricity production with x=energy produced in kWh; y=CO2 emission in g/kWh.
Left:
bar graphs of the mix of production methods for select countries.
1/N
France: low overall CO2 emissions, low variance in emissions, relying essentially on nuclear energy with a bit of hydro [reminder: nuclear produces essentially no CO2].
2/N
Germany: despite having a large proportion of renewables, has high emissions and a high variance of emissions: when there is no wind or sun, it has to rely on fossil fuels, having phased out nuclear production.
3/N
I came first to work at Bell Labs on a J-1 visa, because I thought I'd stay only a year or two.
But I stayed longer and got an H1-B visa.
Then I got a green card.... 1/N
I hesitated to take up citizenship during the GW Bush years, waiting for the country to become respectable again.
But after Bush's re-election, I just wanted to be able to vote and kick out the neocon bastards.
So I became a citizen just in time to vote for Barack Obama.
2/N
As an immigrant, scientist, academic, liberal, atheist, and Frenchman, I am a concentrate of everything the American Right hates.
3/N