Yann LeCun
Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Apr 27, 2023 8 tweets 2 min read
Hey @tegmark, most of us know that super-intelligent machines of the future will have to be aligned with human values.
We just don't think it's as difficult as you make it out to be.
And we don't think that getting it slightly wrong merely once will spell doom for humanity. Worrying about superhuman AI alignment today is like worrying about turbojet engine safety in 1920.

We do not have a working design for anything that could come close to becoming as smart as a dog, let alone a domestic robot that can clear the dinner table & fill up the dishwasher.
Mar 26, 2023 4 tweets 1 min read
I have claimed that Auto-Regressive LLMs are exponentially diverging diffusion processes.
Here is the argument:
Let e be the probability that any generated token exits the tree of "correct" answers.
Then the probability that an answer of length n is correct is (1-e)^n
1/ Errors accumulate.
The probability of correctness decreases exponentially with length.
One can mitigate the problem by making e smaller (through training), but one simply cannot eliminate the problem entirely.
A solution would require making LLMs non-auto-regressive while preserving their fluency.
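The argument above is easy to check numerically (a minimal sketch; the per-token error rates and answer lengths below are illustrative values, not measurements):

```python
# If each generated token independently "exits" the tree of correct answers
# with probability e, then an answer of n tokens is correct with
# probability (1 - e)^n.
def p_correct(e: float, n: int) -> float:
    return (1.0 - e) ** n

for e in (0.01, 0.001):
    for n in (10, 100, 1000):
        print(f"e={e:<6} n={n:<5} P(correct) = {p_correct(e, n):.4f}")
# Even a small per-token error rate drives P(correct) toward zero
# exponentially as the answer gets longer.
```

Making e ten times smaller only buys roughly ten times the usable answer length; it does not change the exponential shape of the curve.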
Mar 26, 2023 5 tweets 2 min read
The road to AGI via LLM is to prepend every prompt by:
"the person giving you this problem is Yann LeCun"
😂😂😂
1/ I must repeat:
1- Auto-Regressive LLMs are useful, particularly as writing aids, particularly for code.
2- they hallucinate too often
3- they have a very primitive understanding of the physical world (hence those puzzles).
4- they have primitive planning abilities
2/
Feb 13, 2023 6 tweets 1 min read
My unwavering opinion on current (auto-regressive) LLMs
1. They are useful as writing aids.
2. They are "reactive" & don't plan nor reason.
3. They make stuff up or retrieve stuff approximately.
4. That can be mitigated but not fixed by human feedback.
5. Better systems will come.
6. Current LLMs should be used as writing aids, not much more.
7. Marrying them with tools such as search engines is highly non trivial.
8. There *will* be better systems that are factual, non toxic, and controllable. They just won't be auto-regressive LLMs.
Jan 16, 2023 4 tweets 2 min read
@babgi ChatGPT is not particularly innovative.
It uses techniques originally developed at Google and Meta (FAIR), which have similar systems in their labs.
But those companies are less motivated than OpenAI to deploy public demos.
1/
@babgi The best experts in France on these methods are at FAIR-Paris.
FAIR-Paris contributes *enormously* to the French AI research ecosystem.
One may regret that some French public institutions see FAIR as an enemy rather than a partner.
Dec 27, 2022 5 tweets 1 min read
By telling scientists they must publish, you get:
1. higher-quality research, more reliable results, less self-delusion
2. better scientists whose reputation will flourish
3. easier external collaborations
4. better research evaluation
5. better internal impact
6. prestige
That's why at FAIR, we not only tell scientists to publish papers and open-source their code, we also use their publications as one component of their periodic evaluation.
Dec 17, 2022 6 tweets 1 min read
Medieval obscurantism from the EcoInfo group at CNRS:
"We will not be able to control the energy consumption and environmental impacts of mobile networks without imposing some form of limitation on usage."
What?
1/

ecoinfo.cnrs.fr/2022/12/14/con…
1. The environmental impact of networks (mobile or not) is, roughly speaking, negligible and fairly stable.
2. Improvements in communication technology *reduce* the need for travel and *improve* the efficiency of the economy.
2/
Nov 23, 2022 4 tweets 1 min read
Yeah, this newfangled "writing" craze is going to destroy the fabric of society.
OK people, relax. It's a joke!
Nov 12, 2022 10 tweets 2 min read
OK, debates about the necessity of "priors" (or lack thereof) in learning systems are pointless.
Here are some basic facts that all ML theorists and most ML practitioners understand, but a number of folks-with-an-agenda don't seem to grasp.
Thread.
1/ The no-free-lunch theorems tell us that, among all possible functions, the proportion that is learnable with a "reasonable" number of training samples is tiny.
Learning theory says that the more functions your model can represent, the more samples it needs to learn anything.
2/
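The capacity-vs-samples tradeoff can be illustrated with a toy fit (a sketch with made-up data: a linear ground truth, 8 noisy samples, and two polynomial models of very different capacity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is linear; we observe only 8 noisy training samples.
x_tr = np.linspace(-1, 1, 8)
y_tr = 2.0 * x_tr + rng.normal(0.0, 0.2, size=8)
x_te = np.linspace(-1, 1, 200)
y_te = 2.0 * x_te

errs = {}
for deg in (1, 7):  # low-capacity vs high-capacity model
    coefs = np.polyfit(x_tr, y_tr, deg)
    tr = float(np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2))
    te = float(np.mean((np.polyval(coefs, x_te) - y_te) ** 2))
    errs[deg] = (tr, te)
    print(f"degree {deg}: train MSE = {tr:.2e}, test MSE = {te:.2e}")
# The degree-7 model can represent far more functions, fits the 8 noisy
# samples (near) perfectly, and generalizes worse than the degree-1 model.
```

The high-capacity model drives its training error to essentially zero by fitting the noise, which is exactly the behavior the sample-complexity argument predicts.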
Jun 27, 2022 13 tweets 4 min read
My position/vision/proposal paper is finally available:
"A Path Towards Autonomous Machine Intelligence"

It is available on OpenReview.net (not arXiv for now) so that people can post reviews, comments, and critiques:
openreview.net/forum?id=BZ5a1…
1/N
The paper distills much of my thinking of the last 5 or 10 years about promising directions in AI.
It is basically what I'm planning to work on, and what I'm hoping to inspire others to work on, over the next decade.
2/N
May 17, 2022 9 tweets 2 min read
About the raging debate regarding the significance of recent progress in AI, it may be useful to (re)state a few obvious facts:

(0) there is no such thing as AGI. Reaching "Human Level AI" may be a useful goal, but even humans are specialized.
1/N
(1) the research community is making *some* progress towards HLAI
(2) scaling up helps. It's necessary but not sufficient, because....
(3) we are still missing some fundamental concepts
2/N
May 14, 2022 12 tweets 2 min read
Researchers in speech recognition, computer vision, and natural language processing in the 2000s were obsessed with accurate representations of uncertainty.
1/N
This led to a flurry of work on probabilistic generative models such as Hidden Markov Models in speech, Markov random fields and constellation models in vision, and probabilistic topic models in NLP, e.g. with latent Dirichlet allocation.
2/N
Feb 4, 2022 4 tweets 1 min read
"VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning" by Adrien Bardes, Jean Ponce, & Yann LeCun.
Accepted at ICLR 2022.
OpenReview/ICLR (camera-ready version + reviews): openreview.net/forum?id=xm6YD…
1/N
A simple method for Self-Supervised Learning of Joint-Embedding Architectures.
Basic idea: get 2 semantically similar inputs & train 2 networks to produce representations of the images that are (1) maximally informative about the input & (2) easy to predict from each other.
2/N
Jan 12, 2022 4 tweets 2 min read
ConvNeXt: the debate heats up between ConvNets and Transformers for vision!
Very nice work from FAIR+BAIR colleagues showing that with the right combination of methods, ConvNets are better than Transformers for vision.
87.1% top-1 ImageNet-1k
arxiv.org/abs/2201.03545
1/N
Some of the helpful tricks make complete sense: larger kernels, layer norm, fat layer inside residual blocks, one stage of non-linearity per residual block, separate downsampling layers...
2/N
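Those ingredients can be sketched as a minimal NumPy block (an illustrative approximation, not the paper's implementation: the 7x7 depthwise kernel, 4x expansion, single GELU, and residual connection follow the tricks listed above, and all weights here are random placeholders):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the channel dimension (last axis).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def depthwise_conv(x, k):
    # x: (H, W, C), k: (kh, kw, C); 'same' padding, one filter per channel.
    kh, kw, _ = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + H, j:j + W, :] * k[i, j, :]
    return out

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def convnext_style_block(x, k_dw, w1, w2):
    # Large-kernel depthwise conv -> layer norm -> 1x1 expansion -> one GELU
    # -> 1x1 projection -> residual connection.
    y = depthwise_conv(x, k_dw)
    y = layer_norm(y)
    y = gelu(y @ w1)   # pointwise expansion (C -> 4C)
    y = y @ w2         # pointwise projection (4C -> C)
    return x + y

rng = np.random.default_rng(0)
C = 8
x = rng.standard_normal((16, 16, C))
out = convnext_style_block(x,
                           rng.standard_normal((7, 7, C)) * 0.1,
                           rng.standard_normal((C, 4 * C)) * 0.1,
                           rng.standard_normal((4 * C, C)) * 0.1)
print(out.shape)  # (16, 16, 8)
```

Note the single non-linearity per block and the wide inner layer, two of the design choices the thread highlights.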
Oct 19, 2021 6 tweets 2 min read
"Learning in High Dimension Always Amounts to Extrapolation"
by Randall Balestriero, Jerome Pesenti, and Yann LeCun.
arxiv.org/abs/2110.09485

Thread 1/N
1. Given a function on d-dimensional vectors known solely by its values on N training samples, interpolation is defined as estimating the value of the function on a new sample inside the convex hull of the N vectors.
2/N
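That definition is concrete for the simplest hull, a simplex of d+1 samples (a minimal sketch; the triangle and query points are made up, and membership is decided via barycentric coordinates):

```python
import numpy as np

def in_simplex(p, verts):
    # verts: (d+1, d) affinely independent points whose convex hull is a simplex.
    # Solve sum_i c_i * verts[i] = p with sum_i c_i = 1; p lies inside the hull
    # (interpolation) iff all barycentric coordinates c_i are non-negative.
    d = verts.shape[1]
    A = np.vstack([verts.T, np.ones(d + 1)])
    b = np.append(p, 1.0)
    c = np.linalg.solve(A, b)
    return bool(np.all(c >= -1e-12))

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_simplex(np.array([0.2, 0.2]), tri))  # True  -> interpolation
print(in_simplex(np.array([0.8, 0.8]), tri))  # False -> extrapolation
```

The paper's point is that in high dimension new samples almost never land inside the convex hull of the training set, so by this definition nearly everything a learned model does is extrapolation.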
Jul 6, 2021 9 tweets 2 min read
There were two patents on ConvNets: one for ConvNets with strided convolution, and one for ConvNets with separate pooling layers.
They were filed in 1989 and 1990 and allowed in 1990 and 1991.
1/N
We started working with a development group that built OCR systems from it. Shortly thereafter, AT&T acquired NCR, which was building check imagers/sorters for banks. Images were sent to humans for transcription of the amount. Obviously, they wanted to automate that.
2/N
Jun 10, 2021 4 tweets 1 min read
Very nice work from Google on deep-RL-based optimization for chip layout.
Simulated annealing and its heirs are finally dethroned after 40 years.
This uses graph NNs and deConvNets, among other things.
I did not imagine back in the 90s that (de)ConvNets could be used for this. This is the kind of problem where gradient-free optimization must be applied, because the objectives are not differentiable with respect to the relevant variables. [Continued...]
May 12, 2021 12 tweets 3 min read
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning.
By Adrien Bardes, Jean Ponce, and yours truly.
arxiv.org/abs/2105.04906
Insanely simple and effective method for self-supervised training of joint-embedding architectures (e.g. Siamese nets).
1/N
TL;DR: Joint-embedding archis (JEA) are composed of 2 trainable models Gx(x) and Gy(y), trained with pairs of "compatible" inputs (x,y).
For example: x and y are distorted versions of the same image, or successive sequences of video frames.
The main difficulty is to prevent collapse
2/N
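The three regularizers in the name can be sketched in NumPy (a simplified reading of the criterion; the loss weights, gamma, and eps below are illustrative defaults, not the paper's tuned values):

```python
import numpy as np

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    # z1, z2: (N, D) embeddings of two compatible inputs.
    N, D = z1.shape
    # Invariance: compatible pairs should have similar embeddings.
    inv = np.mean((z1 - z2) ** 2)
    # Variance: hinge keeping the std of each dimension above gamma,
    # which is what prevents collapse to a constant embedding.
    var = 0.0
    for z in (z1, z2):
        std = np.sqrt(z.var(axis=0) + eps)
        var += np.mean(np.maximum(0.0, gamma - std))
    # Covariance: push off-diagonal covariances toward zero so the
    # dimensions carry decorrelated (more informative) content.
    cov = 0.0
    for z in (z1, z2):
        zc = z - z.mean(axis=0)
        C = (zc.T @ zc) / (N - 1)
        cov += (C ** 2).sum() - (np.diag(C) ** 2).sum()
    return sim_w * inv + var_w * var + cov_w * (cov / D)

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))   # healthy embeddings: unit-ish variance
collapsed = np.zeros((256, 8))      # collapsed embeddings: all identical
print(vicreg_loss(z, z) < vicreg_loss(collapsed, collapsed))  # True
```

A fully collapsed embedding has zero invariance loss but a large variance penalty, which is how the method avoids the trivial solution without contrastive negative pairs.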
May 8, 2021 14 tweets 3 min read
Barlow Twins: a new super-simple self-supervised method to train joint-embedding architectures (aka Siamese nets) non contrastively.
arxiv.org/abs/2103.03230
1/N
Basic idea: maximize the normalized correlation between a variable in the left branch and the same var in the right branch, while making the normalized cross-correlation between one var in the left branch and all other vars in the right branch as close to zero as possible.
2/N
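The objective described above fits in a few lines of NumPy (a minimal reading of the criterion; lam is an illustrative trade-off weight, and the batches here are random stand-ins for two augmented views):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    # Normalize each embedding dimension across the batch, then drive the
    # cross-correlation matrix between the two branches toward the identity.
    N, _ = z1.shape
    z1n = (z1 - z1.mean(0)) / z1.std(0)
    z2n = (z2 - z2.mean(0)) / z2.std(0)
    C = (z1n.T @ z2n) / N                                 # (D, D) cross-correlation
    on_diag = ((np.diag(C) - 1.0) ** 2).sum()             # same var, both branches -> corr 1
    off_diag = (C ** 2).sum() - (np.diag(C) ** 2).sum()   # different vars -> corr 0
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((512, 16))
w = rng.standard_normal((512, 16))   # unrelated batch: diagonal correlations vanish
print(barlow_twins_loss(z, z) < barlow_twins_loss(z, w))  # True
```

Because the embeddings are normalized per dimension, the diagonal term cannot be satisfied by collapsing to a constant, so no negative pairs are needed.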
Mar 12, 2021 10 tweets 6 min read
@mcCronjaeger @BloombergME The list is much too long for a Twitter thread.
I'll leave that for FB's comm people to do.
@mcCronjaeger @BloombergME More importantly, the whole premise of the article is wrong.
The SAIL / Responsible AI group's role *never* was to deal with hate speech and misinformation.
That's in the hands of other groups with *hundreds* of people in them.
In fact, "integrity" involves over 30,000 people...
Jan 13, 2021 6 tweets 2 min read
Electricity production in Europe in 2020.

Right:
Each colored point-cloud is a country
Each point (x,y) is 1 hour of electricity production with x=energy produced in kWh; y=CO2 emission in g/kWh.

Left:
bar graphs of the mix of production methods for select countries.

1/N
France: low overall CO2 emissions, low variance in emissions, relying essentially on nuclear energy with a bit of hydro [reminder: nuclear produces essentially no CO2].
2/N