Yann LeCun · Jun 27 · 13 tweets · 4 min read
My position/vision/proposal paper is finally available:
"A Path Towards Autonomous Machine Intelligence"

It is available on OpenReview.net (not arXiv for now) so that people can post reviews, comments, and critiques:
openreview.net/forum?id=BZ5a1…
1/N
The paper distills much of my thinking of the last 5 or 10 years about promising directions in AI.
It is basically what I'm planning to work on, and what I'm hoping to inspire others to work on, over the next decade.
2/N
Most people don't talk publicly about their research plans.
But I'm going beyond the spirit of Open Research by publishing ideas *before* the corresponding research is completed.
3/N
Topics addressed:
- An integrated, DL-based, modular, cognitive architecture.
- Using a world model and intrinsic cost for planning.
- Joint-Embedding Predictive Architecture (JEPA) as an architecture for world models that can handle uncertainty (a minimal sketch appears after this list).
4/N
- Training JEPAs using non-contrastive Self-Supervised Learning.
- Hierarchical JEPA for prediction at multiple time scales.
- H-JEPAs can be used for hierarchical planning in which higher levels set objectives for lower levels.
- A configurable world model that can be tailored to the task at hand.
6/N
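For concreteness, here is a minimal, hypothetical sketch of the JEPA idea in PyTorch: two encoders, plus a predictor that maps the representation of x and a latent variable z to a predicted representation of y. All module shapes and names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class JEPA(nn.Module):
    def __init__(self, x_dim=64, y_dim=64, s_dim=32, z_dim=8):
        super().__init__()
        # Two encoders map the observed x and the target y into representation space.
        self.enc_x = nn.Sequential(nn.Linear(x_dim, s_dim), nn.ReLU(), nn.Linear(s_dim, s_dim))
        self.enc_y = nn.Sequential(nn.Linear(y_dim, s_dim), nn.ReLU(), nn.Linear(s_dim, s_dim))
        # The predictor receives a latent z that parameterizes what is
        # unpredictable about y; this is how the architecture handles uncertainty.
        self.pred = nn.Sequential(nn.Linear(s_dim + z_dim, s_dim), nn.ReLU(), nn.Linear(s_dim, s_dim))

    def energy(self, x, y, z):
        s_x, s_y = self.enc_x(x), self.enc_y(y)
        s_y_hat = self.pred(torch.cat([s_x, z], dim=-1))
        # Prediction error is measured in representation space, not input space.
        return ((s_y_hat - s_y) ** 2).sum(dim=-1)
```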
I express some of my opinions on the best path forward towards AI:
- scaling is necessary but not sufficient
- reward is not enough. Learning world models by observation-based SSL and the use of (differentiable) intrinsic objectives are required for sample-efficient learning.
7/N
- reasoning and planning come down to inference: finding a sequence of actions and latent variables that minimizes a (differentiable) objective. This is an answer to the question of how to make reasoning compatible with gradient-based learning (sketched below).
8/N
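As an illustration of that view, here is a hedged sketch of planning as gradient-based inference: optimize an action sequence by backpropagating a task cost through a differentiable world model. The `world_model` and `cost` callables are hypothetical stand-ins, not anything defined in the paper.

```python
import torch

def plan(world_model, cost, s0, horizon=10, steps=100, lr=0.1, action_dim=2):
    # Treat the action sequence itself as the variable to infer.
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.SGD([actions], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        s, total = s0, 0.0
        for t in range(horizon):
            s = world_model(s, actions[t])  # roll the model forward
            total = total + cost(s)         # accumulate a differentiable cost
        total.backward()                    # gradients flow through the rollout
        opt.step()
    return actions.detach()
```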
- In that setting, explicit mechanisms for symbol manipulation are probably unnecessary.
9/N
Many of the ideas in this proposal are not new and not mine.
But I've tried to integrate them into a coherent architecture.
I probably missed a lot of relevant references and would appreciate any literature pointers.
10/N
I have communicated about the content of this paper over the last few months:
- Blog post: ai.facebook.com/blog/yann-lecu…
- Talk hosted by Baidu
11/N
- MIT Tech Review article by Melissa Heikkilä: technologyreview.com/2022/06/24/105…
- Fireside chat with Melissa Heikkilä at VivaTech: app.vivatechnology.com/session/b60b78…
12/N
- An FB post with the basic points of the paper: facebook.com/yann.lecun/pos…
13/N, N=13.

• • •

More from @ylecun

May 17
About the raging debate regarding the significance of recent progress in AI, it may be useful to (re)state a few obvious facts:

(0) there is no such thing as AGI. Reaching "Human Level AI" may be a useful goal, but even humans are specialized.
1/N
(1) the research community is making *some* progress towards HLAI
(2) scaling up helps. It's necessary but not sufficient, because....
(3) we are still missing some fundamental concepts
2/N
(4) some of those new concepts are possibly "around the corner" (e.g. generalized self-supervised learning)
(5) but we don't know how many such new concepts are needed. We just see the most obvious ones.
(6) hence, we can't predict how long it's going to take to reach HLAI.
3/N
May 14
Researchers in speech recognition, computer vision, and natural language processing in the 2000s were obsessed with accurate representations of uncertainty.
1/N
This led to a flurry of work on probabilistic generative models such as Hidden Markov Models in speech, Markov random fields and constellation models in vision, and probabilistic topic models in NLP, e.g. with latent Dirichlet allocation.
2/N
There were debates at computer vision workshops about "generative models vs discriminative models". There were heroic-yet-futile attempts to build object recognition systems with non-parametric Bayesian methods.
3/N
Feb 4
"VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning" by Adrien Bardes, Jean Ponce, & Yann LeCun.
Accepted at ICLR 2022.
OpenReview/ICLR (camera-ready version + reviews): openreview.net/forum?id=xm6YD…
1/N
A simple method for Self-Supervised Learning of Joint-Embedding Architectures.
Basic idea: take 2 semantically similar inputs & train 2 networks to produce representations of those inputs that are (1) maximally informative about the input & (2) easy to predict from each other.
2/N
Objective function:
1. Variance: a hinge loss maintains the variance of each output component above a threshold (over the batch).
2. Invariance: make the two embeddings close to each other.
3. Covariance: decorrelate pairs of components of each embedding (over the batch). A minimal sketch of this loss appears after this list.
3/N
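A minimal sketch of that three-term loss in PyTorch (the weights and the variance threshold of 1 are illustrative defaults, not necessarily the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    # Invariance: the two embeddings of the same content should match.
    sim = F.mse_loss(z1, z2)
    # Variance: hinge keeps the std of each dimension above 1 (over the batch).
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var = torch.relu(1.0 - std1).mean() + torch.relu(1.0 - std2).mean()
    # Covariance: push off-diagonal covariance entries toward zero,
    # decorrelating the embedding dimensions (over the batch).
    def cov_term(z):
        z = z - z.mean(dim=0)
        n, d = z.shape
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / d
    cov = cov_term(z1) + cov_term(z2)
    return sim_w * sim + var_w * var + cov_w * cov
```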
Jan 12
ConvNeXt: the debate heats up between ConvNets and Transformers for vision!
Very nice work from FAIR+BAIR colleagues showing that with the right combination of methods, ConvNets are better than Transformers for vision.
87.1% top-1 ImageNet-1k
arxiv.org/abs/2201.03545
1/N
Some of the helpful tricks make complete sense: larger kernels, layer norm, a fat layer inside residual blocks, one stage of non-linearity per residual block, separate downsampling layers... (a simplified block sketch follows below).
2/N
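To make those ingredients concrete, here is a simplified, hypothetical sketch of a ConvNeXt-style residual block; it omits details such as layer scale and stochastic depth, so consult the paper and repo for the real thing.

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim=96, expansion=4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)  # large depthwise kernel
        self.norm = nn.LayerNorm(dim)               # layer norm instead of batch norm
        self.pw1 = nn.Linear(dim, expansion * dim)  # fat layer inside the residual block
        self.act = nn.GELU()                        # one stage of non-linearity per block
        self.pw2 = nn.Linear(expansion * dim, dim)

    def forward(self, x):                           # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                   # channels-last for LayerNorm/Linear
        x = self.pw2(self.act(self.pw1(self.norm(x))))
        return residual + x.permute(0, 3, 1, 2)
```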
It's open source, of course: github.com/facebookresear…
Oct 19, 2021
"Learning in High Dimension Always Amounts to Extrapolation"
by Randall Balestriero, Jerome Pesenti, and Yann LeCun.
arxiv.org/abs/2110.09485

Thread 1/N
1. Given a function on d-dimensional vectors known solely by its value on N training samples, interpolation is defined as estimating the value of the function on a new sample inside the convex hull of the N vectors.
2/N
2. Under mild assumptions, a new sample has a very low probability of being inside the convex hull of the training samples, *unless* N grows exponentially with d (a feasibility-test sketch follows below).
3/N
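Under this definition, interpolation status can be checked exactly with a small linear program: the query point lies in the convex hull iff some convex combination of the training samples reproduces it. A sketch assuming SciPy is available (the helper name is mine, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(X, p):
    """X: (N, d) training samples; p: (d,) query point."""
    n = X.shape[0]
    # Feasibility LP: find weights w >= 0 with sum(w) = 1 and X.T @ w = p.
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([p, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success

# In high dimension, random queries almost never pass this test unless N
# grows exponentially with d, which is the paper's point.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
print(in_convex_hull(X, rng.normal(size=10)))  # almost surely False
```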
Jul 6, 2021
There were two patents on ConvNets: one for ConvNets with strided convolution, and one for ConvNets with separate pooling layers.
They were filed in 1989 and 1990 and allowed in 1990 and 1991.
1/N
We started working with a development group that built OCR systems from it. Shortly thereafter, AT&T acquired NCR, which was building check imagers/sorters for banks. Images were sent to humans for transcription of the amount. Obviously, they wanted to automate that.
2/N
A complete check reading system was eventually built that was reliable enough to be deployed.
Commercial deployment in banks started in 1995.
The system could read about half the checks (machine printed or handwritten) and sent the other half to human operators.
3/N
