The paper distills much of my thinking over the last 5-10 years about promising directions in AI.
It is basically what I'm planning to work on, and what I'm hoping to inspire others to work on, over the next decade. 2/N
Most people don't talk publicly about their research plans.
But I'm going beyond the spirit of Open Research by publishing ideas *before* the corresponding research is completed. 3/N
Topics addressed:
- An integrated, DL-based, modular, cognitive architecture.
- Using a world model and intrinsic cost for planning.
- Joint-Embedding Predictive Architecture (JEPA) as an architecture for world models that can handle uncertainty. 4/N
- Training JEPAs using non-contrastive Self-Supervised Learning.
- Hierarchical JEPA for prediction at multiple time scales.
- H-JEPAs can be used for hierarchical planning in which higher levels set objectives for lower levels.
- A configurable world model that can be tailored to the task at hand. 6/N
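To make the joint-embedding predictive idea concrete, here is a tiny numpy sketch, not from the paper: two hypothetical encoders map inputs x and y to embeddings, a predictor maps (s_x, z) to a guess of s_y, and the energy is the prediction error in embedding space. All names, shapes, and the grid search over the latent z are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and random weights (not from the paper).
D_IN, D_EMB, D_Z = 8, 4, 2
W_enc_x = rng.normal(size=(D_EMB, D_IN))
W_enc_y = rng.normal(size=(D_EMB, D_IN))
W_pred = rng.normal(size=(D_EMB, D_EMB + D_Z))

def energy(x, y, z):
    """Prediction error between predicted and actual y-embedding."""
    s_x = np.tanh(W_enc_x @ x)                         # encode x
    s_y = np.tanh(W_enc_y @ y)                         # encode y
    s_y_hat = np.tanh(W_pred @ np.concatenate([s_x, z]))  # predict s_y from (s_x, z)
    return float(np.sum((s_y_hat - s_y) ** 2))

# The latent z absorbs what is unpredictable about y: minimizing the
# energy over z (here by crude grid search) picks the best explanation.
x, y = rng.normal(size=D_IN), rng.normal(size=D_IN)
zs = [np.array([a, b]) for a in np.linspace(-1, 1, 9)
                       for b in np.linspace(-1, 1, 9)]
best_z = min(zs, key=lambda z: energy(x, y, z))
```

The key design choice this illustrates: prediction happens in representation space, not pixel space, so the model never has to reconstruct irrelevant detail of y.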
I express some of my opinions on the best path forward towards AI:
- scaling is necessary but not sufficient
- reward is not enough. Learning world models by observation-based SSL and the use of (differentiable) intrinsic objectives are required for sample-efficient learning.
7/N
- reasoning and planning come down to inference: finding a sequence of actions and latent variables that minimizes a (differentiable) objective. This is an answer to the question of how to make reasoning compatible with gradient-based learning.
8/N
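A toy sketch of that idea, in pure Python and entirely my own construction: with a known, differentiable world model and cost, an action sequence can be found by plain gradient descent. The linear dynamics, quadratic cost, and all function names here are illustrative assumptions.

```python
def rollout(x0, actions):
    """Apply the toy world model x_{t+1} = x_t + a_t to get the final state."""
    x = x0
    for a in actions:
        x = x + a
    return x

def plan(x0, goal, horizon=5, steps=200, lr=0.05, lam=0.1):
    """Gradient descent on the action sequence.

    Cost = lam * sum(a_t^2) + (x_T - goal)^2.  Because dx_T/da_t = 1
    under these dynamics, the gradient w.r.t. each action is
    2*lam*a_t + 2*(x_T - goal), so we can update all actions at once.
    """
    actions = [0.0] * horizon
    for _ in range(steps):
        x_T = rollout(x0, actions)
        g_final = 2.0 * (x_T - goal)
        actions = [a - lr * (2.0 * lam * a + g_final) for a in actions]
    return actions

acts = plan(x0=0.0, goal=1.0)
final = rollout(0.0, acts)   # ends up close to the goal of 1.0
```

Planning here is literally inference by optimization: the "reasoning" is the inner gradient-descent loop over actions, with no discrete search or symbol manipulation.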
- In that setting, explicit mechanisms for symbol manipulation are probably unnecessary
9/N
Many of the ideas in this proposal are not new and not mine.
But I've tried to integrate them into a coherent architecture.
I probably missed a lot of relevant references and would appreciate any literature pointers.
10/N
I have communicated about the content of this paper over the last few months:
- Blog post: ai.facebook.com/blog/yann-lecu…
- Talk hosted by Baidu:
11/N
About the raging debate regarding the significance of recent progress in AI, it may be useful to (re)state a few obvious facts:
(0) there is no such thing as AGI. Reaching "Human Level AI" may be a useful goal, but even humans are specialized.
1/N
(1) the research community is making *some* progress towards HLAI
(2) scaling up helps. It's necessary but not sufficient, because...
(3) we are still missing some fundamental concepts
2/N
(4) some of those new concepts are possibly "around the corner" (e.g. generalized self-supervised learning)
(5) but we don't know how many such new concepts are needed. We just see the most obvious ones.
(6) hence, we can't predict how long it's going to take to reach HLAI.
3/N
Researchers in speech recognition, computer vision, and natural language processing in the 2000s were obsessed with accurate representations of uncertainty.
1/N
This led to a flurry of work on probabilistic generative models such as Hidden Markov Models in speech, Markov random fields and constellation models in vision, and probabilistic topic models in NLP, e.g. with latent Dirichlet allocation.
2/N
There were debates at computer vision workshops about "generative models vs discriminative models". There were heroic-yet-futile attempts to build object recognition systems with non-parametric Bayesian methods.
3/N
"VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning" by Adrien Bardes, Jean Ponce, & Yann LeCun.
Accepted at ICLR 2022.
OpenReview/ICLR (camera-ready version + reviews): openreview.net/forum?id=xm6YD…
1/N
A simple method for Self-Supervised Learning of Joint-Embedding Architectures.
Basic idea: take 2 semantically similar inputs & train 2 networks to produce representations of those inputs that are (1) maximally informative about the input & (2) easy to predict from each other.
2/N
Objective function:
1. Variance: a hinge loss maintains the variance of each output component above a threshold (over the batch).
2. Invariance: make the two embeddings close to each other.
3. Covariance: decorrelate pairs of components of each embedding (over the batch).
3/N
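The three terms can be sketched in a few lines of numpy. This is my own illustrative rendering, not the reference implementation; the coefficient values and the epsilon inside the square root are assumptions chosen for the example.

```python
import numpy as np

def vicreg_loss(z1, z2, gamma=1.0, lam=25.0, mu=25.0, nu=1.0):
    """Sketch of the three VICReg terms on two batches of embeddings
    z1, z2 of shape (batch, dim).  Coefficients are illustrative."""
    n, d = z1.shape
    # Invariance: embeddings of the two views of a pair should match.
    inv = np.mean((z1 - z2) ** 2)
    # Variance: hinge keeps the std of each component above gamma,
    # preventing collapse to a constant output.
    var = 0.0
    for z in (z1, z2):
        std = np.sqrt(z.var(axis=0) + 1e-4)
        var += np.mean(np.maximum(0.0, gamma - std))
    # Covariance: penalize off-diagonal entries of the covariance
    # matrix so components carry decorrelated information.
    cov = 0.0
    for z in (z1, z2):
        zc = z - z.mean(axis=0)
        c = (zc.T @ zc) / (n - 1)
        cov += (np.sum(c ** 2) - np.sum(np.diag(c) ** 2)) / d
    return lam * inv + mu * var + nu * cov
```

Note that the variance and covariance terms act over the batch dimension while the invariance term acts per pair, which is what removes the need for negative samples.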
ConvNeXt: the debate heats up between ConvNets and Transformers for vision!
Very nice work from FAIR+BAIR colleagues showing that with the right combination of methods, ConvNets are better than Transformers for vision.
87.1% top-1 ImageNet-1k arxiv.org/abs/2201.03545 1/N
Some of the helpful tricks make complete sense: larger kernels, layer norm, fat layer inside residual blocks, one stage of non-linearity per residual block, separate downsampling layers.... 2/N
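The shape of such a block can be sketched in numpy. This is a deliberately naive, loop-based rendering of the pattern described above (depthwise conv, layer norm, inverted bottleneck, single nonlinearity, residual), not the actual model code; all weight shapes and the GELU approximation are assumptions for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the channel axis (last)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def depthwise_conv(x, k):
    """Naive 7x7 depthwise convolution with 'same' padding.
    x: (H, W, C); k: (7, 7, C), one filter per channel."""
    H, W, C = x.shape
    p = k.shape[0] // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k.shape[0], j:j + k.shape[1], :]
            out[i, j] = np.sum(patch * k, axis=(0, 1))
    return out

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def convnext_block(x, k_dw, w1, w2):
    """Large-kernel depthwise conv -> LayerNorm -> 1x1 expand (4x)
    -> GELU -> 1x1 project -> residual; one nonlinearity per block."""
    y = depthwise_conv(x, k_dw)   # large 7x7 kernel
    y = layer_norm(y)             # LayerNorm instead of BatchNorm
    y = y @ w1                    # pointwise expansion, C -> 4C ("fat" layer)
    y = gelu(y)                   # the single nonlinearity in the block
    y = y @ w2                    # pointwise projection, 4C -> C
    return x + y                  # residual connection

rng = np.random.default_rng(0)
C = 4
x = rng.normal(size=(8, 8, C))
out = convnext_block(x,
                     rng.normal(size=(7, 7, C)) * 0.1,
                     rng.normal(size=(C, 4 * C)) * 0.1,
                     rng.normal(size=(4 * C, C)) * 0.1)
```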
"Learning in High Dimension Always Amounts to Extrapolation"
by Randall Balestriero, Jerome Pesenti, and Yann LeCun. arxiv.org/abs/2110.09485
Thread 1/N
1. Given a function on d-dimensional vectors known solely by its value on N training samples, interpolation is defined as estimating the value of the function on a new sample inside the convex hull of the N vectors.
2/N
2. under mild assumptions, a new sample has a very low probability of being inside the convex hull of training samples, *unless* N grows exponentially with d.
3/N
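A quick experiment, my own illustration rather than the paper's, makes the effect visible without any linear programming: the axis-aligned bounding box of the training samples contains their convex hull, so a point outside the box is necessarily outside the hull, and even this weaker test almost never passes in high dimension.

```python
import random

def inside_bbox(point, samples):
    """A point outside the bounding box of the samples is necessarily
    outside their convex hull (the box contains the hull)."""
    d = len(point)
    return all(min(s[j] for s in samples) <= point[j] <= max(s[j] for s in samples)
               for j in range(d))

def frac_inside(d, n_train=100, trials=300, seed=0):
    """Monte Carlo estimate of P(new uniform sample falls inside the
    bounding box of n_train uniform samples in [0,1]^d)."""
    random.seed(seed)
    hits = 0
    for _ in range(trials):
        train = [[random.random() for _ in range(d)] for _ in range(n_train)]
        x = [random.random() for _ in range(d)]
        hits += inside_bbox(x, train)
    return hits / trials

# Per dimension, P(inside) = 1 - 2/(n_train + 1), so the overall
# probability is roughly (1 - 2/101)**d: ~0.98 at d=1, ~2% at d=200.
print(frac_inside(1), frac_inside(200))
```

With n_train fixed, keeping the probability high as d grows requires N to grow exponentially, which is the point of the paper's result.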
There were two patents on ConvNets: one for ConvNets with strided convolution, and one for ConvNets with separate pooling layers.
They were filed in 1989 and 1990 and allowed in 1990 and 1991. 1/N
We started working with a development group that built OCR systems based on the ConvNet. Shortly thereafter, AT&T acquired NCR, which was building check imagers/sorters for banks. Images were sent to humans for transcription of the amount. Obviously, they wanted to automate that.
2/N
A complete check reading system was eventually built that was reliable enough to be deployed.
Commercial deployment in banks started in 1995.
The system could read about half the checks (machine printed or handwritten) and sent the other half to human operators.
3/N