François Chollet
Deep learning @google. Creator of Keras. Author of 'Deep Learning with Python'. Opinions are my own.
Mar 31
Memorization (which ML has solely focused on) is not intelligence. And because any task that does not involve significant novelty and uncertainty can be solved via memorization, *skill* is never a sign of intelligence, no matter the task. Intelligence is found in the ability to pick up new skills quickly & efficiently -- at tasks you weren't prepared for. To improvise, adapt and learn.
Mar 13
We benchmarked a range of popular models (SegmentAnything, BERT, StableDiffusion, Gemma, Mistral) with all Keras 3 backends (JAX/TF/PT). Key findings:

1. There's no "best" backend. The fastest backend often depends on your specific model architecture.

2. Keras 3 with the right backend is consistently a lot faster than reference PT (compiled) implementations. Often by 150%+.

3. Keras 3 models are fast without requiring any custom performance optimizations. It's all "stock" code.

4. Keras 3 is faster than Keras 2.

Details here: keras.io/getting_starte…

Finding 1: the fastest backend for a given model typically alternates between XLA-compiled JAX and XLA-compiled TF. Plus, you might want to debug/prototype in PT before training/inferencing with JAX or TF.

The ability to write framework-agnostic models and pick your backend later is a game-changer.
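To make that concrete, here's a minimal sketch (mine, not from the thread) of how backend selection works in Keras 3: you pick the backend via the KERAS_BACKEND environment variable before importing Keras, and the model code itself stays identical.

```python
# Minimal sketch (not from the thread): the same Keras 3 model code runs on any
# backend; select the backend before the first `import keras`.
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow", "torch"

import keras

# A small framework-agnostic model; swapping the backend requires no changes here.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```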
Mar 12
It doesn't take a whole lot of pondering to figure out that the thesis "humans only seem smart because they're 'trained' on huge amounts of 'data' via their visual system (almost like LLMs!)" doesn't hold any water.

For instance -- congenitally blind people are not less intelligent. Vision isn't fundamental to what makes us human. A rich learning environment is still a rich learning environment when apprehended through restricted sensorimotor modalities.

Humans span an incredibly wide range of sensorimotor affordances. Some are blind, some are deaf, some don't have hands. They might grow up in radically different environments -- some with just three other humans around them, some with thousands. Some with libraries of books, some without any writing.

In the end, though, it doesn't make a huge difference -- all of them become fully-fledged, intelligent humans. Because no matter what, they're all extracting information from the world at a roughly constant rate: the intrinsic rate at which the brain processes information. Which is an infinitesimal fraction of the bandwidth of the human sensorimotor feed.

If your senses are missing something, you'll just redirect your fixed-rate attention to something else, and won't be much poorer for it.
Feb 21
Thread: quick API overview of Gemma, the new open-source LLM by Google.

First, let's make sure you have the latest Keras and KerasNLP installed, and let's set up your Kaggle credentials, so you can download the assets from Kaggle.

Next, let's instantiate the model and generate some text. You have access to 2 different sizes, 2B & 7B, and 2 different versions per size: base & instruction-tuned.

The first call will download the weights.
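A rough sketch of what those two steps look like with the KerasNLP API (the preset name and prompt are illustrative; check the Gemma model card on Kaggle for the exact presets):

```python
# Hedged sketch of the workflow described above; assumes Kaggle credentials are
# configured (e.g. KAGGLE_USERNAME / KAGGLE_KEY) and keras / keras-nlp are installed.
import keras_nlp

# Instantiate an instruction-tuned 2B Gemma; the first call downloads the weights.
gemma = keras_nlp.models.GemmaCausalLM.from_preset("gemma_instruct_2b_en")

# Generate some text.
print(gemma.generate("What is Keras?", max_length=64))
```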
Feb 17
The "aha" moment when I realized that curve-fitting was the wrong paradigm for achieving generalizable modeling of problems spaces that involve symbolic reasoning was in early 2016.

I was trying every possible way to get an LSTM/GRU-based model to classify first-order logic statements, and each new attempt was showing a bit more clearly than the last that my models were completely unable to learn to perform actual first-order logic -- despite the fact that this ability was definitely part of the representable function space. Instead, the models would inevitably latch onto statistical keyword associations to make their predictions.

It has been fascinating to see this observation echo again and again over the past 8 years. From 2013 to 2016 I was actually quite convinced that RNNs could be trained to learn any program. After all, they're Turing-complete (or at least some of them are) and they learn a highly compressed model of the input:output mapping they're trained on (rather than mere pointwise associations). Surely they could perform symbolic program synthesis in some continuous latent program space?

Nope. They do in fact learn mere pointwise associations and are completely useless for program synthesis. The problem isn't with what the function space can represent -- the problem is the learning process. It's SGD.
Feb 17
Video generation models and Neural Radiance Fields have been improving regularly since 2016, and now they're in the spotlight. As a result there's been a lot of debate about whether such systems embed a *model of physics*. Let's take a look...

These systems are capable of making next-frame visual predictions about how a given physical situation might evolve. So they do have a model of physics.

The real questions are: is this model accurate? Is it capable of generalizing to novel situations that aren't interpolations of what the model has been trained on?
Jan 20
When I say I want to build "strong AI" or "general AI", I don't mean "AGI" in the sense that most everyone else means it.

In its common use, "AGI" is a cultural construct akin to the Philosopher's Stone, which no one can define crisply but onto which people project all kinds of magical powers -- it will enable you to live forever, it will resurrect the dead in digital form, it will provide unlimited abundance, etc.

Meanwhile, what I mean is AI with general cognitive abilities, capable of picking up new skills with similar (or higher!) efficiency as humans, over a similar (or greater!) scope of problems. It would be a tremendously useful tool in pretty much every domain, in particular science.
Dec 17, 2023
To understand X means you have the ability to act appropriately in response to situations related to X -- for instance, you understand how to make coffee in a kitchen if you can walk into a random kitchen and make coffee.

Because your ability to act appropriately is dependent on how *novel* of a situation you find yourself in, understanding is a *spectrum*, rather than a binary attribute. You can understand something with more or less generality. This is captured in the notion of *generalization*.
Dec 15, 2023
Unfortunately, too few people understand the distinction between memorization and understanding. It's not some lofty question like "does the system have an internal world model?", it's a very pragmatic behavioral distinction: "is the system capable of broad generalization, or is it limited to local generalization?"

LLMs have failed every single benchmark and experiment focused on generalization since their inception. It's not just ARC -- this is documented in literally hundreds, possibly thousands of papers. The ability of LLMs to solve a task is entirely dependent on their familiarity with the task (local generalization).
Nov 10, 2023
Biggest misconceptions in AI:

1. Confusing skill and intelligence. They are orthogonal. General intelligence can be converted into skill at many tasks, but in reverse, you can achieve arbitrary levels of skill at arbitrary tasks without requiring any intelligence at all.

2. Believing that creating artificial intelligence would equate to creating artificial human-like entities. Humans are a complex assembly of multiple systems, of which intelligence is one. Replicating intelligence specifically would not entail replicating emotions, awareness, social skills, human needs and motivations, etc. A true AI would be extremely non-human-like by default.
Oct 31, 2023
I think there are broadly three categories of problem-solving patterns -- recitation, intuition, and reasoning.

Recitation: you simply recognize a known problem and apply the steps you've learned. Like playing a chess opening.

Intuition: in the face of a novel situation, you (mostly subconsciously) pattern-match it to what you've encountered before and you "just know" what to do (sometimes without really understanding why). Could be done completely in autopilot -- no awareness required. Like a very experienced chess player seeing the best move in <1s.

Reasoning: you consciously and deliberately analyze a novel situation, using a combination of abstract principles and step-by-step simulation. Like analyzing a chess position and simulating in your mind possible future trajectories.

Recitation is a database lookup.

Intuition is interpolative generalization or proximity-based generalization in a continuous space.

Reasoning is discrete search and discrete planning.
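A toy illustration of those three computational analogies (my own sketch, not from the thread -- the chess framing above is just an analogy):

```python
# Toy sketch contrasting the three problem-solving patterns.
import numpy as np

# Recitation: a database lookup -- the answer is retrieved, not computed.
openings = {"e4 e5": "Nf3", "d4 d5": "c4"}
print(openings["e4 e5"])

# Intuition: proximity-based generalization in a continuous space (here, 1-nearest-neighbor).
known_situations = np.array([[0.1, 0.9], [0.8, 0.2]])
known_responses = ["attack", "defend"]
query = np.array([0.15, 0.85])
print(known_responses[int(np.argmin(np.linalg.norm(known_situations - query, axis=1)))])

# Reasoning: discrete search -- explicitly simulating possible future trajectories.
def search(state, depth):
    if depth == 0:
        return state, abs(state)  # stand-in evaluation function
    candidates = [search(state + move, depth - 1) for move in (-1, +1)]
    return max(candidates, key=lambda c: c[1])

print(search(0, 3))  # best reachable "position" after simulating 3 steps ahead
```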
Aug 21, 2023
When reporting performance on ARC, make sure to separately report train set and validation set performance... Keep in mind that the train set contains many "curriculum" tasks, whose purpose is to teach Core Knowledge priors, and which are much simpler than other tasks.

If you report ~85 total solved tasks out of the 400 train set + 400 validation set tasks, I'm going to guess that ~60-70 of the solved tasks were in the train set. Your actual performance would then be ~20 out of 400, which is 5%. SotA is ~30%.
Aug 19, 2023
Neuroscience is slowly reaching a stage of advancement where it is going to become very useful for fundamental advances in AI. And this is partly thanks to AI, e.g. image segmentation.

None of the key ML techniques developed in the past 20 years were inspired by neuroscience or connected to neuroscience at all (not that people didn't try bio-inspired mechanisms -- but those simply didn't stick in practice). But this might not be true for the next 20 years.
Aug 10, 2023
If you train a ML system on one task, and then it becomes able to perform another task you did not anticipate, that's emergence.

Many people interpret "emergence" as something wondrous and magical -- "it's alive!" But it's actually banal and has been going on for a long time.

Emergent learning happens because information space is not random. It's highly organized. So if you learn one aspect of its organization, you will pick up other aspects as a by-product.

It's especially prevalent with self-supervised learning.
Jul 23, 2023
Remember -- we are making progress on AI (though far more on applications than on generality, which remains largely a green field). The progress is significant in speed and magnitude. But the conventional wisdom of the tech community about current and near-future AI capabilities…

In 2016, when I tweeted that human-level language understanding was many years away (which is still the case now, though we're closer), mind the context: this was in response to many people, including prominent VCs, claiming that then-current AI was nearly there and was about to…
Jun 19, 2023
Here's an interesting anecdote on that topic.

In 1979, a group of people who were "just asking questions" offered a $50,000 reward ($220,000 in 2023 dollars) to anyone who could prove that the Holocaust actually happened. You know, in order to spark discussion.

The group in question was the "Institute for Historical Review", the first group of Holocaust deniers to try to systematically build legitimacy for revisionism.

Of course, they had no intent to give anyone $50,000.
Apr 3, 2023
In 2033 it will seem utterly baffling how a bunch of tech folks lost their minds over text generators in 2023 -- like reading about Eliza or Minsky's 1970 quote about achieving human-level general intelligence by 1975.

Or closer to the present -- like how people in 2016 predicted that RL applied to game environments would lead to AGI within 5-10 years.
Mar 25, 2023
Starting with the current Keras-nightly, `import keras` is becoming once again the standard way to import Keras (instead of `from tensorflow import keras`).

This will become the standard in the next release, 2.13.

The new `tf.keras` and `keras` namespaces are 100% identical. Vive l'émancipation :)
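Concretely, the change is just this (trivial sketch, not from the thread):

```python
# Keras 2 era: Keras was imported through TensorFlow.
from tensorflow import keras

# From release 2.13 onward, the standalone import is once again the standard;
# per the thread, the `tf.keras` and `keras` namespaces are identical.
import keras

model = keras.Sequential([keras.layers.Dense(1)])
```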
Mar 21, 2023
Perhaps unsurprising -- at least if you've ever made a serious attempt at probing the system with *novel* questions

The distinction between intelligence and knowledge will keep getting more and more relevant over time.

Intelligence is the ability to acquire new skills in an information-efficient way, i.e. the ability to adapt and improvise in the face of uncertainty and novelty.

Intelligence is what you use when you don't *know* what to do.
Mar 18, 2023
I just wrote up a quick notebook for the new Kaggle competition on "reading" carbonized papyri from Pompeii via x-ray 3D scans. Really fun topic :)

kaggle.com/code/fchollet/… The model is pretty weak right now, but the end-to-end pipeline works. It trains on the whole dataset (a bit of a challenge given the memory constraints and the fact that streaming isn't an option!) and makes a submission. It uses tf.data for the pipeline and a Keras U-Net.
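For readers who want the shape of such a pipeline, here is a heavily simplified sketch (dummy data and a toy U-Net; this is not the notebook's actual code):

```python
# Simplified sketch of a tf.data + Keras U-Net segmentation pipeline
# (dummy data stands in for the x-ray scan fragments).
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Dummy stand-in data: 64x64 single-channel slices and binary ink masks.
x = np.random.rand(32, 64, 64, 1).astype("float32")
y = (np.random.rand(32, 64, 64, 1) > 0.5).astype("float32")
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(8).prefetch(tf.data.AUTOTUNE)

def tiny_unet(input_shape=(64, 64, 1)):
    inputs = keras.Input(shape=input_shape)
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    u1 = layers.UpSampling2D()(c2)
    m = layers.Concatenate()([u1, c1])  # skip connection
    c3 = layers.Conv2D(16, 3, padding="same", activation="relu")(m)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)
    return keras.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(ds, epochs=1)
```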
Mar 15, 2023
I'm also curious to see this. GPT-3 scored ~0 on ARC.

I'd expect GPT-4 to at least solve the tasks that are analogous to common IQ problems (i.e. the trivial subset of the training set). That said, I doubt it could do anything with the (more novel) evaluation set.

Always keep in mind, though: GPT-3 and GPT-4 were trained on the public ARC tasks and their solutions. The tasks are distributed as JSON files in a public GitHub repo, which is of course part of the training data.

This is exactly why the *test set* is fully private.
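For reference, a minimal sketch of what reading one of those public task files looks like (the file path is illustrative; it assumes the standard ARC layout of "train" and "test" lists of input/output grids):

```python
# Minimal sketch: each public ARC task is a JSON file containing "train" and
# "test" lists of {"input": grid, "output": grid} pairs, where a grid is a
# list of lists of integers 0-9. The path below is illustrative.
import json

with open("ARC/data/training/0a938d79.json") as f:
    task = json.load(f)

for pair in task["train"]:
    print(len(pair["input"]), "x", len(pair["input"][0]), "demonstration input grid")
print(len(task["test"]), "test pair(s) to solve")
```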