François Chollet
Deep learning @google. Creator of Keras. Author of 'Deep Learning with Python'. Opinions are my own.
Mar 31
Memorization (which ML has solely focused on) is not intelligence. And because any task that does not involve significant novelty and uncertainty can be solved via memorization, *skill* is never a sign of intelligence, no matter the task. Intelligence is found in the ability to pick up new skills quickly & efficiently -- at tasks you weren't prepared for. To improvise, adapt and learn.
Mar 13
We benchmarked a range of popular models (SegmentAnything, BERT, StableDiffusion, Gemma, Mistral) with all Keras 3 backends (JAX/TF/PT). Key findings:

1. There's no "best" backend. The fastest backend often depends on your specific model architecture.

2. Keras 3 with the right backend is consistently a lot faster than reference PT (compiled) implementations. Often by 150%+.

3. Keras 3 models are fast without requiring any custom performance optimizations. It's all "stock" code.

4. Keras 3 is faster than Keras 2.

Details here: keras.io/getting_starte…

Finding 1: the fastest backend for a given model typically alternates between XLA-compiled JAX and XLA-compiled TF. Plus, you might want to debug/prototype in PT before training/inferencing with JAX or TF.

The ability to write framework-agnostic models and pick your backend later is a game-changer.
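To make that concrete, here's a minimal sketch (mine, not from the thread) of how backend selection works in Keras 3: you pick the backend via the KERAS_BACKEND environment variable before importing Keras, and the model code itself stays identical.

```python
# Minimal sketch (not from the thread): the same Keras 3 model code runs on any
# backend; select the backend before the first `import keras`.
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow", "torch"

import keras

# A small framework-agnostic model; swapping the backend requires no changes here.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```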
Mar 12
It doesn't take a whole lot of pondering to figure out that the thesis "humans only seem smart because they're 'trained' on huge amounts of 'data' via their visual system (almost like LLMs!)" doesn't hold any water.

For instance -- congenitally blind people are not less intelligent. Vision isn't fundamental to what makes us human. A rich learning environment is still a rich learning environment when apprehended through restricted sensorimotor modalities.

Humans span an incredibly wide range of sensorimotor affordances. Some are blind, some are deaf, some don't have hands. They might grow up in radically different environments -- some with just three other humans around them, some with thousands. Some with libraries of books, some without any writing.

In the end, though, it doesn't make a huge difference -- all of them become fully-fledged, intelligent humans. Because no matter what, they're all extracting information from the world at a roughly constant rate: the intrinsic rate at which the brain processes information. Which is an infinitesimal fraction of the bandwidth of the human sensorimotor feed.

If your senses are missing something, you'll just redirect your fixed-rate attention to something else, and won't be much poorer for it.
Feb 21
Thread: quick API overview of Gemma, the new open-source LLM by Google.

First, let's make sure you have the latest Keras and KerasNLP installed, and let's set up your Kaggle credentials, so you can download the assets from Kaggle.

Next, let's instantiate the model and generate some text. You have access to 2 different sizes, 2B & 7B, and 2 different versions per size: base & instruction-tuned.

The first call will download the weights.
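A rough sketch of what those two steps look like with the KerasNLP API (the preset name and prompt are illustrative; check the Gemma model card on Kaggle for the exact presets):

```python
# Hedged sketch of the workflow described above; assumes Kaggle credentials are
# configured (e.g. KAGGLE_USERNAME / KAGGLE_KEY) and keras / keras-nlp are installed.
import keras_nlp

# Instantiate an instruction-tuned 2B Gemma; the first call downloads the weights.
gemma = keras_nlp.models.GemmaCausalLM.from_preset("gemma_instruct_2b_en")

# Generate some text.
print(gemma.generate("What is Keras?", max_length=64))
```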
Feb 17
The "aha" moment when I realized that curve-fitting was the wrong paradigm for achieving generalizable modeling of problems spaces that involve symbolic reasoning was in early 2016.

I was trying every possible way to get an LSTM/GRU-based model to classify first-order logic statements, and each new attempt was showing a bit more clearly than the last that my models were completely unable to learn to perform actual first-order logic -- despite the fact that this ability was definitely part of the representable function space. Instead, the models would inevitably latch onto statistical keyword associations to make their predictions.

It has been fascinating to see this observation echo again and again over the past 8 years. From 2013 to 2016 I was actually quite convinced that RNNs could be trained to learn any program. After all, they're Turing-complete (or at least some of them are) and they learn a highly compressed model of the input:output mapping they're trained on (rather than mere pointwise associations). Surely they could perform symbolic program synthesis in some continuous latent program space?

Nope. They do in fact learn mere pointwise associations and are completely useless for program synthesis. The problem isn't with what the function space can represent -- the problem is the learning process. It's SGD.
Feb 17
Video generation models and Neural Radiance Fields have been improving regularly since 2016, and now they're in the spotlight. As a result there's been a lot of debate about whether such systems embed a *model of physics*. Let's take a look...

These systems are capable of making next-frame visual predictions about how a given physical situation might evolve. So they do have a model of physics.

The real questions are: is this model accurate? Is it capable of generalizing to novel situations that aren't interpolations of what the model has been trained on?
Jan 20
When I say I want to build "strong AI" or "general AI", I don't mean "AGI" in the sense that most everyone else means it.

In its common use, "AGI" is a cultural construct akin to the Philosopher's Stone, which no one can define crisply but onto which people project all kinds of magical powers -- it will enable you to live forever, it will resurrect the dead in digital form, it will provide unlimited abundance, etc.

Meanwhile, what I mean is AI with general cognitive abilities, capable of picking up new skills with similar (or higher!) efficiency as humans, over a similar (or greater!) scope of problems. It would be a tremendously useful tool in pretty much every domain, in particular science.
Dec 17, 2023
To understand X means you have the ability to act appropriately in response to situations related to X -- for instance, you understand how to make coffee in a kitchen if you can walk into a random kitchen and make coffee.

Because your ability to act appropriately is dependent on how *novel* of a situation you find yourself in, understanding is a *spectrum*, rather than a binary attribute. You can understand something with more or less generality. This is captured in the notion of *generalization*.
Dec 15, 2023
Unfortunately, too few people understand the distinction between memorization and understanding. It's not some lofty question like "does the system have an internal world model?", it's a very pragmatic behavioral distinction: "is the system capable of broad generalization, or is it limited to local generalization?"

LLMs have failed every single benchmark and experiment focused on generalization since their inception. It's not just ARC -- this is documented in literally hundreds, possibly thousands of papers. The ability of LLMs to solve a task is entirely dependent on their familiarity with the task (local generalization).
Nov 10, 2023
Biggest misconceptions in AI:

1. Confusing skill and intelligence. They are orthogonal. General intelligence can be converted into skill at many tasks, but in reverse, you can achieve arbitrary levels of skill at arbitrary tasks without requiring any intelligence at all.

2. Believing that creating artificial intelligence would equate to creating artificial human-like entities. Humans are a complex assembly of multiple systems, of which intelligence is one. Replicating intelligence specifically would not entail replicating emotions, awareness, social skills, human needs and motivations, etc. A true AI would be extremely non-human-like by default.
Oct 31, 2023
I think there are broadly three categories of problem-solving patterns -- recitation, intuition, and reasoning.

Recitation: you simply recognize a known problem and apply the steps you've learned. Like playing a chess opening.

Intuition: in the face of a novel situation, you (mostly subconsciously) pattern-match it to what you've encountered before and you "just know" what to do (sometimes without really understanding why). Could be done completely in autopilot -- no awareness required. Like a very experienced chess player seeing the best move in <1s.

Reasoning: you consciously and deliberately analyze a novel situation, using a combination of abstract principles and step-by-step simulation. Like analyzing a chess position and simulating in your mind possible future trajectories.

Recitation is a database lookup.

Intuition is interpolative generalization or proximity-based generalization in a continuous space.

Reasoning is discrete search and discrete planning.
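A toy illustration of those three computational analogies (my own sketch, not from the thread -- the chess framing above is just an analogy):

```python
# Toy sketch contrasting the three problem-solving patterns.
import numpy as np

# Recitation: a database lookup -- the answer is retrieved, not computed.
openings = {"e4 e5": "Nf3", "d4 d5": "c4"}
print(openings["e4 e5"])

# Intuition: proximity-based generalization in a continuous space (here, 1-nearest-neighbor).
known_situations = np.array([[0.1, 0.9], [0.8, 0.2]])
known_responses = ["attack", "defend"]
query = np.array([0.15, 0.85])
print(known_responses[int(np.argmin(np.linalg.norm(known_situations - query, axis=1)))])

# Reasoning: discrete search -- explicitly simulating possible future trajectories.
def search(state, depth):
    if depth == 0:
        return state, abs(state)  # stand-in evaluation function
    candidates = [search(state + move, depth - 1) for move in (-1, +1)]
    return max(candidates, key=lambda c: c[1])

print(search(0, 3))  # best reachable "position" after simulating 3 steps ahead
```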
Aug 21, 2023
When reporting performance on ARC, make sure to separately report train set and validation set performance... Keep in mind that the train set contains many "curriculum" tasks, whose purpose is to teach Core Knowledge priors, and which are much simpler than other tasks.

If you report ~85 total solved tasks out of the 400 train set + 400 validation set tasks, I'm going to guess that ~60-70 of the solved tasks were in the train set. Your actual performance would then be ~20 out of 400, which is 5%. SotA is ~30%.
Aug 19, 2023
Neuroscience is slowly reaching a stage of advancement where it is going to become very useful for fundamental advances in AI. And this is partly thanks to AI, e.g. image segmentation.

None of the key ML techniques developed in the past 20 years were inspired by neuroscience or connected to neuroscience at all (not that people didn't try bio-inspired mechanisms -- but those simply didn't stick in practice). But this might not be true for the next 20 years.
Aug 10, 2023
If you train a ML system on one task, and then it becomes able to perform another task you did not anticipate, that's emergence.

Many people interpret "emergence" as something wondrous and magical -- "it's alive!" But it's actually banal and has been going on for a long time.

Emergent learning happens because information space is not random. It's highly organized. So if you learn one aspect of its organization, you will pick up other aspects as a by-product.

It's especially prevalent with self-supervised learning.
Jul 23, 2023
Remember -- we are making progress on AI (though far more on applications than on generality, which remains largely a green field). The progress is significant in speed and magnitude. But the conventional wisdom of the tech community about current and near-future AI capabilities…

In 2016, when I tweeted that human-level language understanding was many years away (which is still the case now, though we're closer), mind the context: this was in response to many people, including prominent VCs, claiming that then-current AI was nearly there and was about to…
Jun 19, 2023
Here's an interesting anecdote on that topic.

In 1979, a group of people who were "just asking questions" offered a $50,000 reward ($220,000 in 2023 dollars) to anyone who could prove that the Holocaust actually happened. You know, in order to spark discussion.

The group in question was the "Institute for Historical Review", the first group of Holocaust deniers to try to systematically build legitimacy for revisionism.

Of course, they had no intent to give anyone $50,000.
Apr 3, 2023
In 2033 it will seem utterly baffling how a bunch of tech folks lost their minds over text generators in 2023 -- like reading about Eliza or Minsky's 1970 quote about achieving human-level general intelligence by 1975.

Or closer to the present -- like how people in 2016 predicted that RL applied to game environments would lead to AGI within 5-10 years.
Mar 25, 2023
Starting with the current Keras-nightly, `import keras` is becoming once again the standard way to import Keras (instead of `from tensorflow import keras`).

This will become the standard in the next release, 2.13.

The new `tf.keras` and `keras` namespaces are 100% identical. Vive l'émancipation :)
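Concretely, the change is just this (trivial sketch, not from the thread):

```python
# Keras 2 era: Keras was imported through TensorFlow.
from tensorflow import keras

# From release 2.13 onward, the standalone import is once again the standard;
# per the thread, the `tf.keras` and `keras` namespaces are identical.
import keras

model = keras.Sequential([keras.layers.Dense(1)])
```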
Mar 21, 2023
Perhaps unsurprising -- at least if you've ever made a serious attempt at probing the system with *novel* questions

The distinction between intelligence and knowledge will keep getting more and more relevant over time.

Intelligence is the ability to acquire new skills in an information-efficient way, i.e. the ability to adapt and improvise in the face of uncertainty and novelty.

Intelligence is what you use when you don't *know* what to do.
Mar 18, 2023
I just wrote up a quick notebook for the new Kaggle competition on "reading" carbonized papyri from Pompeii via x-ray 3D scans. Really fun topic :)

kaggle.com/code/fchollet/… The model is pretty weak right now, but the end-to-end pipeline works. It trains on the whole dataset (a bit of a challenge given the memory constraints and the fact that streaming isn't an option!) and makes a submission. It uses tf.data for the pipeline and a Keras U-Net.
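For readers who want the shape of such a pipeline, here is a heavily simplified sketch (dummy data and a toy U-Net; this is not the notebook's actual code):

```python
# Simplified sketch of a tf.data + Keras U-Net segmentation pipeline
# (dummy data stands in for the x-ray scan fragments).
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Dummy stand-in data: 64x64 single-channel slices and binary ink masks.
x = np.random.rand(32, 64, 64, 1).astype("float32")
y = (np.random.rand(32, 64, 64, 1) > 0.5).astype("float32")
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(8).prefetch(tf.data.AUTOTUNE)

def tiny_unet(input_shape=(64, 64, 1)):
    inputs = keras.Input(shape=input_shape)
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    u1 = layers.UpSampling2D()(c2)
    m = layers.Concatenate()([u1, c1])  # skip connection
    c3 = layers.Conv2D(16, 3, padding="same", activation="relu")(m)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c3)
    return keras.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(ds, epochs=1)
```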
Mar 15, 2023
I'm also curious to see this. GPT-3 scored ~0 on ARC.

I'd expect GPT-4 to at least solve the tasks that are analogous to common IQ problems (i.e. the trivial subset of the training set). That said, I doubt it could do anything with the (more novel) evaluation set.

Always keep in mind, though: GPT-3 and GPT-4 were trained on the public ARC tasks and their solutions. The tasks are distributed as JSON files in a public GitHub repo, which is of course part of the training data.

This is exactly why the *test set* is fully private.
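For reference, a minimal sketch of what reading one of those public task files looks like (the file path is illustrative; it assumes the standard ARC layout of "train" and "test" lists of input/output grids):

```python
# Minimal sketch: each public ARC task is a JSON file containing "train" and
# "test" lists of {"input": grid, "output": grid} pairs, where a grid is a
# list of lists of integers 0-9. The path below is illustrative.
import json

with open("ARC/data/training/0a938d79.json") as f:
    task = json.load(f)

for pair in task["train"]:
    print(len(pair["input"]), "x", len(pair["input"][0]), "demonstration input grid")
print(len(task["test"]), "test pair(s) to solve")
```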