François Chollet
Jan 8, 2023 · 18 tweets
The current climate in AI has so many parallels to 2021 web3 it's making me uncomfortable. Narratives based on zero data are accepted as self-evident. Everyone is treating "civilization-altering" impact (and 100x returns on investment) in the next 2-3 years as a sure thing.
Personally I think there's a bull case and a bear case. The bull case is way, way more conservative than what the median person on my TL considers completely self-evident. And the actual outcome we'll see is statistically likely to lie in between, somewhat closer to the bear case.
The bull case is that generative AI becomes a widespread UX paradigm for interacting with most tech products (note: this has nothing to do with AGI, which is a pipe dream). Near-future iterations of current AI models become our interface to the world's information.
The bear case is the continuation of the GPT-3 trajectory, which is that LLMs only find limited commercial success in SEO, marketing, and copywriting niches, while image generation (much more successful) peaks as a XB/y industry circa 2024. LLMs will have been a complete bubble.
So far there is *far* more evidence towards the bear case, and hardly any towards the bull case. *But* I think we're still very far from peak LLM performance at this time -- these models will improve tremendously in the next few years, both in output and in cost.
For this reason I believe the actual outcome we'll see is somewhere between the two scenarios. "AI as our universal interface to information" is a thing that will definitely happen in the future (it was always going to), but it won't quite happen with this generation of the tech.
Crucially, any sufficiently successful scenario has its own returns-defeating mechanism built-in: commoditization. *If* LLMs are capable of generating outsized economic returns, the tech will get commoditized. It will become a feature in a bunch of products, built with OSS.
As far as we know OpenAI made something like 5-10M in 2021 (1.5 years after GPT-3) and 30-40M in 2022. Only image generation has proven to be a solid commercial success at this time, and there aren't that many successful players in the space. Make of that what you will.
One thing I've found endlessly fascinating is to search Twitter for the most popular ChatGPT tweets, to gain insight into popular use cases. These tweets fall overwhelmingly into one category (like 80%). Can you guess what that is?
That's right, it's SEO/marketing engagement bait. ChatGPT has completely revolutionized the engagement bait tweet routine in these niches.

Some of it is directly monetized (pay to unlock 10 ChatGPT secrets!); most of it is just trying to collect eyeballs.
Now, seeing such tweets is compatible with both the bull case and the bear case. If the tech is revolutionary, it *will* be used in this way. What's interesting to me is that ~80% of ChatGPT tweets with >2000 likes fall into this category.
This is consistent with the primary learning from the 2020-2021 class of GPT-3 startups (a category of startups willed into existence by VCs and powered by hype), which is that commercial use cases have been falling almost entirely into the marketing and copywriting niches.
I think the actual potential of ChatGPT goes significantly further than that, though. It will likely find success in consumer products, and perhaps even in education and search.
Whatever happens, we will know soon enough. Billions of dollars are being scrambled to deploy ChatGPT or similar technology into a large number of products. By the end of the year we will have enough data to make a call.
Anyway, hype aside, I really believe there's a ton of cool stuff you can build with deep learning today. That was true 5 years ago, it's true today, and it will still be true 5 years from now. The tech is super valuable, even if it attracts a particularly extreme form of hype men.
One last thought -- don't overindex on the web3 <> LLMs comparison. Of course web3 was pure hot air while LLMs are real tech with actual applications -- that's not the parallel I'm making. The parallel is in the bubble formation social dynamics, especially in the VC crowd.
The fact that investment is being driven by pure hype, by data-free narratives rather than actual revenue data or first-principles analysis. The circularity of it all -- hype drives investment which drives hype which drives investment. The influx of influencer engagement bait.
Most of all, the way that narratives backed by nothing somehow end up enshrined as self-evident common wisdom simply because they get repeated enough times by enough people. The way everyone starts believing the same canon (especially those who bill themselves as contrarians).

More from @fchollet

Mar 31
Memorization (which ML has solely focused on) is not intelligence. And because any task that does not involve significant novelty and uncertainty can be solved via memorization, *skill* is never a sign of intelligence, no matter the task.
Intelligence is found in the ability to pick up new skills quickly & efficiently -- at tasks you weren't prepared for. To improvise, adapt and learn.
Here's a paper you can read about it.

It introduced a formal definition of intelligence, as well as a benchmark to capture that definition in practical terms. Although it was developed before the rise of LLMs, current state-of-the-art LLMs such as Gemini Ultra, Claude 3, or GPT-4 are not able to score higher than a few percent on that benchmark. arxiv.org/abs/1911.01547
Mar 13
We benchmarked a range of popular models (SegmentAnything, BERT, StableDiffusion, Gemma, Mistral) with all Keras 3 backends (JAX/TF/PT). Key findings:

1. There's no "best" backend. The fastest backend often depends on your specific model architecture.

2. Keras 3 with the right backend is consistently a lot faster than reference PT (compiled) implementations. Often by 150%+.

3. Keras 3 models are fast without requiring any custom performance optimizations. It's all "stock" code.

4. Keras 3 is faster than Keras 2.

Details here: keras.io/getting_starte…
Finding 1: the fastest backend for a given model typically alternates between XLA-compiled JAX and XLA-compiled TF. Plus, you might want to debug/prototype in PT before training/inferencing with JAX or TF.

The ability to write framework-agnostic models and pick your backend later is a game-changer.
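Not part of the thread, but as a rough illustration of that workflow: in Keras 3 the backend is chosen via an environment variable before import, and the same model code then runs on JAX, TensorFlow, or PyTorch. A minimal sketch with a toy model standing in for the benchmarked ones:

```python
import os

# Pick the backend before importing Keras; valid values are
# "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

import numpy as np
import keras

# The model definition itself is backend-agnostic.
model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
# jit_compile=True requests XLA compilation where the backend supports it.
model.compile(optimizer="adam", loss="mse", jit_compile=True)

x = np.random.rand(256, 32).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, epochs=1, batch_size=32)
```

Switching to TensorFlow or PyTorch only requires changing the environment variable; the model code stays the same.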
Finding 2: Keras 3 with the best-performing backend outperforms reference native PT implementations (compiled) for all models we tried.

Notably, 5 out of 10 tasks demonstrate speedups exceeding 100%, with a maximum speedup of 340%.

If you're not leveraging this advantage for any large model training run, you're wasting GPU time -- and thus throwing away money.
Mar 12
It doesn't take a whole lot of pondering to figure out that the thesis "humans only seem smart because they're 'trained' on huge amounts of 'data' via their visual system (almost like LLMs!)" doesn't hold any water.

For instance -- congenitally blind people are not less intelligent. Vision isn't fundamental to what makes us human. A rich learning environment is still a rich learning environment when apprehended through restricted sensorimotor modalities.
Humans span an incredibly wide range of sensorimotor affordances. Some are blind, some are deaf, some don't have hands. They might grow up in radically different environments -- some with just three other humans around them, some with thousands. Some with libraries of books, some without any writing.

In the end, though, it doesn't make a huge difference -- all of them become fully-fledged, intelligent humans. Because no matter what, they're all extracting information from the world at a roughly constant rate: the intrinsic rate at which the brain processes information. Which is an infinitesimal fraction of the bandwidth of the human sensorimotor feed.

If your senses are missing something, you'll just redirect your fixed-rate attention to something else, and won't be much poorer for it.
That's also why the influence of genes on fluid intelligence is overwhelmingly greater than that of the environment. If "training data" was so important, you'd expect environment and education to be critical to intelligence. They aren't. Twins raised in vastly different situations end up about as smart.
Feb 21
Thread: quick API overview of Gemma, the new open-source LLM by Google.

First, let's make sure you have the latest Keras and KerasNLP installed, and let's set up your Kaggle credentials, so you can download the assets from Kaggle.
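The setup screenshot isn't preserved in this unroll; a minimal sketch of the equivalent steps follows. Package names are the standard pip packages, and the credential handling shown is one common approach, not necessarily what the original image used:

```python
# Install the latest releases first (shell):
#   pip install -U keras keras-nlp

import os

# Gemma weights are hosted on Kaggle, so credentials are required.
# Create an API token on kaggle.com and either drop kaggle.json into
# ~/.kaggle/ or expose it via environment variables:
os.environ["KAGGLE_USERNAME"] = "your_username"   # placeholder
os.environ["KAGGLE_KEY"] = "your_api_key"         # placeholder
```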
Next, let's instantiate the model and generate some text. You have access to 2 different sizes, 2B & 7B, and 2 different versions per size: base & instruction-tuned.

The first call will download the weights.
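Again, the code image isn't preserved; here is a sketch of what instantiation and generation look like with KerasNLP. The preset names are assumptions based on the standard Gemma presets (e.g. gemma_2b_en, gemma_instruct_2b_en, gemma_7b_en, gemma_instruct_7b_en):

```python
import keras_nlp

# Instantiate an instruction-tuned 2B Gemma.
# The first call downloads the weights from Kaggle.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_instruct_2b_en")

# Generate some text from a prompt.
print(gemma_lm.generate("What is Keras?", max_length=64))
```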
I generally recommend running inference in float16 or bfloat16 (depending on the hardware you're using). You can either globally configure the dtype policy in Keras (do it before creating the model), or pass the `dtype` argument to your model.

Note that operations like softmax will use float32 regardless, for stability.
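A sketch of both options described above; the global policy call reflects the Keras 3 config API as I understand it, and exact keyword support for `dtype` at instantiation may vary by version:

```python
import keras
import keras_nlp

# Option 1: set the global dtype policy *before* creating the model.
keras.config.set_dtype_policy("bfloat16")
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_instruct_2b_en")

# Option 2: pass the dtype directly to the model you instantiate.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset(
    "gemma_instruct_2b_en", dtype="bfloat16"
)
```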
Feb 17
The "aha" moment when I realized that curve-fitting was the wrong paradigm for achieving generalizable modeling of problems spaces that involve symbolic reasoning was in early 2016.

I was trying every possible way to get an LSTM/GRU-based model to classify first-order logic statements, and each new attempt was showing a bit more clearly than the last that my models were completely unable to learn to perform actual first-order logic -- despite the fact that this ability was definitely part of the representable function space. Instead, the models would inevitably latch onto statistical keyword associations to make their predictions.

It has been fascinating to see this observation echo again and again over the past 8 years.
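The original experimental setup isn't spelled out in the thread; purely as a hypothetical sketch of the kind of model involved, here is a small LSTM classifier over tokenized logic statements. The vocabulary size, sequence length, and binary labeling are all assumptions, not the author's actual setup:

```python
import keras
from keras import layers

vocab_size = 200   # assumed: size of the logic-symbol vocabulary
max_len = 64       # assumed: maximum statement length in tokens

# A recurrent classifier over tokenized first-order-logic statements,
# e.g. predicting whether a statement is valid.
model = keras.Sequential([
    keras.Input(shape=(max_len,), dtype="int32"),
    layers.Embedding(vocab_size, 64),
    layers.LSTM(128),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# The thread's observation: models like this tend to latch onto surface
# keyword statistics rather than learning the underlying logical rules.
```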
From 2013 to 2016 I was actually quite convinced that RNNs could be trained to learn any program. After all, they're Turing-complete (or at least some of them are) and they learn a highly compressed model of the input:output mapping they're trained on (rather than mere pointwise associations). Surely they could perform symbolic program synthesis in some continuous latent program space?

Nope. They do in fact learn mere pointwise associations, and they're completely useless for program synthesis. The problem isn't with what the function space can represent -- the problem is the learning process. It's SGD.
Ironically, Transformers are even worse in that regard -- mostly due to their strongly interpolative architecture prior. Multi-head attention literally hardcodes sample interpolation in latent space. Also, recurrence is a really helpful prior for symbolic programs, and Transformers don't have it.
Feb 17
Video generation models and Neural Radiance Fields have been improving regularly since 2016, and now they're in the spotlight. As a result there's been a lot of debate about whether such systems embed a *model of physics*. Let's take a look...
These systems are capable of making next-frame visual predictions about how a given physical situation might evolve. So they do have a model of physics.

The real questions are: is this model accurate? Is it capable of generalizing to novel situations that aren't interpolations of what the model has been trained on?
These are not idle questions. They're the difference between two entirely different worlds of possibilities. In one world, generated imagery is limited to media production, to be consumed by humans. It's good enough to fool you into believing it looks real, but it doesn't actually look like reality would have. In the other world, generated imagery can be used as a simulation of reality, to make reliable predictions about the world and the future. It can be used for science.
