The current climate in AI has so many parallels to 2021 web3 that it's making me uncomfortable. Narratives based on zero data are accepted as self-evident. Everyone treats "civilization-altering" impact (& 100x returns on investment) in the next 2-3 years as a sure thing
Personally I think there's a bull case and a bear case. The bull case is way, way more conservative than what the median person on my TL considers completely self-evident. And the actual outcome we'll see is statistically likely to lie in between, somewhat closer to the bear case
The bull case is that generative AI becomes a widespread UX paradigm for interacting with most tech products (note: this has nothing to do with AGI, which is a pipe dream). Near-future iterations of current AI models become our interface to the world's information.
The bear case is the continuation of the GPT-3 trajectory, which is that LLMs only find limited commercial success in SEO, marketing, and copywriting niches, while image generation (much more successful) peaks as a XB/y industry circa 2024. LLMs will have been a complete bubble.
So far there is *far* more evidence towards the bear case, and hardly any towards the bull case. *But* I think we're still very far from peak LLM performance at this time -- these models will improve tremendously in the next few years, both in output and in cost.
For this reason I believe the actual outcome we'll see is somewhere between the two scenarios. "AI as our universal interface to information" is a thing that will definitely happen in the future (it was always going to), but it won't quite happen with this generation of the tech.
Crucially, any sufficiently successful scenario has its own returns-defeating mechanism built-in: commoditization. *If* LLMs are capable of generating outsized economic returns, the tech will get commoditized. It will become a feature in a bunch of products, built with OSS.
As far as we know OpenAI made something like $5-10M in 2021 (1.5 years after GPT-3) and $30-40M in 2022. Only image generation has proven to be a solid commercial success at this time, and there aren't that many successful players in the space. Make of that what you will.
One thing I've found endlessly fascinating is to search Twitter for the most popular ChatGPT tweets, to gain insight into popular use cases. These tweets fall overwhelmingly into one category (like 80%). Can you guess what that is?
That's right, it's SEO/marketing engagement bait. ChatGPT has completely revolutionized the engagement bait tweet routine in these niches.
Some of it is directly monetized (pay to unlock 10 ChatGPT secrets!), but most of it is just trying to collect eyeballs.
Now, seeing such tweets is compatible with both the bull case and the bear case. If the tech is revolutionary, it *will* be used in this way. What's interesting to me is that ~80% of ChatGPT tweets with >2000 likes fall into this category.
This is consistent with the primary learning from the 2020-2021 class of GPT-3 startups (a category of startups willed into existence by VCs and powered by hype), which is that commercial use cases have fallen almost entirely into the marketing and copywriting niches.
I think the actual potential of ChatGPT goes significantly further than that, though. It will likely find success in consumer products, and perhaps even in education and search.
Whatever happens, we will know soon enough. Billions of dollars are being scrambled together to deploy ChatGPT or similar technology into a large number of products. By the end of the year we will have enough data to make a call.
Anyway, hype aside, I really believe there's a ton of cool stuff you can build with deep learning today. That was true 5 years ago, it's true today, and it will still be true 5 years from now. The tech is super valuable, even if it attracts a particularly extreme breed of hype men
One last thought -- don't overindex on the web3 <> LLMs comparison. Of course web3 was pure hot air while LLMs are real tech with actual applications -- that's not the parallel I'm making. The parallel is in the bubble-formation social dynamics, especially in the VC crowd.
The fact that investment is being driven by pure hype, by data-free narratives rather than actual revenue data or first-principles analysis. The circularity of it all -- hype drives investment which drives hype which drives investment. The influx of influencer engagement bait.
Most of all, the way that narratives backed by nothing somehow end up enshrined as self-evident common wisdom simply because they get repeated enough times by enough people. The way everyone starts believing the same canon (especially those who bill themselves as contrarians).
I'm joining forces with @mikeknoop to start Ndea (@ndeainc), a new AI lab.
Our focus: deep learning-guided program synthesis. We're betting on a different path to build AI capable of true invention, adaptation, and innovation.
We're really excited about our current research direction. We believe we have a small but real chance of achieving a breakthrough -- creating AI that can learn at least as efficiently as people, and that can keep improving over time with no bottlenecks in sight.
People scaled LLMs by ~10,000x from 2019 to 2024, and their scores on ARC stayed near 0 (e.g. GPT-4o at ~5%). Meanwhile a very crude program search approach could score >20% with hardly any compute.
Then OpenAI started adding test-time CoT search. ARC scores immediately shot up.
It's not about scale. It's about working on the right ideas.
Ideas like deep learning-guided CoT synthesis or program synthesis -- via search.
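To make "a very crude program search" concrete, here's a minimal sketch of what such an approach might look like: enumerate compositions of primitives from a tiny hand-written DSL and keep whichever program reproduces all the training pairs. The DSL, primitive names, and task format below are illustrative assumptions, not ARC's actual spec or any published solver.

```python
import itertools
import numpy as np

# A tiny hand-written DSL of grid transformations (illustrative only).
# Each primitive maps a grid to a grid.
PRIMITIVES = {
    "identity":  lambda g: g,
    "flip_h":    lambda g: np.fliplr(g),
    "flip_v":    lambda g: np.flipud(g),
    "rot90":     lambda g: np.rot90(g),
    "transpose": lambda g: g.T,
}

def search_program(train_pairs, max_depth=3):
    """Brute-force search: try every composition of primitives up to
    max_depth, return the first one consistent with all training pairs."""
    names = list(PRIMITIVES)
    for depth in range(1, max_depth + 1):
        for combo in itertools.product(names, repeat=depth):
            ok = True
            for inp, out in train_pairs:
                g = np.array(inp)
                for name in combo:
                    g = PRIMITIVES[name](g)
                if g.shape != np.shape(out) or not np.array_equal(g, out):
                    ok = False
                    break
            if ok:
                return combo  # a "program": a sequence of primitive names
    return None

# Toy task where the hidden rule is "flip horizontally".
train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
print(search_program(train))  # -> ('flip_h',)
```

The point of the sketch: even exhaustive enumeration over a handful of primitives does a form of task adaptation at test time, which pure feed-forward inference does not.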
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.
It scores 75.7% on the semi-private eval in low-compute mode ($20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute force -- these capabilities are new territory and they demand serious scientific attention.
While the new model is very impressive and represents a big milestone on the way towards AGI, I don't believe this is AGI -- there's still a fair number of very easy ARC-AGI-1 tasks that o3 can't solve, and we have early indications that ARC-AGI-2 will remain extremely challenging for o3.
This shows that it's still feasible to create unsaturated, interesting benchmarks that are easy for humans, yet impossible for AI -- without involving specialist knowledge. We will have AGI when creating such evals becomes outright impossible.
When we develop AI systems that can actually reason, they will involve deep learning (as one of two major components, the other one being discrete search), and some people will say that this "proves" that DL can reason.
No, it will have proven the thesis that DL is not enough, and that we need to combine DL with discrete search.
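The general shape of such a hybrid, sketched in a few lines (an illustration of the idea, not any specific system): a discrete search over candidate sequences, with a learned model supplying the heuristic that ranks them. Here beam search stands in for the search component, and a toy scoring function stands in for the network.

```python
import heapq

def beam_search(initial, expand, score, beam_width=3, steps=5):
    """Discrete search guided by a learned scoring function.

    expand(state) -> successor states (the discrete-search component);
    score(state)  -> float, higher is better (where deep learning would
                     plug in, e.g. a policy or value network).
    """
    beam = [initial]
    for _ in range(steps):
        candidates = [s for state in beam for s in expand(state)]
        if not candidates:
            break
        # Keep only the top-k candidates per the learned heuristic.
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)

# Toy stand-ins: states are strings; the "model" rewards alternation.
expand = lambda s: [s + c for c in "ab"]
score = lambda s: sum(1 for x, y in zip(s, s[1:]) if x != y)

print(beam_search("a", expand, score))  # -> 'ababab'
```

Neither component suffices alone: the search provides systematic exploration, the learned score provides the intuition that keeps the search tractable.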
From my DL textbook (1st edition), published in 2017. Seven years later, there is now overwhelming momentum towards this exact approach.
I find it especially obtuse when people point to progress on math benchmarks as evidence of LLMs being AGI, given that all of this progress has been driven by methods that leverage discrete search. The empirical data completely vindicates the thesis that DL in general, and LLMs in particular, can't do math on their own, and that we need discrete search.
In the last Trump administration, legal, high-skilled immigration was cut by ~30% before Covid, then by 100% after Covid (which was definitely a choice: a number of countries kept issuing residency permits and visas). However, illegal immigration inflows did not go down (they've been stable since the mid-2000s).
If you're a scientist or engineer applying for a green card, you're probably keenly aware that your chances of eventually obtaining it are highly dependent on the election. What you may not know is that, if you're a naturalized citizen, your US passport is also at stake.
The last Trump administration launched a "denaturalization task force" aimed at stripping US citizenship from as many naturalized citizens as possible, with an eventual target of 7M (about one third of all naturalized citizens). Thankfully, they ran into a little problem: the courts.
When we say deep learning models operate via memorization, the claim isn't that they work like literal lookup tables, only being able to make sense of points that are exactly part of their training data. No one has claimed that -- it wouldn't even be true of linear regression.
Of course deep learning models can generalize to unseen data points -- they would be entirely useless if they couldn't. The claim is that they perform *local generalization*: generalization to known unknowns, to degrees of variability for which you can provide a dense sampling at training time.
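A concrete way to see the distinction (an illustrative sketch, not from the thread): fit a model on a dense sampling of a curve, then compare its error on unseen points *inside* the sampled range against points *outside* it. Here a polynomial fit stands in for any curve-fitting model, deep nets included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense sampling of sin(x) on [0, 2*pi] -- the "known unknowns" regime.
x_train = rng.uniform(0, 2 * np.pi, 500)
y_train = np.sin(x_train)

# Fit a degree-9 polynomial as a stand-in for any curve-fitting model.
coefs = np.polyfit(x_train, y_train, deg=9)

def mean_error(lo, hi, n=200):
    x = np.linspace(lo, hi, n)
    return np.abs(np.polyval(coefs, x) - np.sin(x)).mean()

# Unseen points *inside* the densely sampled range: local generalization.
print("interpolation error:", mean_error(0, 2 * np.pi))          # tiny
# Unseen points *outside* it: genuine extrapolation.
print("extrapolation error:", mean_error(3 * np.pi, 4 * np.pi))  # blows up
```

Every test point in both regimes is "unseen"; what differs is whether it lies within the region covered by dense training data. That's the content of the local-generalization claim.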
If you take a problem known to be solvable by expert humans via pure pattern recognition (say, spotting the top move on a chess board), a problem known to be solvable via convnets as far back as 2016, and you train a model on ~5B chess positions across ~10M games, and you find that the model can solve the problem at the level of a human expert -- that isn't an example of out-of-distribution generalization. That is an example of local generalization: precisely the thing you expect deep learning to be able to do.