Launched in late 2022, ChatGPT arguably became the fastest-growing consumer app ever, hitting 100M users in just 2 months. But the field of AI has transformed since then, and it’s time to take a new look at the numbers. 🧵
Historically, technology adoption took decades. For example, telephones took 60 years to reach 70% of US households. But tech diffuses faster and faster over time, and we should expect AI to continue this trend.
But even if we account for this trend, AI adoption seems incredibly fast. ~10% of the US used ChatGPT weekly within just 2 years, and ~30% in under 2.5 years.
It’s not just ChatGPT. OpenAI, Anthropic, and DeepMind revenues have collectively grown by >$10B since ChatGPT’s release. Furthermore, ~40% of US businesses are now paying for AI tools, and on the current trajectory this will reach ~80% by 2028.
These numbers suggest that AI systems reached the current number of users incredibly quickly, faster than almost any previous technology.
Besides the number of users, to understand the rate of AI diffusion we also need to look at how and how much AI systems are being used. Are people using frontier models more? Are they using them more intensively?
First, ~95% of ChatGPT users are on the free tier, with limited access to frontier AI. In contrast, paying users quickly adopt the best models: on OpenRouter, nearly all token usage of Claude models shifts to the latest models within 2 months of release.
But despite rapid total user growth, the fraction of paid ChatGPT users hasn’t grown. If anything, it’s been declining: paid users grew ~3.3x from Jan 2024 to Apr 2025, but total users increased ~4.5x. That’s evidence against increased usage intensity.
Survey data gives mixed evidence. A Pew survey found no change in AI interaction frequency between 2022 and 2024, whereas a Gallup poll found frequent use nearly doubled from 11% to 19% (2023-2025), though mostly among white-collar workers.
On the other hand, token usage per user has likely grown a lot. Sam Altman reported a 50x increase in OpenAI’s token volume between Nov 2023 and Oct 2024. Adjusting for user growth, that could mean up to ~20x more tokens per user.
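To make the arithmetic behind the last two points explicit, here is a minimal back-of-the-envelope sketch; the ~2.5x user-growth figure for Nov 2023 to Oct 2024 is an illustrative assumption implied by the "up to ~20x" estimate, not a reported number.

```python
# Back-of-the-envelope usage-intensity checks (illustrative, not official figures).

# 1) Paid share of ChatGPT users, Jan 2024 -> Apr 2025.
paid_growth = 3.3     # paid users grew ~3.3x
total_growth = 4.5    # total users grew ~4.5x
paid_share_change = paid_growth / total_growth
print(f"Paid share changed by ~{paid_share_change:.2f}x "
      f"(a ~{1 - paid_share_change:.0%} relative decline)")   # ~0.73x, a ~27% decline

# 2) Tokens per user, Nov 2023 -> Oct 2024.
token_growth = 50     # reported ~50x growth in total token volume
user_growth = 2.5     # ASSUMPTION: illustrative figure implied by the ~20x estimate
print(f"Tokens per user grew ~{token_growth / user_growth:.0f}x")   # ~20x
```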
Taking everything into account, there have likely also been substantial increases in how much individuals use AI since ChatGPT’s release, though the evidence is somewhat tricky to interpret.
We have graded the results of @OpenAI’s ChatGPT agent evaluation on FrontierMath Tier 1–3 questions and found a score of 27% (±3%). ChatGPT agent is a new model fine-tuned for agentic tasks, equipped with text/GUI browser tools and native terminal access. 🧵
This evaluation is not directly comparable to those on Epoch AI’s benchmarking hub, as it uses a different scaffold. First, we did not run the model ourselves—we only graded the outputs provided by OpenAI and don’t have access to their code to run the model. Second, ChatGPT agent has access to tools not available to other models we've assessed—most notably browser tools, which may have helped on questions related to recent research papers. Finally, the evaluation allowed up to 128K tokens per question, compared to our standard 100K; this difference is unlikely to have significantly affected results.
OpenAI has exclusive access to all FrontierMath problem statements and 237 of the 290 Tier 1–3 solutions. Epoch AI holds out the remaining solutions. We found no statistically significant performance difference between the held-out and non-held-out sets.
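For readers who want to see how a "no statistically significant difference" claim like this can be checked, here is a minimal sketch using a two-proportion z-test; the per-set correct counts below are hypothetical placeholders, and Epoch AI's actual analysis may have used a different test. Only the 27% score, the 290-question total, and the 237/53 split come from the thread.

```python
import math

# Headline score: 27% on 290 Tier 1-3 questions.
p, n = 0.27, 290
se = math.sqrt(p * (1 - p) / n)
print(f"Binomial standard error: ±{se:.1%}")   # ~±2.6%, consistent with the reported ±3%

def two_prop_z(k1, n1, k2, n2):
    """z statistic for H0: both question sets have the same underlying accuracy."""
    p1, p2 = k1 / n1, k2 / n2
    p_pool = (k1 + k2) / (n1 + n2)
    pooled_se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / pooled_se

# Held-out (53 questions) vs non-held-out (237 questions).
# The correct counts below are HYPOTHETICAL placeholders, not the actual results.
z = two_prop_z(k1=13, n1=53, k2=65, n2=237)
print(f"z = {z:.2f}")   # |z| < 1.96 -> no significant difference at the 5% level
```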
The IMO is next week. What will it tell us about AI?
@GregHBurnham argues that an AI gold medal could be a non-event or could be an important breakthrough—it depends on whether the AI system exhibits creative problem-solving. How to tell the difference? Read on!
It will be tempting to focus on whether an AI system gets a gold medal. Formal proof systems like Google’s AlphaProof are quite close to this, and even general-purpose LLMs have a fighting chance. But that's not the outcome to pay the most attention to.
Rather, the big thing to watch for is qualitative: can AI systems solve problems that require a lot of creativity?
@ansonwhho and @ardenaberg argue that an AI Manhattan Project reaching the scale of previous national projects could result in a ~1000x compute scaleup by 2027.
A national AI project has become more and more of a possibility in the last year, with such a project being the top recommendation of a US-China congressional commission.
At their peaks, previous national projects spent a fraction of GDP equivalent to $120B-$250B today. The authors find that such a budget could centralize most NVIDIA compute in the US.
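As a rough illustration of how a historical project's peak GDP share maps onto today's dollars (the peak-share values below are illustrative assumptions; only the $120B-$250B range comes from the thread):

```python
# Convert a peak GDP share into today's dollars (rough illustration).
US_GDP_TODAY = 29e12   # ~$29 trillion, approximate current US GDP

# ILLUSTRATIVE peak shares, not the report's exact inputs.
for label, peak_share in [("lower example", 0.004), ("upper example", 0.008)]:
    budget = peak_share * US_GDP_TODAY
    print(f"{label}: {peak_share:.1%} of GDP ≈ ${budget / 1e9:.0f}B per year")
# -> roughly $116B and $232B, in line with the $120B-$250B range above.
```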
- The number of large-scale model releases is growing rapidly (418 models over 10^23 FLOP)
- The UK has fallen behind, China has caught up (9 vs 151 models)
- There are far more of the largest models (33 models over 10^25 FLOP)
First, the number of large-scale model releases is growing rapidly.
In 2020, there were 4 models trained with more than 10^23 FLOP.
By the end of 2024, there were 327 such models in our dataset.
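A quick back-of-the-envelope way to see how fast that is, treating both figures as cumulative counts over a four-year span:

```python
# Implied annual growth in the number of models trained on >1e23 FLOP.
count_2020, count_2024 = 4, 327
years = 4
annual_growth = (count_2024 / count_2020) ** (1 / years)
print(f"~{annual_growth:.1f}x per year")   # ~3.0x/year
```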
Most large-scale models — those trained on over 10^23 FLOP — are language models.
Of the 418 large-scale models in our data, 326 are language models, of which 86 are vision-language (like GPT-4).
LLM context windows have grown, but can models really use all this context?
We find signs of recent, rapid progress in their ability to do so. Read on to learn more!
From Claude 2.0’s 100k tokens in 2023 to Llama 4 Maverick’s 10M earlier this year, there’s no doubt that context windows are getting longer. On a set of models from Artificial Analysis, we find that the longest available context windows have grown at about 30x/year.
But how effectively can models use these longer windows? We measured the input lengths at which models score above 80% on two moderately challenging long-context benchmarks, Fiction.liveBench and MRCR (2-needle).
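A minimal sketch of how such a threshold measurement can work: score a model at several input lengths and take the longest length at which it still reaches 80%. The scores below are hypothetical placeholders, and the actual methodology (e.g. interpolating between tested lengths) may differ.

```python
# Longest input length at which a model still scores >= 80% on a
# long-context benchmark. Scores are HYPOTHETICAL placeholders.
THRESHOLD = 0.80

scores_by_length = {       # input length (tokens) -> benchmark score
    8_000: 0.94,
    32_000: 0.88,
    128_000: 0.82,
    256_000: 0.71,
    1_000_000: 0.55,
}

usable = [length for length, score in scores_by_length.items() if score >= THRESHOLD]
effective_context = max(usable) if usable else 0
print(f"Longest length with >= 80% score: {effective_context:,} tokens")   # 128,000 here
```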
The bottlenecks to >10% GDP growth are weaker than expected, and the existing $500B Stargate investment may be tiny relative to optimal AI investment
In this week’s Gradient Update, @APotlogea and @ansonwhho explain how their work on the economics of AI brought them to this view
Skepticism around explosive AI growth often hinges on "Baumol effects"—bottlenecks from human-dependent tasks. But to their surprise, the most comprehensive integrated assessment model of AI to date suggests these constraints are weaker than expected
Contrary to their expectations, even very partial AI automation—just 30% of tasks—can lead to growth rates above 20% under best-guess parameters. Achieving explosive growth (>30%) requires around 50-70% automation, still well below full automation
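For intuition on what a Baumol bottleneck is, here is a stylized CES (constant elasticity of substitution) illustration; this is not the authors' integrated assessment model, just the textbook mechanism behind the bottleneck worry, with an illustrative elasticity value.

```python
# Stylized Baumol bottleneck: a CES aggregator over equally weighted tasks.
# When tasks are gross complements (sigma < 1), making a fraction f of tasks
# arbitrarily productive caps output gains at (1 - f) ** (sigma / (sigma - 1)).

def output_gain(f, A, sigma):
    """Output relative to baseline when a fraction f of tasks becomes A times
    more productive, under CES elasticity of substitution sigma."""
    rho = (sigma - 1) / sigma
    return (f * A**rho + (1 - f)) ** (1 / rho)

sigma = 0.5   # ILLUSTRATIVE value; tasks are poor substitutes
for f in (0.3, 0.5, 0.7):
    gain = output_gain(f, A=1e6, sigma=sigma)
    cap = (1 - f) ** (sigma / (sigma - 1))
    print(f"automate {f:.0%} of tasks: ~{gain:.2f}x output (cap {cap:.2f}x)")
# With sigma = 0.5, automating 30% of tasks caps gains at ~1.43x; this is the
# naive bottleneck picture that the report's far richer model finds to be
# weaker than expected.
```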