How much AI compute exists globally? How rapidly is it growing?
We analyzed NVIDIA's GPU shipments since 2018 to answer these questions, and found that the installed computing power of NVIDIA chips has doubled roughly every 10 months since 2019.
To track installed compute, we used data from NVIDIA’s financial reports, along with a new dataset of over 700 AI data centers (forthcoming). This dataset enables us to see the relative quantities of each GPU model as they come into operation.
Total installed NVIDIA compute has grown 2.3x per year since 2019. In comparison, training compute for frontier models has grown 4-5x per year over the same period, suggesting that compute scaling could eventually become bottlenecked by the slower (but still rapid!) pace of chip production.
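As a quick sanity check on these figures (the growth rates are from this thread; the arithmetic is just the standard doubling-time formula):

```python
import math

# Doubling time implied by an annual growth factor: t = 12 * ln(2) / ln(g) months.
installed_growth = 2.3   # installed NVIDIA compute, per year (this thread)
frontier_growth = 4.5    # frontier training compute, per year (midpoint of 4-5x)

for label, g in [("installed compute", installed_growth),
                 ("training compute", frontier_growth)]:
    months = 12 * math.log(2) / math.log(g)
    print(f"{label}: doubles every ~{months:.0f} months")
# installed compute: doubles every ~10 months
# training compute: doubles every ~6 months
```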
Breaking it down by chip generation, we see that about 77% of all NVIDIA FLOP/s currently come from Hopper-generation GPUs like the H100. There are now 4M H100-equivalents of operational NVIDIA chips in data centers around the world!
Because computing capacity is growing so quickly, hardware failures don’t have much impact. We estimate that only 7% of all installed computing power has depreciated due to hardware failures, though the number may be as much as 27% on more pessimistic assumptions.
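To see why rapid growth swamps attrition, here is a toy cohort model; the 2.3x growth factor is from this thread, but the annual failure rate and time horizon are illustrative placeholders, not the assumptions behind the 7%-27% range:

```python
# Toy cohort model: shipments grow 2.3x/year; each cohort loses a fixed
# fraction of its compute to hardware failures every year. The failure
# rate and horizon below are illustrative assumptions only.
GROWTH = 2.3          # annual growth in shipped compute (from this thread)
FAILURE_RATE = 0.05   # assumed annual failure rate per installed chip
YEARS = 7             # e.g. 2019 through 2025

shipped = [GROWTH ** t for t in range(YEARS)]            # relative compute per cohort
surviving = [s * (1 - FAILURE_RATE) ** (YEARS - 1 - t)   # older cohorts decay longer
             for t, s in enumerate(shipped)]

lost_share = 1 - sum(surviving) / sum(shipped)
print(f"Share of ever-installed compute lost to failures: {lost_share:.1%}")
```

Because most of the installed base is in the newest cohorts, even chips that fail steadily each year account for only a small share of total compute.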
What about TPUs? There’s less public data here, but our previous work found that the combined power of Google’s TPUs was about 30% of that from NVIDIA chips, as of mid-2024.
This research was sponsored by @aria_research. Learn more about it on our website! epoch.ai/data/machine-l…
How much energy does a ChatGPT query consume? One common estimate is 3 watt-hours per query.
However, in this week’s Gradient Update we find that it’s probably about 10x less for a typical query today. 🧵
The main energy cost of a query comes from running the model (inference). We estimate the compute cost using GPT-4o (estimated 100B active parameters) as our reference, assuming a typical 500-token response (~400 words).
When running on H100 servers, even with pessimistic utilization rates, we conclude that a ChatGPT query with GPT-4o consumes roughly 0.3 watt-hours.
This is well below the most widely cited estimate of 3 watt-hours, which is based on outdated assumptions.
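For a rough sense of where a figure like 0.3 watt-hours comes from, here's a back-of-envelope version of the calculation; the parameter and token counts are the estimates above, while the utilization and per-GPU power figures are illustrative placeholders rather than the exact inputs of the Gradient Update:

```python
# Back-of-envelope energy estimate for one ChatGPT-style query.
ACTIVE_PARAMS = 100e9        # GPT-4o active parameters (estimate from this thread)
OUTPUT_TOKENS = 500          # typical response length (this thread's assumption)
H100_PEAK_FLOPS = 9.9e14     # approximate dense FP16/BF16 peak of an H100
UTILIZATION = 0.10           # assumed (pessimistic) inference utilization
POWER_PER_GPU_W = 1000       # assumed per-GPU power incl. server overhead

inference_flop = 2 * ACTIVE_PARAMS * OUTPUT_TOKENS          # ~2 FLOP/param/token -> ~1e14 FLOP
gpu_seconds = inference_flop / (H100_PEAK_FLOPS * UTILIZATION)
energy_wh = gpu_seconds * POWER_PER_GPU_W / 3600
print(f"~{energy_wh:.2f} Wh per query")                     # ~0.3 Wh
```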
We’re excited to announce a major update to the Epoch AI Benchmarking Hub!
The Benchmarking Hub hosts our independent evaluations of AI models. This latest release overhauls how we run and share AI benchmarks—making the data more transparent, systematic, and up to date. 🧵
What’s new for you?
• Richer Data: See comprehensive details on each evaluation and the model behind it.
• More Frequent Updates: Expect fresh benchmarking results soon after new models launch.
This major update also enables three key features:
1️⃣ Transparency: We provide not only average scores, but also the prompts, model outputs, and scoring.
In a new Gradient Update, @MatthewJBar analyzes the impact of AGI on human wages. He concludes that if AGI can fully substitute for human labor, it might cause wages to crash. Eventually, wages may drop below subsistence level—a minimum level required for human survival.🧵
His argument is based on the idea of diminishing returns to labor. In a simple model of production, increasing labor decreases wages, all else being equal. While simplistic, this argument suggests that if AGIs are scaled much faster than physical capital, human wages will crash.
However, since physical capital can be scaled up alongside AGIs, it is necessary to examine a more general version of this argument. In the general case, the future evolution of wages will depend heavily on the returns to scale in production.
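As a minimal sketch of the diminishing-returns argument, consider a standard Cobb-Douglas production function (the parameter values below are illustrative, not taken from Matthew's model):

```python
# Wages as the marginal product of labor in Y = A * K^alpha * L^(1 - alpha).
# Scaling up effective labor (e.g. AGI workers) with capital fixed drives the
# wage down; scaling capital alongside labor keeps it flat.
ALPHA, A = 0.3, 1.0          # illustrative capital share and productivity level

def wage(K, L, alpha=ALPHA, A=A):
    # dY/dL for Cobb-Douglas: (1 - alpha) * A * (K / L)^alpha
    return (1 - alpha) * A * (K / L) ** alpha

K0, L0 = 1.0, 1.0
print(wage(K0, 10 * L0))      # labor scaled 10x, capital fixed   -> wage falls ~50%
print(wage(10 * K0, 10 * L0)) # both scaled 10x (constant returns) -> wage unchanged
```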
When will an open-weight AI model be trained with 1e26 FLOP?
The Biden administration's new AI export restrictions regulate models above 1e26 FLOP… unless there’s an open-weight model that exceeds this. When will that happen, and how fast will the threshold increase? 🧵
We looked at open-weight models in our dataset of notable models, and identified releases that pushed forward the frontier of open-weight training compute. These “top-1” models have historically grown in compute by 4.7x per year.
Projecting forward, the trend suggests we may see 1e26 FLOP open-weight models in 2025. There's reason to think it may be even sooner than our forecast of November: we know Meta has been training Llama 4 on over 100k H100s since at least October 2024.
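For a rough sense of how such an extrapolation works, here's a naive projection; the starting compute value below is an assumed placeholder, not the fitted trend from our dataset:

```python
import math

# Naive extrapolation: given a current open-weight compute frontier and a
# 4.7x/year growth trend, when is 1e26 FLOP crossed? The starting point is a
# placeholder assumption for illustration only.
current_frontier_flop = 4e25     # assumed current top open-weight training compute
target_flop = 1e26               # threshold in the export restrictions
growth_per_year = 4.7            # historical growth of top-1 open-weight models

years_to_cross = math.log(target_flop / current_frontier_flop) / math.log(growth_per_year)
print(f"~{years_to_cross * 12:.0f} months until the 1e26 FLOP threshold")
```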
What would happen if remote work were fully automated? In a new Gradient Updates issue, @MatthewJBar argues the economic impact would be massive—with the economy doubling in size even in the most conservative scenario. 🧵
Using GPT-4o to analyze tasks in the O*NET database, Matthew finds that 34% of work in the US economy can be performed remotely. This reveals an interesting discrepancy with a major existing study.
Next, Matthew uses data on remote work from the pandemic era to estimate how substitutable remote and non-remote tasks are. This analysis reveals that, despite a huge rise in remote work during 2020, GDP only fell modestly, indicating high substitutability.
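One stylized way to see why substitutability matters so much is to treat output as a CES aggregate of remote and non-remote tasks. The 34% remote share comes from above; the elasticities and the automation multiplier below are purely hypothetical:

```python
# CES aggregate of remote (R) and non-remote (N) task inputs:
#   Y = (a * R^rho + (1 - a) * N^rho)^(1 / rho),  rho = (sigma - 1) / sigma.
# The remote share is from this thread; sigma and the 100x automation
# multiplier are hypothetical, for illustration only.
REMOTE_SHARE = 0.34

def output(R, N, sigma, a=REMOTE_SHARE):
    rho = (sigma - 1) / sigma
    return (a * R ** rho + (1 - a) * N ** rho) ** (1 / rho)

for sigma in (0.5, 2.0):                      # low vs. high substitutability
    gain = output(100.0, 1.0, sigma) / output(1.0, 1.0, sigma)
    print(f"sigma={sigma}: output multiplier ~{gain:.1f}x")
# With low substitutability, gains are capped by non-remote tasks (~1.5x here);
# with high substitutability, automating remote work yields much larger gains.
```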
The amount of compute used to train frontier models has been growing at a breakneck pace of over 4x per year since 2018, resulting in an overall scale-up of more than 10,000x! But what factors are enabling this rapid growth? 🧵 1/6
We decompose training compute into three constituent factors: quantity of training hardware, the computing power of that hardware in FLOP per second, and the amount of time spent training. We fit trends on each of these underlying factors. 2/6
Looking at frontier models since 2018, we find that increases in hardware cluster sizes have been the most important contributor to larger training runs, growing by around 1.7x per year and making up about 40% of the growth in compute. 3/6
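A sketch of how the decomposition works: because the factors multiply, their annual growth rates multiply too, and their shares of growth add in log space. Only the ~4x total and the 1.7x cluster-size growth below come from this thread; the rest is arithmetic.

```python
import math

# Training compute = (# chips) x (FLOP/s per chip) x (training time) x utilization,
# so per-factor growth rates multiply and shares of growth add in log space.
total_growth = 4.0            # overall training-compute growth per year (thread: >4x)
cluster_growth = 1.7          # growth in chips per training run (this thread)

cluster_share = math.log(cluster_growth) / math.log(total_growth)
other_factors_growth = total_growth / cluster_growth
print(f"Cluster size explains ~{cluster_share:.0%} of compute growth")      # ~38%
print(f"Remaining factors together grow ~{other_factors_growth:.1f}x/year") # ~2.4x
```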