Epoch AI
Investigating the trajectory of AI for the benefit of society.
Apr 27 8 tweets 2 min read
How fast could production of humanoids, quadrupeds, drones, and other robots scale up in the event of a large demand shock?

We first look at current production trends. Humanoids are growing fastest (~16K units in 2025, doubling every ~6 months), but from a small base.

Quadrupeds (~81K) double every ~10 months. Drones (16M/year) and wheeled robots (33M/year) dominate in volume but grow more slowly.
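Doubling times compound quickly. A minimal sketch of the projection implied by the thread's approximate figures (the two-year horizon is an illustrative assumption, not Epoch's forecast):

```python
# Sketch: project annual production from a doubling time, using the thread's
# rough figures. Illustrative only, not an official Epoch model.
def project_units(base_units: float, doubling_months: float, months_ahead: float) -> float:
    """Annual production after `months_ahead`, assuming steady doubling."""
    return base_units * 2 ** (months_ahead / doubling_months)

# Humanoids: ~16K units/year in 2025, doubling every ~6 months.
humanoids_two_years_out = project_units(16_000, 6, 24)
print(round(humanoids_two_years_out))  # 16,000 * 2^4 = 256000
```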
Apr 10 5 tweets 2 min read
The Iran War and Hormuz shutdown have disrupted oil, gas, and helium exports and threatened data centers and investments in the Gulf states.

@justjoshinyou13 explores how a prolonged Iran war could affect AI, and why it probably won't completely derail the compute buildout.

Fabrication of AI chips and memory is concentrated in Taiwan and South Korea. These fabs rely on natural gas for energy, as well as on helium, both disrupted by the Hormuz closure.

But chip fabs are so profitable that TSMC and others will likely secure the resources they need.
Apr 6 6 tweets 3 min read
Compute may be the most important input to AI. So who owns the world’s AI compute?

Introducing our new AI Chip Owners explorer, showing our analysis of how leading AI chips are distributed among hyperscalers and other major players, broken down by chip type over time.

To estimate global compute ownership, we build on our previous estimates of overall AI chip sales. We then use earnings commentary from chipmakers and hyperscalers, as well as media reports and industry researcher estimates, to allocate chips across owners.
Feb 26 8 tweets 3 min read
Developing more powerful AI isn’t just about scaling compute. It’s also about improving algorithms and data quality, which let you build better models with the same compute.

We call this "AI software progress". Here's everything you need to know about it: 🧵

There are many ways to improve algorithms and data. For example, you could change model architectures, build better RL environments, and improve training recipes.

But how do you concretize what makes some AI software better than others?
Feb 26 8 tweets 3 min read
In 2024, @EpochAIResearch estimated the rate of software progress in language models. We found that training compute efficiency was improving at ~3x per year.

But this estimate was for pre-training, and it is now outdated, so @ansonwhho took a new look at the numbers. 🧵

Almost all existing estimates suggest very fast progress, on the order of several times per year, though the uncertainty intervals are very wide.

Still, it’s very possible that training efficiency improves much faster than 3× per year. Even 10× per year seems possible!
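These annual rates compound fast. A quick sketch of what 3×/year versus 10×/year means over a couple of years, using the thread's own figures:

```python
# Sketch: factor by which required training compute shrinks after N years,
# given an annual efficiency multiplier (rates taken from the thread).
def compute_reduction(annual_gain: float, years: float) -> float:
    return annual_gain ** years

print(compute_reduction(3, 2))   # at 3x/year: 9x less compute after 2 years
print(compute_reduction(10, 2))  # at 10x/year: 100x less compute
```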
Feb 26 12 tweets 4 min read
AI training compute efficiency has improved extremely fast: each year, you need several times less training compute to reach the same capability.

But AI architectures/algorithms haven’t changed *that* much in recent years.

So where do these efficiency improvements come from? 🧵

One explanation is that these improvements came not from better algorithms, but from better data.

For example, training has shifted from uncurated web data to heavily processed (and often synthetic) data. AI companies are also spending billions on data, such as RL environments.
Jan 28 8 tweets 2 min read
Was serving GPT-5 profitable?

According to @Jsevillamol, @exponentialview’s Hannah Petrovic, and @ansonwhho, it depends. Gross margins were around 45%, making inference look profitable.

But after accounting for the cost of operations, OpenAI likely incurred a loss. 🧵

Even the gross profits from running models weren’t enough to recoup R&D costs.

Gross profits from running GPT-5 were less than OpenAI's R&D costs in the four months before launch, and the true R&D cost was likely higher than that.
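The gross-margin-versus-net-loss distinction is easy to see with round numbers. All figures below are hypothetical placeholders except the ~45% margin estimate from the thread; they are not OpenAI's actual accounts:

```python
# Illustrative arithmetic with hypothetical round numbers: a healthy gross
# margin can still leave a loss once R&D and operating costs are included.
revenue = 10.0            # $B of annualized inference revenue (hypothetical)
gross_margin = 0.45       # ~45%, the thread's estimate
gross_profit = revenue * gross_margin   # ≈ 4.5: inference looks profitable
overheads = 6.0           # $B of R&D + operations (hypothetical)
net = gross_profit - overheads          # ≈ -1.5: a net loss overall
print(gross_profit, net)
```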
Jan 8 4 tweets 2 min read
Global AI compute capacity now totals over 15 million H100-equivalents.

Our new AI Chip Sales data explorer tracks where this compute comes from across Nvidia, Google, Amazon, AMD, and Huawei, making it the most comprehensive public dataset available.

Nvidia’s B300 GPU now accounts for the majority of its revenue from AI chips, while H100s make up under 10%.

We estimate chip-level spending using earnings reports, company disclosures, and analyst and media coverage.
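"H100-equivalents" aggregate heterogeneous chips by weighting each by its performance relative to an H100. A minimal sketch; the inventory counts and performance weights below are hypothetical placeholders, not Epoch's actual conversion factors:

```python
# Sketch: convert a mixed chip inventory into H100-equivalents.
# All numbers here are hypothetical, for illustration only.
relative_perf = {"H100": 1.0, "B300": 4.0, "TPUv5e": 0.5}        # hypothetical weights
inventory = {"H100": 1_000_000, "B300": 500_000, "TPUv5e": 2_000_000}

h100_equivalents = sum(inventory[c] * relative_perf[c] for c in inventory)
print(f"{h100_equivalents:,.0f} H100e")  # 1M + 2M + 1M = 4,000,000 H100e
```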
Dec 12, 2025 4 tweets 2 min read
GPT-5.2 scores 152 on the Epoch Capabilities Index (ECI), our tool for aggregating benchmark scores. This puts it second only to Gemini 3 Pro.

🧵 with individual scores.

GPT-5.2 ranks first or second on most of the benchmarks we run ourselves, including a top score on FrontierMath Tiers 1–3 and our new chess puzzles benchmark. The exception is SimpleQA Verified, where it scores notably worse than even previous GPT-5 series models.
Nov 10, 2025 11 tweets 3 min read
AI data center buildouts already rival the Manhattan Project in scale, but there’s little public info about them.

So we spent the last few months reading legal permits, staring at satellite images, and scouring news sources.

Here’s what you need to know. 🧵

AI data centers will be some of the biggest infrastructure projects in history.

e.g. OpenAI’s Stargate Abilene will need:

- As much power as Seattle (1 GW)
- >250× the compute of the GPT-4 cluster
- 450 soccer fields of land
- $32B
- Thousands of workers
- 2 years to build
Nov 7, 2025 5 tweets 2 min read
The Epoch Capabilities Index is a useful way to measure model capabilities, but what does a score of 150 actually mean?

One way to read our new capability index is by plotting the benchmark performance you expect to see for a range of ECI scores. 🧵

Three important takeaways:

1. Benchmarks vary in overall difficulty, and in slope. Steeper slopes imply a narrower range of difficulties at the question level, and mean the benchmark saturates quickly once some progress is made.
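The difficulty-and-slope picture can be sketched with a logistic curve mapping ECI to expected accuracy. This is purely illustrative: the functional form, difficulty, and slope values here are assumptions for demonstration, not Epoch's fitted parameters:

```python
import math

# Illustrative logistic link between an ECI score and expected benchmark
# accuracy. The `difficulty` and `slope` values are hypothetical.
def expected_accuracy(eci: float, difficulty: float, slope: float) -> float:
    return 1 / (1 + math.exp(-slope * (eci - difficulty)))

# A steeper slope saturates much faster once ECI passes the difficulty level:
for k in (0.05, 0.3):
    print(k, round(expected_accuracy(150, difficulty=140, slope=k), 2))
```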
Nov 4, 2025 8 tweets 3 min read
Announcing our Frontier Data Centers Hub!

The world is about to see multiple 1 GW+ AI data centers.

We mapped their construction using satellite imagery, permits & public sources — releasing everything for free, including commissioned satellite images.

Highlights in thread!

Several data centers will soon demand 1 GW of power, starting early next year:

- Anthropic–Amazon New Carlisle (January)
- xAI Colossus 2 (February)
- Microsoft Fayetteville (March, borderline 1 GW)
- Meta Prometheus (May)
- OpenAI Stargate Abilene (July)
Oct 9, 2025 10 tweets 3 min read
We evaluated Gemini 2.5 Deep Think on FrontierMath. There is no API, so we ran it manually. The results: a new record!

We also conducted a more holistic evaluation of its math capabilities. 🧵

Note that this is the publicly available version of Deep Think, not the version that achieved a gold medal-equivalent score on the IMO. Google has described the publicly available Deep Think model as a “variation” of the IMO gold model.
Oct 3, 2025 5 tweets 2 min read
Sora 2 can solve questions from LLM benchmarks, despite being a video model.

We tested Sora 2 on a small subset of GPQA questions, and it scored 55%, compared to GPT-5’s score of 72%.

GPQA Diamond is a benchmark of challenging multiple-choice science questions. We randomly selected 10 questions from the benchmark and ran Sora on each until we had generated four videos per question.
Sep 30, 2025 7 tweets 3 min read
Announcing our new AI Companies Data Hub!

We collected key data on frontier AI companies, including revenue run rates, funding, staff, usage rates, and compute spend.

This free resource will help you understand the trajectory and economics of AI.

Highlights in thread!

Revenue:

The combined revenue run rates of OpenAI and Anthropic have grown around 10x since early 2024.

OpenAI’s annualized revenue reached $13 billion in August 2025, up from $5B at the start of the year.

Anthropic’s revenue has exploded this year, from $1B to $5B by July!
Sep 26, 2025 11 tweets 2 min read
Why did OpenAI train GPT-5 with less compute than GPT-4.5?

Due to the higher returns to post-training, they scaled post-training as much as possible on a smaller model.

And since post-training started from a much lower base, this meant a decrease in total training FLOP. 🧵

The invention of reasoning models made it possible to greatly improve performance by scaling up post-training compute. This improvement is so great that GPT-5 outperforms GPT-4.5 despite having used less training compute overall.
Sep 16, 2025 12 tweets 4 min read
What will AI look like by 2030 if current trends hold?

Our new report zooms in on two things: (1) whether scaling continues (compute, data, power, capital), and (2) the capabilities this enables, especially for scientific R&D.

We forecast that by 2030:
- Training clusters would cost hundreds of billions of dollars
- Compute scaling is probably not "hitting a wall"
- Synthetic & multimodal data may be needed to ease bottlenecks
- Power demands will increase but be manageable in principle
Sep 5, 2025 9 tweets 3 min read
AI progress has been driven by enormous compute scaling, but this is likely to slow down within the next few years. The reasons: investor uncertainty, the heavy costs of overinvestment, and increasing lead times. 🧵

Investors are incredibly uncertain about the returns to further scaling, and overestimating the returns could cost them >$100B. So rather than going all-in today, they invest more gradually, observing the returns from incremental scaling before reevaluating further investment.
Aug 12, 2025 7 tweets 3 min read
We’ve independently evaluated the GPT-5 model family on our benchmarking suite. Here is what we’ve learned 🧵

GPT-5 performs strongly on math benchmarks, achieving a new SOTA on FrontierMath and OTIS Mock AIME 2024–2025.
Aug 11, 2025 8 tweets 2 min read
The power required to train frontier AI models has been growing exponentially over time. What happens if trends continue?

In a new white paper written in collaboration with @EPRINews, we analyze this question and forecast multi-gigawatt individual training runs by 2030!

🧵

Power demands for frontier AI training have been growing at 2.2x per year, with frontier runs now exceeding 100 MW. The primary factor driving this growth is the scaling of the compute used to train models, at a rate of 4-5x per year.
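If compute grows faster than power, the gap must be closed by hardware efficiency. A back-of-envelope check using the two growth rates just quoted (the 4.5x midpoint is my interpolation of the thread's 4-5x range):

```python
# Back-of-envelope: training compute grows ~4-5x/year while training power
# grows ~2.2x/year, so compute per watt must improve by roughly the ratio.
compute_growth = 4.5   # x/year, midpoint of the thread's 4-5x range
power_growth = 2.2     # x/year, per the thread
implied_efficiency_gain = compute_growth / power_growth
print(round(implied_efficiency_gain, 2))  # ~2.05x/year in FLOP per watt
```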
Aug 8, 2025 9 tweets 2 min read
OpenAI has historically scaled up training compute by around 100x with each new generation of its GPT series.

However, GPT-5 appears to be an exception to this trend.

🧵

GPT-4 was trained on 2e25 floating-point operations, and OpenAI said GPT-4.5 was about an order-of-magnitude (10x) scale-up.

We don’t have a rigorous estimate yet, but GPT-5’s training compute may fall *between* that of GPT-4 and GPT-4.5, and it is probably not a large scale-up from GPT-4.5.
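Since compute comparisons live on a log scale, "between GPT-4 and GPT-4.5" spans a wide range. A sketch of the bounds, where the log-scale midpoint is purely illustrative, not an Epoch estimate:

```python
import math

# Sketch of the compute range (the GPT-5 figure is a hypothetical
# placeholder inside the range the thread describes).
gpt4 = 2e25                            # FLOP, per the thread
gpt45 = 10 * gpt4                      # ~an order of magnitude up, per OpenAI
gpt5_guess = math.sqrt(gpt4 * gpt45)   # log-scale midpoint, illustrative only

print(f"{gpt5_guess:.1e}")  # ~6.3e25, between GPT-4 and GPT-4.5
```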