Compute may be the most important input to AI. So who owns the world’s AI compute?
Introducing our new AI Chip Owners explorer, showing our analysis of how leading AI chips are distributed among hyperscalers and other major players, broken down by chip type over time.
To estimate global compute ownership, we build on our previous estimates of overall AI chip sales. We then use earnings commentary from chipmakers and hyperscalers, as well as media reports and industry researcher estimates, to allocate chips across owners.
We estimate that over 60% of global AI compute is owned by the top US hyperscalers, led by Google with the equivalent of roughly 5 million Nvidia H100 GPUs!
Unlike the other hyperscalers, which rely primarily on Nvidia, Google’s fleet is dominated by its custom TPU chips.
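Aggregating a mixed fleet of Nvidia GPUs and custom TPUs into a single number requires a common unit. A minimal sketch of one way to do this, normalizing each chip by peak dense BF16 FLOP/s relative to an H100 (~989 TFLOP/s); the chip counts and the TPU figure below are illustrative placeholders, not our actual estimates:

```python
# Sketch: normalizing a heterogeneous fleet into H100-equivalents.
# Assumption: chips are compared by peak dense BF16 FLOP/s relative to
# an H100. Counts and the TPU-like figure are illustrative placeholders.

H100_FLOPS = 989e12  # peak dense BF16 FLOP/s for one H100 SXM

fleet = {
    # chip name: (peak dense BF16 FLOP/s, number of chips) -- illustrative
    "H100": (989e12, 1_000_000),
    "TPU-like accelerator": (459e12, 2_000_000),
}

def h100_equivalents(fleet):
    """Sum each chip's count weighted by its FLOP/s ratio to an H100."""
    return sum(flops / H100_FLOPS * count for flops, count in fleet.values())

print(f"{h100_equivalents(fleet):,.0f} H100-equivalents")
```

Real accounting is messier (precision modes, memory bandwidth, interconnect), but a FLOP/s ratio is the simplest common denominator.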
Chinese companies collectively own just over 5% of global AI compute — less than any single top US hyperscaler, and decreasing over time due to export controls.
This excludes smuggled chips, which reporting suggests are significant but unlikely to meaningfully close the gap.
Notably, Nvidia’s official exports to China were largely paused in early 2025 due to export controls, slowing China’s compute purchases. Huawei has now overtaken Nvidia as the leading source of AI computing power in China, at least in terms of aggregate compute specs on paper.
We plan to expand our coverage of AI chip owners and global compute over time. See detailed breakdowns by company, chip family, and model, plus a full methodology, in our AI Chip Owners explorer!
Developing more powerful AI isn’t just about scaling compute. It’s also about improving algorithms and data quality, which let you build better models with the same compute.
We call this “AI software progress” — here’s everything you need to know about it: 🧵
There are many ways to improve algorithms and data. For example, you could change model architectures, build better RL environments, or improve training recipes.
But how do you concretize what makes some AI software better than others?
One way is to say that better AI software reduces the compute needed to reach the same capability.
For example, imagine a curve relating a measure of capabilities to log(training compute). After making an algorithmic innovation, the curve shifts to the left, saving compute:
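Concretely: if a software innovation multiplies training efficiency by a factor r, the compute needed to reach a fixed capability drops by that factor, and at a steady rate of r per year the savings compound. A toy calculation (the baseline compute figure is an arbitrary illustrative number, not a real model's budget):

```python
import math

# Toy model: better software shifts the capability-vs-log(compute) curve
# to the left. If efficiency improves by `rate` per year, the compute
# needed to match a fixed capability shrinks by that factor each year.

def compute_needed(baseline_flop, rate_per_year, years):
    """Training compute needed to match the baseline capability."""
    return baseline_flop / rate_per_year ** years

baseline = 1e25  # illustrative: FLOP to reach some capability today
for years in (1, 2, 3):
    needed = compute_needed(baseline, 3.0, years)
    print(f"after {years} yr at 3x/yr: {needed:.2e} FLOP "
          f"({math.log10(baseline / needed):.2f} OOMs saved)")
```

On a log(compute) axis, each year of 3× progress shifts the curve left by log10(3) ≈ 0.48 orders of magnitude.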
In 2024, @EpochAIResearch estimated the rate of software progress in language models. We found that training compute efficiency was improving at ~3× per year.
But this estimate was for pre-training, and is now outdated — so @ansonwhho took a new look at the numbers. 🧵
Almost all existing estimates suggest very fast progress, on the order of several times per year, though the uncertainty intervals are really wide.
Still, it’s very possible that training efficiency improves much faster than 3× per year. Even 10× per year seems possible!
The numbers are very uncertain for two reasons.
1. They’re based on limited data: estimating software progress requires long-run time series of both model performance and training compute, and few such series exist.
AI training compute efficiency has improved extremely fast: each year, you need several times less training compute to reach the same capability.
But AI architectures/algorithms haven’t changed *that* much in recent years.
So where do these efficiency improvements come from? 🧵
One explanation is that these improvements came not from better algorithms, but better data.
For example, training has shifted from uncurated web data to heavily processed (and often synthetic) data. AI companies are also spending billions on data, like RL environments:
Was it profitable for OpenAI to serve GPT-5? According to @Jsevillamol, @exponentialview’s Hannah Petrovic, and @ansonwhho, it depends. Gross margins were around 45%, making inference look profitable.
But after accounting for the cost of operations, OpenAI likely incurred a loss. 🧵
Even the gross profits from running models weren’t enough to recoup R&D costs.
Gross profits from running GPT-5 were less than OpenAI's R&D costs in the four months before launch. And the true R&D cost was likely higher than that.
The core problem: AI R&D is expensive, and model lifecycles are too short to get enough revenue.
So even if it’s profitable to run models, the full lifecycle is likely loss-making — as long as GPT-5 is representative of other models.
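The lifecycle argument can be sketched with toy numbers. Only the ~45% gross margin comes from the analysis above; every dollar figure below is invented purely for illustration:

```python
# Toy lifecycle P&L for a frontier model. The ~45% gross margin is from
# the analysis above; all dollar amounts are hypothetical illustrations.

def lifecycle_profit(inference_revenue, gross_margin, rd_cost, opex):
    gross_profit = inference_revenue * gross_margin  # serving looks profitable...
    return gross_profit - rd_cost - opex             # ...until fixed costs hit

# Short model lifecycles cap revenue, while R&D is paid up front.
net = lifecycle_profit(
    inference_revenue=3e9,  # hypothetical lifetime inference revenue
    gross_margin=0.45,      # from the analysis above
    rd_cost=2e9,            # hypothetical R&D attributable to the model
    opex=0.5e9,             # hypothetical operating costs
)
print(f"net lifecycle result: ${net / 1e9:+.2f}B")
```

The structure, not the made-up numbers, is the point: a healthy gross margin can still leave the full lifecycle underwater once fixed costs are counted.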
Global AI compute capacity now totals over 15 million H100-equivalents.
Our new AI Chip Sales data explorer tracks where this compute comes from across Nvidia, Google, Amazon, AMD, and Huawei, making it the most comprehensive public dataset available.
Nvidia’s B300 GPU now accounts for the majority of its revenue from AI chips, while H100s make up under 10%.
We estimate chip-level spending using earnings reports, company disclosures, and analyst and media coverage.
These chips present massive resource demands.
Even before the power overheads of servers and data centers, this many chips would draw over 10 GW of power, around twice the average power consumption of New York City.
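That figure can be sanity-checked from the chip count alone, assuming each H100-equivalent draws the H100 SXM's nominal 700 W TDP:

```python
# Rough sanity check of the >10 GW figure: 15M H100-equivalents at the
# H100 SXM's nominal 700 W TDP, before server/datacenter overheads.

CHIPS = 15_000_000
H100_TDP_W = 700

total_gw = CHIPS * H100_TDP_W / 1e9
print(f"{total_gw:.1f} GW")  # → 10.5 GW at the chip level alone
```

Cooling, networking, and facility overheads (PUE) would push the real total meaningfully higher.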
GPT-5.2 scores 152 on the Epoch Capabilities Index (ECI), our tool for aggregating benchmark scores. This puts it second only to Gemini 3 Pro.
🧵 with individual scores.
GPT-5.2 ranks first or second on most of the benchmarks we run ourselves, including a top score on FrontierMath Tiers 1–3 and our new chess puzzles benchmark. The exception is SimpleQA Verified, where it scores notably worse than even previous GPT-5 series models.
Our AIME variant, OTIS Mock AIME 2024-2025, is nearly saturated. There remains a single problem no model has solved, shown below. The diagram is given to the model in the Asymptote vector graphics language.