Epoch AI
Apr 10 · 5 tweets · 2 min read
The Iran War and the Strait of Hormuz shutdown have disrupted oil, gas, and helium exports and threatened data centers and investments in the Gulf states.

@justjoshinyou13 explores how a prolonged Iran war could affect AI, and why it probably won’t completely derail the compute buildout.
Fabrication of AI chips and memory is concentrated in Taiwan and South Korea. These fabs rely on energy from natural gas as well as helium, both disrupted by the Hormuz closure.

But chip fabs are so profitable that TSMC and others will likely secure the resources they need.
For AI data centers, the Hormuz energy shock is not a serious threat in the US, where natural gas prices have been stable.

In Europe and Asia, higher costs may kill some planned data centers, but existing data centers will keep running unless prices surge to much higher levels.
The most serious impacts might occur in the Gulf monarchies. Iran has threatened and directly attacked data centers, which could affect planned projects like Stargate UAE.

Perhaps more importantly, the shock to oil exports could cut off Gulf capital flows to AI, including upcoming IPOs.
This Gradient Update was written by @justjoshinyou13. All Gradient Updates are informal and opinionated analyses that represent the views of individual authors, not Epoch AI as a whole.

Read the full post here:
epoch.ai/gradient-updat…

• • •


More from @EpochAIResearch

Apr 6
Compute may be the most important input to AI. So who owns the world’s AI compute?

Introducing our new AI Chip Owners explorer, showing our analysis of how leading AI chips are distributed among hyperscalers and other major players, broken down by chip type over time.
To estimate global compute ownership, we build on our previous estimates of overall AI chip sales. We then use earnings commentary from chipmakers and hyperscalers, as well as media reports and industry researcher estimates, to allocate chips across owners.
We estimate that over 60% of global AI compute is owned by the top US hyperscalers, led by Google with the equivalent of roughly 5 million Nvidia H100 GPUs!

Unlike the other hyperscalers, which rely primarily on Nvidia, Google’s fleet is dominated by its custom TPU chips.
Feb 26
Developing more powerful AI isn’t just about scaling compute. It’s also about improving algorithms and data quality, which let you build better models with the same compute.

We call this “AI software progress” — here’s everything you need to know about it: 🧵
There are many ways to improve algorithms and data. For example, you could change model architectures, build better RL environments, and improve training recipes.

But how do you concretize what makes some AI software better than others?
One way is to say that better AI software reduces the compute needed to reach the same capability.

For example, imagine a curve relating a measure of capabilities to log(training compute). After making an algorithmic innovation, the curve shifts to the left, saving compute.
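The leftward shift can be sketched numerically. Below is a toy sigmoid capability curve in log-compute; the specific midpoint and steepness are hypothetical illustration values, not Epoch AI’s fitted curve. A 3× efficiency gain means the shifted curve reaches the same capability with 3× less compute:

```python
import math

def capability(compute, efficiency=1.0):
    """Toy capability curve: a sigmoid in log10(training compute).
    `efficiency` is a software multiplier: better software reaches the
    same capability with `efficiency`-times less compute, i.e. the
    curve shifts left by log10(efficiency).
    Midpoint/steepness are illustrative, hypothetical values."""
    x = math.log10(compute * efficiency)   # effective log-compute
    midpoint, steepness = 24.0, 1.5        # hypothetical: ~1e24 FLOP midpoint
    return 1 / (1 + math.exp(-steepness * (x - midpoint)))

# A 3x algorithmic innovation: one third of the compute, same capability.
old = capability(1e24)                     # baseline software
new = capability(1e24 / 3, efficiency=3)   # improved software, 3x less compute
assert abs(old - new) < 1e-12
```

The “shift left” picture is exactly this: the innovation adds log10(3) to effective log-compute, so every point on the curve is reached at one third the training compute.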
Feb 26
In 2024, @EpochAIResearch estimated the rate of software progress in language models. We found that training compute efficiency was improving at ~3x per year.

But this estimate was for pre-training, and is now outdated — so @ansonwhho took a new look at the numbers. 🧵
Almost all existing estimates suggest very fast progress, on the order of several times per year, though the uncertainty intervals are really wide.

Still, it’s very possible that training efficiency improves much faster than 3× per year. Even 10× per year seems possible!
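These rates compound. A minimal sketch of what a per-year efficiency multiplier implies for the compute needed to match a fixed capability (the baseline compute figure is hypothetical):

```python
def compute_needed(c0, rate, years):
    """Compute needed to reach a fixed capability after `years` of
    software progress at `rate`x efficiency gain per year:
    C(t) = C0 / rate**t."""
    return c0 / rate**years

c0 = 1e25  # hypothetical baseline training compute (FLOP)

# At 3x/year, two years of progress means 9x less compute;
# at 10x/year, it means 100x less.
assert compute_needed(c0, 3, 2) == c0 / 9
assert compute_needed(c0, 10, 2) == c0 / 100
```

The gap between the 3× and 10× estimates is why the uncertainty matters so much: over a few years it compounds into orders of magnitude.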
The numbers are very uncertain for two reasons.

1. They’re based on limited data, because we lack long-run time series with both model performance and training compute, which we need to derive estimates of software progress.
Feb 26
AI training compute efficiency has improved extremely fast: each year, you need several times less training compute to reach the same capability.

But AI architectures/algorithms haven’t changed *that* much in recent years.

So where do these efficiency improvements come from? 🧵
One explanation is that these improvements came not from better algorithms, but from better data.

For example, training has shifted from uncurated web data to heavily processed (and often synthetic) data. AI companies are also spending billions on data, like RL environments.
Another explanation is that measured efficiency gains came from innovations that depend on training compute scale.

Here’s the idea: most existing estimates assume that innovations are scale-independent. This means shifting scaling curves in parallel to the left…
Jan 28
Was serving GPT-5 profitable?

According to @Jsevillamol, @exponentialview’s Hannah Petrovic, and @ansonwhho, it depends. Gross margins were around 45%, making inference look profitable.

But after accounting for the cost of operations, OpenAI likely incurred a loss. 🧵
Even the gross profits from running models weren’t enough to recoup R&D costs.

Gross profits running GPT-5 were less than OpenAI's R&D costs in the four months before launch. And the true R&D cost was likely higher than that.
The core problem: AI R&D is expensive, and model lifecycles are too short to get enough revenue.

So even if it’s profitable to run models, the full lifecycle is likely loss-making — as long as GPT-5 is representative of other models.
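The accounting logic above can be made concrete with a toy P&L. Everything here is hypothetical illustration except the ~45% gross margin from the thread; the revenue, operating-cost, and R&D figures are arbitrary placeholders, not Epoch AI's estimates:

```python
# Illustrative model-serving P&L (arbitrary units; only the 45% gross
# margin comes from the thread, all other figures are hypothetical).
revenue = 100.0                 # inference revenue
gross_margin = 0.45             # ~45% per the analysis
gross_profit = revenue * gross_margin              # 45.0: inference "looks" profitable
operating_costs = 55.0          # hypothetical: staff, sales, overhead
operating_profit = gross_profit - operating_costs  # -10.0: a net loss

rd_cost = 60.0                  # hypothetical R&D spend before launch
# The lifecycle problem: even ignoring operating costs entirely,
# gross profit can fall short of the R&D bill.
assert gross_profit < rd_cost
```

The point of the sketch: a positive gross margin only tells you serving is profitable per token; whether the model pays for itself depends on operating costs and on recouping R&D within its short lifecycle.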
Jan 8
Global AI compute capacity now totals over 15 million H100-equivalents.

Our new AI Chip Sales data explorer tracks where this compute comes from across Nvidia, Google, Amazon, AMD, and Huawei, making it the most comprehensive public dataset available.
Nvidia’s B300 GPU now accounts for the majority of its revenue from AI chips, while H100s make up under 10%.

We estimate chip-level spending using earnings reports, company disclosures, and analyst and media coverage.
These chips present massive resource demands.

Even before the power overheads of servers and data centers, this many chips would draw over 10 GW of power, around twice the average power consumption of New York City.
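The 10 GW figure follows from simple arithmetic, assuming each H100-equivalent draws roughly the H100 SXM's 700 W TDP (chip power only, before server and data-center overhead):

```python
# Back-of-envelope: 15 million H100-equivalents at ~700 W each.
chips = 15_000_000
tdp_watts = 700                        # Nvidia H100 SXM TDP
total_gw = chips * tdp_watts / 1e9     # watts -> gigawatts
assert total_gw > 10                   # ~10.5 GW of chip power alone
```

Real fleet power would be higher still once server components, cooling, and data-center overhead (PUE) are added on top of the chip TDP.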
