The Iran War and Hormuz shutdown have disrupted oil, gas, and helium exports and threatened data centers and investments in the Gulf states.
@justjoshinyou13 explores how a prolonged Iran war could affect AI, and why it probably won’t completely derail the compute buildout.
Fabrication of AI chips and memory is concentrated in Taiwan and South Korea. These fabs rely on natural gas for energy and on helium for manufacturing, both of which are disrupted by the Hormuz closure.
But chip fabs are so profitable that TSMC and others will likely secure the resources they need.
For AI data centers, the Hormuz energy shock is not a serious threat in the US, where natural gas prices have been stable.
In Europe and Asia, higher costs may kill some planned data centers, but existing data centers will keep running unless prices surge to much higher levels.
The most serious impacts might occur in the Gulf monarchies. Iran has threatened and directly attacked data centers, which could affect planned projects like Stargate UAE.
Perhaps more importantly, the shock to oil exports could cut off Gulf capital flows to AI, including upcoming IPOs.
This Gradient Update was written by @justjoshinyou13. All Gradient Updates are informal and opinionated analyses that represent the views of individual authors, not Epoch AI as a whole.
Compute may be the most important input to AI. So who owns the world’s AI compute?
Introducing our new AI Chip Owners explorer, showing our analysis of how leading AI chips are distributed among hyperscalers and other major players, broken down by chip type over time.
To estimate global compute ownership, we build on our previous estimates of overall AI chip sales. We then use earnings commentary from chipmakers and hyperscalers, as well as media reports and industry researcher estimates, to allocate chips across owners.
We estimate that over 60% of global AI compute is owned by the top US hyperscalers, led by Google with the equivalent of roughly 5 million Nvidia H100 GPUs!
Unlike the other hyperscalers, which rely primarily on Nvidia, Google’s fleet is dominated by its custom TPU chips.
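As a rough sketch of what the aggregation step looks like, here is a minimal example. All chip counts and conversion ratios below are hypothetical placeholders, not our estimates:

```python
# Hypothetical conversion ratios: peak performance relative to an H100.
# These and the fleet counts below are placeholders for illustration only.
H100_EQUIV = {"H100": 1.0, "TPU v5p": 1.2, "A100": 0.3}

fleets = {
    "Owner A": {"TPU v5p": 3_000_000, "H100": 500_000},
    "Owner B": {"H100": 1_500_000, "A100": 800_000},
}

# Roll each owner's chip counts up into H100-equivalents.
for owner, chips in fleets.items():
    h100e = sum(count * H100_EQUIV[chip] for chip, count in chips.items())
    print(f"{owner}: ~{h100e / 1e6:.1f}M H100-equivalents")
```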
Developing more powerful AI isn’t just about scaling compute. It’s also about improving algorithms and data quality, which let you build better models with the same compute.
We call this “AI software progress” — here’s everything you need to know about it: 🧵
There are many ways to improve algorithms and data. For example, you could change model architectures, build better RL environments, and improve training recipes.
But how do you concretize what makes some AI software better than others?
One way is to say that better AI software reduces the compute needed to reach the same capability.
For example, imagine a curve relating a measure of capabilities to log(training compute). After making an algorithmic innovation, the curve shifts to the left, saving compute:
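In symbols, a minimal formalization of that picture: write capability as $y = f(\log_{10} C)$ for training compute $C$ and some increasing function $f$. If an innovation shifts the curve left by $\delta$ (in units of $\log_{10}$ compute), the same capability is reached at

$$\log_{10} C' = \log_{10} C - \delta \;\Longrightarrow\; C' = C / 10^{\delta},$$

a $10^{\delta}\times$ compute saving.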
In 2024, @EpochAIResearch estimated the rate of software progress in language models. We found that training compute efficiency was improving at ~3x per year.
But this estimate was for pre-training, and is now outdated — so @ansonwhho took a new look at the numbers. 🧵
Almost all existing estimates suggest very fast progress, on the order of several times per year, though the uncertainty intervals are really wide.
Still, it’s very possible that training efficiency improves much faster than 3× per year. Even 10× per year seems possible!
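To see what these rates mean in practice: if required compute falls by a factor $r$ each year, then after $t$ years you need $C(t) = C_0 / r^{t}$. At $r = 3$, two years of software progress cut compute needs by $9\times$; at $r = 10$, by $100\times$.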
The numbers are very uncertain for two reasons.
1. They’re based on limited data: deriving estimates of software progress requires long-run time series with both model performance and training compute, and these are scarce.
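To make the data requirement concrete, here is a minimal sketch of how such an estimate can be derived, with an entirely made-up time series of the training compute needed to hit a fixed benchmark score:

```python
import numpy as np

# Hypothetical series: training compute (FLOP) needed to reach a fixed
# benchmark score each year. Real long-run series like this are scarce.
years = np.array([2020, 2021, 2022, 2023])
compute_flop = np.array([1.0e24, 3.2e23, 1.1e23, 3.5e22])

# Fit log10(compute) against time; the slope is the annual decline.
slope, _ = np.polyfit(years, np.log10(compute_flop), 1)
rate = 10 ** (-slope)  # factor by which required compute falls per year
print(f"Estimated efficiency improvement: ~{rate:.1f}x per year")
```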
AI training compute efficiency has improved extremely fast: each year, you need several times less training compute to reach the same capability.
But AI architectures/algorithms haven’t changed *that* much in recent years.
So where do these efficiency improvements come from? 🧵
One explanation is that these improvements came not from better algorithms but from better data.
For example, training has shifted from uncurated web data to heavily processed (and often synthetic) data. AI companies are also spending billions on data, such as RL environments.
Is it profitable for OpenAI to run its models? According to @Jsevillamol, @exponentialview’s Hannah Petrovic, and @ansonwhho, it depends. OpenAI’s gross margins on inference were around 45%, making inference look profitable.
But after accounting for the cost of operations, OpenAI likely incurred a loss. 🧵
Even the gross profits from running models weren’t enough to recoup R&D costs.
Gross profits from running GPT-5 were less than OpenAI's R&D costs in the four months before launch, and the true R&D cost was likely higher than that.
The core problem: AI R&D is expensive, and model lifecycles are too short to generate enough revenue to cover it.
So even if running models is profitable, the full lifecycle is likely loss-making, assuming GPT-5 is representative of other models.
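A toy version of that lifecycle arithmetic, using the ~45% gross margin from the thread and otherwise made-up numbers:

```python
# All dollar figures are hypothetical; only the 45% gross margin comes
# from the thread. The point: positive gross profit on inference can
# still mean a loss once operating and R&D costs are counted against
# a short model lifecycle.
revenue = 4.0e9          # hypothetical lifecycle inference revenue, $
gross_margin = 0.45      # cited in the thread
operating_costs = 1.5e9  # hypothetical: staff, sales, overhead, $
r_and_d = 2.5e9          # hypothetical R&D attributable to the model, $

gross_profit = revenue * gross_margin            # inference looks profitable
net = gross_profit - operating_costs - r_and_d   # lifecycle is loss-making
print(f"Gross profit: ${gross_profit / 1e9:.1f}B; lifecycle net: ${net / 1e9:.2f}B")
```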
Global AI compute capacity now totals over 15 million H100-equivalents.
Our new AI Chip Sales data explorer tracks where this compute comes from across Nvidia, Google, Amazon, AMD, and Huawei, making it the most comprehensive public dataset available.
Nvidia’s B300 GPU now accounts for the majority of its revenue from AI chips, while H100s make up under 10%.
We estimate chip-level spending using earnings reports, company disclosures, and analyst and media coverage.
These chips come with massive power demands.
Even before counting the power overheads of servers and data centers, this many chips would draw over 10 GW, around twice the average power consumption of New York City.
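For reference, the arithmetic behind that figure, assuming roughly 700 W per H100-equivalent (the H100 SXM's rated power draw):

$$15{,}000{,}000 \times 700\,\mathrm{W} = 10.5\,\mathrm{GW}$$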