Should you start your training run early, so you can train for longer, or wait for the next generation of chips and algorithms? Our latest estimate suggests that it’s not effective to train for more than ~9 months. On current trends, frontier labs will hit that limit by 2027. 🧵
Why 9 months? Model developers face a tradeoff: wait before starting a run to take advantage of better hardware and algorithms, or start sooner with what’s available. Waiting lets you train faster once you start, so there’s an optimal run length for any given deadline.
Our previous work estimated that hardware + algorithmic progress would lead to a maximum training run length of ~15 months. That work assumed algorithms were improving at 1.7x per year, but we now believe they are improving at a much faster 3x per year!
This sharper tradeoff pulls the maximum run length from 15 months down to only 9.
Today’s longest training runs already last several months, and frontier LLM training run lengths have been growing at about 1.4x per year. On current trends, they’ll reach 9 months by 2027.
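Here is a minimal sketch of the tradeoff behind these numbers, assuming a simple model in which the effective compute you can buy per unit time grows at a combined yearly factor g (hardware price-performance times algorithmic efficiency). The ~1.3x/year hardware figure, the 2025 start year, and the ~4-month current run length are illustrative assumptions on our part, not figures from the thread.

```python
import math

def optimal_run_length_years(g: float) -> float:
    """Training from time t to a fixed deadline T yields effective compute
    proportional to g**t * (T - t); maximizing over t gives a run length
    of 1 / ln(g) years."""
    return 1.0 / math.log(g)

hardware = 1.3  # assumed yearly growth in hardware price-performance
print(12 * optimal_run_length_years(hardware * 1.7))  # old estimate: ~15 months
print(12 * optimal_run_length_years(hardware * 3.0))  # new estimate: ~9 months

# When do today's longest runs hit 9 months if they keep growing 1.4x/year?
current_months = 4.0  # assumed length of today's longest runs (illustrative)
print(2025 + math.log(9 / current_months) / math.log(1.4))  # roughly 2027
```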
What happens if training runs stop getting longer? Since 2018, longer runs have accounted for about a third of the increase in total training compute, so a natural guess is that training compute scaling would slow.
To keep up with today’s ~5x/year growth in training compute, labs will need to significantly accelerate hardware expansion.
Meeting this challenge may require building larger clusters or distributing runs across more data centers.
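One way to see the implication, reading the “about a third of the increase” as a share of growth in log space (our reading, not necessarily the authors’ exact decomposition):

```python
total_growth = 5.0   # approximate yearly growth in total training compute
runs_share = 1 / 3   # share of that growth attributed to longer runs (log space)

hardware_side_now = total_growth ** (1 - runs_share)  # ~2.9x/year today
hardware_side_needed = total_growth                   # full 5x/year once runs plateau
print(hardware_side_now, hardware_side_needed)
```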
Back in 2022, ChatGPT arguably became the fastest-growing consumer app ever, hitting 100M users in just 2 months. But the field of AI has transformed since then, and it’s time to take a new look at the numbers. 🧵
Historically, technology adoption took decades. For example, telephones took 60 years to reach 70% of US households. But tech diffuses faster and faster over time, and we should expect AI to continue this trend.
But even if we account for this trend, AI adoption looks remarkably fast: ~10% of the US population was using ChatGPT weekly within just 2 years, and ~30% in under 2.5 years.
We have graded the results of @OpenAI's evaluation of ChatGPT agent on FrontierMath Tier 1–3 questions and found a score of 27% (±3%). ChatGPT agent is a new model fine-tuned for agentic tasks, equipped with text/GUI browser tools and native terminal access. 🧵
This evaluation is not directly comparable to those on Epoch AI’s benchmarking hub, as it uses a different scaffold. First, we did not run the model ourselves—we only graded the outputs provided by OpenAI and don’t have access to their code to run the model. Second, ChatGPT agent has access to tools not available to other models we've assessed—most notably browser tools, which may have helped on questions related to recent research papers. Finally, the evaluation allowed up to 128K tokens per question, compared to our standard 100K; this difference is unlikely to have significantly affected results.
OpenAI has exclusive access to all FrontierMath problem statements and 237 of the 290 Tier 1–3 solutions. Epoch AI holds out the remaining solutions. We found no statistically significant performance difference between the held-out and non-held-out sets.
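For intuition on the ±3% and on what “no statistically significant difference” means here, a rough sketch under a simple binomial model; the held-out vs. non-held-out accuracies below are placeholders, and the actual grading methodology may differ.

```python
import math

def binom_se(p: float, n: int) -> float:
    """Standard error of an accuracy estimate under a binomial model."""
    return math.sqrt(p * (1 - p) / n)

print(binom_se(0.27, 290))  # ~0.026, roughly the quoted +/-3%

def two_prop_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-proportion z-statistic for comparing accuracy on two subsets."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Placeholder accuracies for the non-held-out (237) and held-out (53) problems:
print(two_prop_z(0.28, 237, 0.25, 53))  # |z| < 1.96, i.e. not significant at 5%
```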
The IMO is next week. What will it tell us about AI?
@GregHBurnham argues that an AI gold medal could be a non-event or could be an important breakthrough—it depends on whether the AI system exhibits creative problem-solving. How to tell the difference? Read on!
It will be tempting to focus on whether an AI system gets a gold medal. Formal proof systems like Google’s AlphaProof are quite close to this, and even general-purpose LLMs have a fighting chance. But that's not the outcome to pay the most attention to.
Rather, the big thing to watch for is qualitative: can AI systems solve problems that require a lot of creativity?
@ansonwhho and @ardenaberg argue that an AI Manhattan project, if it reached the scale of previous national projects, could result in a ~1000x compute scaleup by 2027.
A national AI project has become more and more of a possibility over the last year, with such a project as the top recommendation of a US-China congressional commission.
Previous national projects at their peaks spent a fraction of GDP equivalent to $120B-$250B today. The authors find that such a budget could centralize most NVIDIA compute in the US.
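To make the “equivalent fraction of GDP” concrete, a back-of-the-envelope conversion, assuming US GDP of roughly $29T (our assumption, not a figure from the thread):

```python
us_gdp = 29e12  # rough current US GDP in dollars (assumption)
for budget in (120e9, 250e9):
    print(f"{budget / us_gdp:.2%} of GDP")  # ~0.41% and ~0.86%
```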
- The number of large-scale model releases is growing rapidly (418 models over 10^23 FLOP)
- The UK has fallen behind while China has caught up (9 vs 151 models)
- There are far more of the largest models (33 models over 10^25 FLOP)
First, the number of large-scale model releases is growing rapidly.
In 2020, there were 4 models trained with more than 10^23 FLOP.
By the end of 2024, there were 327 such models in our dataset.
Most large-scale models — those trained on over 10^23 FLOP — are language models.
Of the 418 large-scale models in our data, 326 are language models, of which 86 are vision-language (like GPT-4).
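For readers who want to reproduce counts like these from a models dataset, here is a minimal sketch; the rows and column names below are hypothetical stand-ins, not the actual schema of Epoch AI’s data.

```python
import pandas as pd

# A few hypothetical rows standing in for the real dataset, which has
# hundreds of entries and its own column names.
models = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "country": ["USA", "China", "UK", "USA"],
    "domain": ["language", "vision-language", "vision", "language"],
    "year": [2020, 2023, 2024, 2024],
    "training_compute_flop": [2e23, 5e24, 3e23, 2e25],
})

large = models[models["training_compute_flop"] > 1e23]
print(len(large))                                      # models over 10^23 FLOP
print((large["training_compute_flop"] > 1e25).sum())   # models over 10^25 FLOP
print(large.groupby("country").size())                 # e.g. UK vs China counts
print(large.groupby("year").size().cumsum())           # cumulative count by year
```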
LLM context windows have grown, but can models really use all this context?
We find signs of recent, rapid progress in their ability to do so. Read on to learn more!
From Claude 2.0’s 100k tokens in 2023 to Llama 4 Maverick’s 10M earlier this year, there’s no doubt that context windows are getting longer. On a set of models from Artificial Analysis, we find that the longest available context windows have grown at about 30x/year.
But how effectively can models use these longer windows? We measured the input lengths at which models score above 80% on two moderately challenging long-context benchmarks, Fiction.liveBench and MRCR (2-needle).
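The ~30x/year figure is the kind of number you get from a log-linear fit of the longest available context window against release date. A minimal sketch of that fit with placeholder points (not the Artificial Analysis data; the fitted rate depends entirely on the inputs):

```python
import numpy as np

# Placeholder (release year, context window) points for illustration only.
years = np.array([2023.5, 2024.0, 2024.5, 2025.2])
ctx = np.array([100e3, 200e3, 2e6, 10e6])

# Fit log10(context) = slope * year + intercept; 10**slope is the yearly factor.
slope, intercept = np.polyfit(years, np.log10(ctx), 1)
print(10 ** slope)  # yearly growth factor of the frontier context window
```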