Forecasting Research Institute's Threads

Jan 12 • 8 tweets • 4 min read

What do experts and superforecasters think about the future of AI research and development?

In Wave 4 of the Longitudinal Expert AI Panel (LEAP), we asked top AI experts to forecast progress in AI R&D, hiring, company valuations, data center buildout, and more.

Here’s what you need to know 🧵

📈 AI benchmark progress is outpacing expert expectations

AI performance on hard coding tasks is a useful indicator of potential capability increases in self-improving AI R&D, in which models take an active role in improving AI itself.

The median forecaster significantly underestimated progress on LiveCodeBench Pro—a benchmark that tracks performance on tough programming tasks.

The median expert in our sample predicted state-of-the-art performance on LiveCodeBench Pro (Hard) of 14% in 2026 and 33% by the end of 2030. Since the survey closed, GPT-5.2 has already hit 33% on the benchmark.

A quarter of experts and superforecasters expect major progress on this coding benchmark, providing 50th percentile forecasts of at least 60% accuracy by 2030.

We plan to identify which forecasters were most accurate on questions like this to see what they believe about other topics.

Jan 8 • 6 tweets • 3 min read

🏆 In October, we invited external teams to submit to ForecastBench, our AI forecasting benchmark.

The challenge? Beat superforecasters—using any tools available (scaffolding, ensembling, etc).

The result? External submissions are now the most accurate models on our leaderboard—though superforecasters still hold #1.

@xai's model (grok-4-fast) is the leading external submission, at #2.

One of Cassi's entries takes the #3 spot.

Here's what changed. 🧵

In October, we opened up ForecastBench’s tournament leaderboard to external submissions. Teams are free to use any tools they choose.

Several teams responded, including @xai, Cassi, @fractalai, @lightningrodai, and @_Mantic_AI. Thanks to all of them for participating on this challenging benchmark.

Models from @xai and Cassi outperformed all our baseline LLM configurations.

Nov 10, 2025 • 14 tweets • 8 min read

Today, we are launching the most rigorous ongoing source of expert forecasts on the future of AI: the Longitudinal Expert AI Panel (LEAP).

We’ve assembled a panel of 339 top experts across computer science, AI industry, economics, and AI policy.

Roughly every month—for the next three years—they’ll provide precise, falsifiable forecasts on the trajectory of AI capabilities, adoption, and impact.

Our results cover where experts predict major effects of AI, where they expect less progress than AI industry leaders, and where they disagree.

LEAP experts forecast major effects of AI by 2030, including:

⚡ 7x increase in AI’s share of U.S. electricity use (1% -> 7%)
🖥️ 9x increase in AI-assisted work hours (2% -> 18%)

By 2040, experts predict:
👥 30% of adults will use AI for companionship daily
🏆 60% chance that AI will solve or substantially assist in solving a Millennium Prize Problem
🚂 32% chance that AI will have been at least as impactful as a "technology of the millennium," like the printing press or the Industrial Revolution.

🧵Read on for more insights and results

Our LEAP panel is made up of the following experts:

🧑‍🔬 76 Top computer scientists (e.g., professors from top-20 universities)
🤖 76 AI industry experts (from frontier model and other leading AI companies)
💲 68 Leading economists (including many studying economic growth or technology at top universities)
🧠 119 Policy and think tank experts
🏆 12 Honorees from TIME’s 100 most influential people in AI, in 2023 and 2024

(Plus 60 highly accurate superforecasters and 1,400 members of the U.S. public)

For more details on our sample, see the full reports linked below.

Oct 8, 2025 • 10 tweets • 4 min read

Is AI on track to match top human forecasters at predicting the future?

Today, FRI is releasing an update to ForecastBench—our benchmark that tracks how accurate LLMs are at forecasting real-world events.

A trend extrapolation of our results suggests LLMs will reach superforecaster-level forecasting performance around a year from now.

Here’s what you need to know: 🧵

Why LLM forecasting accuracy is a useful benchmark:

🧠 Forecasting requires collecting and synthesizing data, causal reasoning, and probabilistic thinking, making it a good test of reasoning

💼 Forecasting has high practical value

🔮 Future events aren’t in training data, making the benchmark hard to game

@elonmusk: “The ability to predict the future is the best measure of intelligence”

x.com/elonmusk/statu…

Sep 2, 2025 • 10 tweets • 4 min read

We now have the first accuracy results from the largest-ever existential risk forecasting tournament.

In 2022, we convened 80 experts and 89 superforecasters for the Existential Risk Persuasion Tournament (XPT), which collected thousands of forecasts on 172 questions across short-, medium- and long-term time horizons.

We now have answers for 38 short-run questions covering AI progress, climate technology, bioweapons, nuclear weapons and more.

Here’s what we found out: 🧵

Respondents—especially superforecasters—underestimated AI progress.

Participants predicted the state-of-the-art accuracy of ML models on the MATH, MMLU, and QuaLITY benchmarks by June 2025.

Domain experts assigned probabilities of 21.4%, 25%, and 43.5% to the achieved outcomes.

Superforecasters assigned even lower probabilities: just 9.3%, 7.2%, and 20.1% respectively.

Oct 1, 2024 • 18 tweets • 4 min read

Today, we're excited to announce ForecastBench: a new benchmark for evaluating AI and human forecasting capabilities. Our research indicates that AI remains worse at forecasting than expert forecasters. 🧵

Arxiv: arxiv.org/abs/2409.19839
Website: forecastbench.org

Evaluating LLM forecasting ability is tricky! Prior work asks models about events that already have (or have not) occurred, risking contamination of training data.
Our solution is to use questions about future events, the outcomes of which are unknowable when forecasts are made.
