Toby Ord
Sep 14, 2020 · 6 tweets
A group of astronomers has found phosphine in the atmosphere of Venus, which is hard to explain other than by the presence of life. This is not at all conclusive, but it should prompt further investigation.

nature.com/articles/s4155…

How might it matter, if there was life? 1/6
The scientists don’t suggest intelligent life; we are probably talking about microbes. But this could still be a big deal. It would mean life either started independently there or was transported between bodies in our Solar System. Let’s focus on the former. 2/6
The possibility that it is extremely hard and rare for life to begin is currently the best explanation for why we don’t see signs of life elsewhere in the cosmos, despite the presence of so many stars in our galaxy and galaxies in the observable universe. 3/6
It is thus often seen as a downer. But many of the alternative explanations for the silence in the skies are worse. One prominent alternative is that technological civilisations inevitably destroy themselves. 4/6
If we did find independent life on other planets it would shift our credences away from the hypothesis that life is hard to start and towards the hypothesis that it is all too easy to end. This would be bad news for our prospects. 5/6
If you want all the details, I’ve written a paper on this with @anderssandberg and @KEricDrexler. 6/6
arxiv.org/abs/1806.02404
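The credence shift described in tweet 5 is a Bayesian update, and it can be made concrete with a toy calculation. All the priors and likelihoods below are illustrative numbers, not figures from the linked paper:

```python
# Toy Bayesian update: how finding independently-originated life on Venus
# would shift credence between two explanations for the silent sky.
# All numbers here are illustrative, not from Ord, Sandberg & Drexler.

# H1: abiogenesis is extremely hard, so life elsewhere is rare.
# H2: life starts easily, but technological civilisations destroy themselves.
prior = {"life_is_hard": 0.5, "civilisations_self_destruct": 0.5}

# Hypothetical likelihood of observing independent life nearby under each.
likelihood = {"life_is_hard": 0.05, "civilisations_self_destruct": 0.5}

evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}

print(posterior)  # credence moves from 50/50 toward the grimmer hypothesis
```

The direction of the shift is all that matters here: any observation that is much more likely under "life starts easily" than under "life is hard" moves credence toward the remaining explanations for the silence, including the self-destruction one.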

More from @tobyordoxford

Mar 20
B R O A D T I M E L I N E S
We should have neither short AI timelines, nor long timelines, but a broad probability distribution over when transformative AI will arrive.
My new essay explains why & explores the implications of such deep uncertainty.
🧵 1/
Here is the range of credible dates for AGI, across all forecasters at Metaculus.
This is a huge range of uncertainty. The median date is 2033, but the 80% confidence interval runs from 2026 to 2067, i.e. between 0.25 and 41 years away.
2/
Here are three expert AI forecasts (from 2023) addressing “In what year would AI systems be able to replace 99% of current fully remote jobs?”
The uncertainty across experts is very broad, and a closer look shows that each individual expert also has very wide uncertainty.
3/
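Broad uncertainty like this is naturally expressed as a full distribution over arrival dates rather than a point estimate. A minimal sketch of reporting a forecast this way, using made-up samples rather than real Metaculus or expert data:

```python
import random
import statistics

# Hypothetical forecast of "years until transformative AI", represented as
# Monte Carlo samples. The right-skewed lognormal and its parameters are
# illustrative stand-ins, not fitted to any real forecast.
random.seed(0)
years_away = [random.lognormvariate(2.1, 1.2) for _ in range(100_000)]

# 10th/50th/90th percentiles: an 80% interval, as in the Metaculus summary.
deciles = statistics.quantiles(years_away, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
print(f"median: {p50:.1f} years; 80% interval: {p10:.1f} to {p90:.1f} years")
```

The point of reporting the interval alongside the median is exactly the thread's point: a single headline date hides how wide the distribution really is.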
Mar 11
METR found that half of the AI-written code on a prominent benchmark that had been graded as correct would actually have been rejected by humans for inadequate quality.
What does this mean for their famous time-horizon metric?
🧵
While it hasn't received much attention yet, METR's article tries to estimate this in an appendix. Their classic time-horizon research is based on different benchmarks, but gives similar horizon lengths when applied to this SWE-Bench benchmark.
metr.org/notes/2026-03-…
When they redid this time-horizon analysis for the SWE-Bench benchmark counting the code that humans would reject as a failure, they found the estimated horizon length dropped by a factor of *7*.
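Under the simple constant-hazard model behind the time-horizon metric, a single (task length, success rate) pair pins down the implied 50%-horizon, so stricter grading that lowers the measured success rate directly shrinks the horizon. The numbers below are made up for illustration and are not METR's data:

```python
import math

def horizon_from_point(task_hours: float, success_rate: float) -> float:
    """50%-horizon implied by one data point, assuming success decays as
    S(t) = 0.5 ** (t / h), i.e. a constant hazard rate."""
    return task_hours * math.log(0.5) / math.log(success_rate)

# Hypothetical 1-hour tasks: lenient grading passes 80% of attempts;
# strict grading (rejecting code humans would reject) passes only 40%.
h_lenient = horizon_from_point(1.0, 0.8)  # ≈ 3.1 hours
h_strict = horizon_from_point(1.0, 0.4)   # ≈ 0.76 hours

print(h_lenient / h_strict)  # stricter grading shrinks the horizon ~4x here
```

The ratio depends entirely on the hypothetical pass rates chosen; the mechanism, not the factor of 4, is the point.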
Feb 4
Some great new analysis by @gushamilton shows that AI agents *don't* obey a constant hazard rate / half-life. Instead they all have a declining hazard rate as the task goes on.
🧵
This means that their success rates on tasks beyond their 50%-horizon are better than the simple model suggests, but those for tasks shorter than the 50%-horizon are worse than it suggests.
I had suggested a constant hazard rate was a good starting assumption for how their success rate at tasks decays with longer durations. It is the simplest model and fits the data OK.
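The difference between the two models can be seen in a small sketch. Both survival curves below are calibrated to the same 50%-horizon; the Weibull shape parameter of 0.5, standing in for a declining hazard, is an illustrative choice rather than a value fitted to the data:

```python
# Survival curves: probability an agent is still succeeding at task length t,
# with both models calibrated to the same 50%-horizon h.

def constant_hazard(t: float, h: float) -> float:
    """Exponential decay: S(t) = 0.5 ** (t / h)."""
    return 0.5 ** (t / h)

def declining_hazard(t: float, h: float) -> float:
    """Weibull, shape 0.5 (hazard falls over time): S(t) = 0.5 ** sqrt(t / h).
    The shape parameter is an illustrative assumption, not a fitted value."""
    return 0.5 ** ((t / h) ** 0.5)

h = 1.0
# Beyond the 50%-horizon, the declining-hazard agent does better...
print(constant_hazard(2 * h, h), declining_hazard(2 * h, h))  # 0.25 vs ~0.375
# ...but on tasks shorter than the horizon it does worse.
print(constant_hazard(0.5 * h, h), declining_hazard(0.5 * h, h))  # ~0.71 vs ~0.61
```

This reproduces the pattern in the tweet above: better than the exponential model past the horizon, worse before it, with the two curves crossing exactly at the 50%-horizon.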
Dec 22, 2025
Are the *costs* of AI agents also rising exponentially?

We all know the graph from METR showing exponential growth in the length of tasks AI can perform. But the costs to perform these tasks are growing quickly too.
Indeed, it looks like they are growing even faster:
🧵
In my view, the key question is:
How is the ‘hourly’ cost of AI agents changing over time?
If this hourly cost is increasing, then these cutting-edge AI systems would be getting less cost-competitive with humans over time.
If so, the METR trend could be misleading. Part of the progress would come from more lavish expenditure on compute, so it would be diverging from what is economical.
It would be becoming more like the Formula 1 of AI performance — showing what is possible, but not what is practical.
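The 'hourly' cost question reduces to dividing cost per task by task length at each point in time. In the sketch below, the 7-month doubling time for task length echoes the METR-style trend, while the faster cost-doubling time and the starting values are purely hypothetical, chosen only to show the mechanism:

```python
# Sketch: if per-task cost grows faster than task length, the implied
# hourly cost of agents rises over time. All figures are illustrative.
TASK_LENGTH_DOUBLING_MONTHS = 7   # horizon-length trend (METR-style)
TASK_COST_DOUBLING_MONTHS = 5     # hypothetical faster growth in cost

def hourly_cost(months: float, cost0: float = 10.0, hours0: float = 0.5) -> float:
    """Implied $/hour of agent work after `months` of both trends."""
    cost_per_task = cost0 * 2 ** (months / TASK_COST_DOUBLING_MONTHS)
    hours_per_task = hours0 * 2 ** (months / TASK_LENGTH_DOUBLING_MONTHS)
    return cost_per_task / hours_per_task

print(hourly_cost(0), hourly_cost(24))  # hourly cost climbs over two years
```

Whenever the cost-doubling time is shorter than the length-doubling time, the ratio grows exponentially; if the two doubling times were equal, the hourly cost would stay flat.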
Oct 20, 2025
New post on RL scaling:
Careful analysis of OpenAI’s public benchmarks reveals RL scales far worse than inference: to match each 10x scale-up of inference compute, you need 100x the RL-training compute. The only reason it has been cost-effective is starting from a tiny base.
🧵
But now RL has grown to nearly the size of pretraining, and scale-ups beyond this reveal its inefficiency. I estimate it would take a 1,000,000x scale-up from its current level to add the equivalent of a 1,000x scale-up of inference or a 100x scale-up of pretraining.
This is a big deal. Pretraining scaling has already stalled, and RL scaling was the new hope for scaling up training compute. By going beyond imitation learning it also offered the best hope for blasting past the human range of abilities, but it just scales too poorly.
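The exchange rate claimed above (100x RL training to match each 10x of inference compute) is a power law with exponent 2 in multiplier space. A sketch of the arithmetic, taking that exponent as given from the post rather than deriving it:

```python
def rl_compute_needed(inference_multiplier: float) -> float:
    """RL-training scale-up needed to match a given inference scale-up,
    assuming the post's exchange rate: 10x inference ~ 100x RL training,
    i.e. an exponent of 2 in multiplier space."""
    return inference_multiplier ** 2

print(rl_compute_needed(10))    # 100
print(rl_compute_needed(1000))  # 1000000, matching the thread's estimate
```

In log space this is simply a 2:1 slope: every decade of inference scaling costs two decades of RL-training scaling.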
Oct 3, 2025
Evidence Recent AI Gains are Mostly from Inference-Scaling
🧵
Here's a thread about my latest post on AI scaling...
1/14
Scaling up AI using next-token prediction was the most important trend in modern AI. It stalled out over the last couple of years and has been replaced by RL scaling.
This has two parts:
1. Scaling RL training
2. Scaling inference compute at deployment
2/
Many people focus on (1). This is the bull case for RL scaling — it started off small compared to internet-scale pre-training, so can be scaled 10x or 100x before doubling overall training compute.
3/
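The bull case in tweet 3 is just arithmetic about a small base. With an illustrative share (RL at 1% of pretraining compute, a made-up figure), a 100x RL scale-up only roughly doubles the total training bill:

```python
# Illustrative compute budget in arbitrary units: RL starts tiny
# relative to internet-scale pretraining. The 1% share is hypothetical.
pretraining = 1.0
rl = 0.01

before = pretraining + rl
after = pretraining + 100 * rl  # scale RL training 100x

print(after / before)  # ~1.98: total training compute only about doubles
```

The smaller the starting share, the further RL can be scaled before it dominates the total; once RL rivals pretraining in size, as the Oct 20 thread notes, this cheap headroom is gone.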
