A group of astronomers have found phosphine in the atmosphere of Venus, which is hard to explain other than by the presence of life. This is not at all conclusive, but should prompt further investigation. 1/6
The scientists don’t suggest intelligent life; we are probably talking about microbes. But this could still be a big deal. It would mean life either started independently there or was transported between bodies in our Solar System. Let’s focus on the former. 2/6
The possibility that it is extremely hard and rare for life to begin is currently the best explanation for why we don’t see signs of life elsewhere in the cosmos, despite the presence of so many stars in our galaxy and galaxies in the observable universe. 3/6
It is thus often seen as a downer. But many of the alternative explanations for the silence in the skies are worse. One prominent alternative is that technological civilisations inevitably destroy themselves. 4/6
If we did find life that arose independently on other planets, it would shift our credences away from the hypothesis that life is hard to start and towards the hypothesis that it is all too easy to end. This would be bad news for our prospects. 5/6
Is there a half-life for the success rates of AI agents?
I show that the success rates of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task.
🧵
1/
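To make the model concrete, here is a minimal sketch (my own illustration in Python, not METR's or the paper's code; the 60-minute half-life is a made-up example value). A constant chance of failing in each human-minute of the task means success decays exponentially with task length: P = (1/2)^(t/h), where h is the half-life.

```python
def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Success probability under a constant per-human-minute failure rate.

    A half-life of h means the success rate halves for every h
    human-minutes of task length: P = (1/2) ** (t / h).
    """
    return 0.5 ** (task_minutes / half_life_minutes)

# Hypothetical agent whose success rate halves every 60 human-minutes:
for t in (15, 30, 60, 120, 240):
    print(f"{t:>4}-minute task: {success_probability(t, 60):.0%} success")
```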
METR recently released an intriguing report showing that on a suite of tasks related to doing AI research, the length of tasks that frontier AI agents can complete has been doubling every 7 months. 2/
They measure a task's length by how long it takes humans, on average, to complete it. And they measure the length of task an AI agent can complete by the longest task length at which it still has a ≥50% success rate.
3/
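As a sketch of how such a 50% horizon could be estimated (illustrative code of mine with simulated data, not METR's methodology): under the constant-hazard model above, the task length at which success crosses 50% is exactly the half-life h, so fitting h by maximum likelihood recovers the horizon.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated runs: task lengths in human-minutes, and whether a
# hypothetical agent with a true 60-minute half-life succeeded.
lengths = rng.uniform(1, 480, size=2000)
outcomes = rng.random(2000) < 0.5 ** (lengths / 60)

# Maximum-likelihood fit of the half-life h under P = (1/2)**(t/h),
# via a simple grid search. Under this model, h *is* the 50% horizon.
candidates = np.linspace(10, 200, 1000)
log_lik = [
    np.log(np.where(outcomes, 0.5 ** (lengths / h), 1 - 0.5 ** (lengths / h))).sum()
    for h in candidates
]
print(f"Estimated 50% horizon ≈ {candidates[np.argmax(log_lik)]:.0f} human-minutes")
```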
New results for o3 and o4-mini have been added to the @arcprize leaderboard. Here are some key takeaways: 1/ 🧵
1. The released version of o3 is much less capable than the preview version that was announced with great fanfare 4 months ago, though it is also much cheaper. People who buy access to it are not getting the general reasoning performance @OpenAI was boasting about in December.
2/
2. The @arcprize team tried to test a high-compute version of o3, but it kept failing to answer. They spent >$50,000 trying to get it to work, but couldn't, so those December results can't realistically be replicated with the released model.
When I posted this thread about how o3's extreme costs make it less impressive than it first appears, many people told me that this wasn't an issue as the price would quickly come down.
I checked in on it today, and the price has gone *up* by 10x. 1/n
Here is the revised ARC-AGI plot. They've increased their cost estimate for the original o3 low from $20 per task to $200 per task. Presumably o3 high has gone from $3,000 to $30,000 per task, which is why it exceeds their $10,000-per-task limit and is no longer included. 2/n
The ARC-AGI team found that the o3 price estimate was only a tenth of what OpenAI were charging for the inferior o1-pro model, so they updated the estimate to use o1-pro's price until the actual o3 is released and its true price is known. 3/n
It is stunning to see how Meta illegally downloaded billions of pages of copyrighted books and articles from Russian pirate sites when training Llama 3. And not only that, but Meta also directly redistributed that copyrighted data to others:
They knew this was illegal and were worried about being arrested if they did it:
And this was directly approved by their CEO Mark Zuckerberg:
New paper:
Inference Scaling Reshapes AI Governance
The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance.
🧵
1/
The nature of these effects depends crucially on whether this new inference compute will be used during external deployment or as part of a more complex training programme within the lab.
2/
Rapid scaling of inference-at-deployment would:
• lower the importance of open-weight models (and of securing the weights of closed models)
• reduce the impact of the first human-level models
• change the business model for frontier AI
…
3/
Inference Scaling and the Log-x Chart:
2024 saw a switch in focus from scaling up the compute used to train frontier AI models to scaling up the compute used to run them.
How well is this inference scaling going? 1/
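To illustrate what such a log-x chart does (hypothetical numbers of mine, not anyone's benchmark data): if each doubling of inference compute buys a roughly constant gain in score, the curve shows steeply diminishing returns on a linear axis but becomes a straight line once the x-axis is logarithmic.

```python
import numpy as np
import matplotlib.pyplot as plt

compute = np.logspace(0, 4, 50)        # relative inference compute (made up)
score = 20 + 15 * np.log10(compute)    # made-up log-linear scaling relationship

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(9, 3.5))
ax_lin.plot(compute, score)
ax_lin.set(xlabel="inference compute (linear axis)", ylabel="score (%)")
ax_log.plot(compute, score)
ax_log.set_xscale("log")               # the "log-x chart"
ax_log.set(xlabel="inference compute (log axis)", ylabel="score (%)")
fig.tight_layout()
plt.show()
```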
You could think of it as a change in strategy: instead of improving the quality of your employees' work by giving them more years of training in which to acquire skills, concepts and intuitions, you improve it by giving them more time to complete each task.
2/
Or, using an analogy to human cognition, you could see more training as improving the model’s intuitive ‘System 1’ thinking and more inference as improving its methodical ‘System 2’ thinking.
3/