.@OpenAI dropped a new research paper showing AI agents are now capable of replicating cutting-edge AI research papers from scratch.
This is one step closer to the Intelligence Explosion: AI that can discover new science and improve itself.
Here’s what they learned: 🧵
Introducing PaperBench.
A new framework designed to test this very capability!
It gives AI agents access to recent ML research papers (20 from ICML 2024) and asks them to reproduce the results.
How does it work?
Agents got the raw paper PDF, tools like web access & coding environments, and need to write code to replicate key findings – a task taking human experts days.
The agents had 12 hours and no prior knowledge of the paper.