Latest Twitter Threads by @nityndg on Thread Reader App

Jun 30 • 20 tweets • 6 min read

Can AI agents help researchers reproduce research more quickly? We conducted an uplift study. The answer is yes: researchers reproduced papers more than 2x faster using Codex with GPT-5.4 xhigh. In a new paper, we show many other results.

When a benchmark’s accuracy saturates, the field usually replaces it with a harder one. We use CORE-Bench Hard, a benchmark for computational reproducibility, as a case study to show what we can still measure after accuracy saturates.

Paper: arxiv.org/pdf/2606.26158…
