Thread by @MartinJosifoski

Apr 15 • 7 tweets • 3 min read

Excited to share AIRA₂, our next-generation AI Research Agents for ML, built to address key bottlenecks to scaling.

AIRA₂ achieves SoTA on real-world ML tasks from MLE-bench-30 (81.5% vs. the previous 72.7%), exceeds human SoTA on 6 of 20 diverse AI research tasks from AIRS-Bench (and hacks another 5), and exhibits strong, predictable scaling properties.

To push the frontier of AI research, we need systems that scale well. While developing AIRA₂, we learned a lot about the bottlenecks and what it takes to resolve them; those insights are already driving our next iteration:

1/

First, sample throughput heavily constrains the agent. We built infrastructure that moves AIRA₂ from sequential execution to asynchronous parallel exploration, so throughput scales linearly with GPU resources.

2/
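To make the sequential-vs-asynchronous contrast concrete, here is a minimal toy sketch, not AIRA₂'s actual infrastructure. It assumes a hypothetical `evaluate` coroutine standing in for a GPU-bound train-and-score step, and a simple "mutate the best solution so far" search loop. Each worker proposes, evaluates, and records a candidate on its own schedule, with no barrier waiting for the other workers, so with N workers roughly N evaluations are in flight at once.

```python
import asyncio
import random
import time


async def evaluate(candidate: float) -> float:
    """Hypothetical stand-in for a GPU-bound train-and-score step."""
    await asyncio.sleep(0.01)  # simulated evaluation latency
    return -abs(candidate - 3.0)  # toy objective: closer to 3.0 scores higher


async def worker(population: list, budget: dict, rng: random.Random) -> None:
    # Each worker loops independently; nothing waits for the other workers.
    while budget["remaining"] > 0:
        budget["remaining"] -= 1  # claim one evaluation slot (no await in between, so no race)
        parent, _ = max(population, key=lambda p: p[1])  # best solution seen so far
        child = parent + rng.gauss(0.0, 1.0)  # propose a variation of it
        score = await evaluate(child)
        population.append((child, score))  # immediately visible to all workers


async def explore(num_workers: int, num_evals: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    population = [(0.0, await evaluate(0.0))]  # seed solution
    budget = {"remaining": num_evals}
    await asyncio.gather(*(worker(population, budget, rng) for _ in range(num_workers)))
    return population


if __name__ == "__main__":
    # Same evaluation budget; wall-clock time should shrink roughly with worker count.
    for n in (1, 4):
        start = time.perf_counter()
        pop = asyncio.run(explore(num_workers=n, num_evals=40))
        print(f"workers={n} evals={len(pop) - 1} seconds={time.perf_counter() - start:.2f}")
```

The steady-state design (any finished worker immediately starts the next candidate) is what lets throughput track the number of workers; a generational loop that waits for every evaluation in a round would instead be limited by its slowest member.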
