Post

Elliot Glazer

@ElliotGlazer

Dec 21 • 9 tweets • 2 min read • Read on X

1/9 We’re announcing the development of Tier 4, a new suite of math problems that go beyond the hardest problems in FrontierMath. o3’s performance is remarkable, but there’s still a ways to go before any single AI system nears the collective genius of the math community.

2/9 For context, FrontierMath currently spans three broad tiers:
• T1 (25%) Advanced, near top-tier undergrad/IMO
• T2 (50%) Needs serious grad-level background
• T3 (25%) Research problems demanding relevant research experience
All can take hours—or days—for experts to solve.

3/9 Although o3 solved problems in all three tiers, it likely still struggles on the most formidable Tier 3 tasks—those “exceptionally hard” challenges that Tao and Gowers say can stump even top mathematicians.

4/9 Tier 4 aims to push the boundary even further. We want to assemble problems so challenging that solving them would demonstrate capabilities on par with an entire top mathematics department.

5/9 Each problem will be composed by a team of 1-3 mathematicians specialized in the same field over a 6-week period, with weekly opportunities to discuss ideas with teams in related fields. We seek broad coverage of mathematics and want all major subfields represented in Tier 4.

6/9 Process for a Tier 4 problem:

1. 1 week crafting a robust problem concept, which “converts” research insights into a closed-answer problem.

2. 3 weeks of collaborative research.
Presentations among related teams for feedback.

3. Two weeks for the final submission.

7/9 We’re seeking mathematicians who can craft these next-level challenges. If you have research-grade ideas that transcend T3 difficulty, please email elliot@epoch.ai with your CV and a brief note on your interests.

8/9 We’ll also hire some red-teamers, tasked with finding clever ways a model can circumvent a problem’s intended difficulty, and some reviewers to check for mathematical correctness of final submissions. Contact me if you think you’re suitable for either such role.

9/9 As AI keeps improving, we need benchmarks that reflect genuine mathematical depth. Tier 4 is our next (and possibly final) step in that direction.

• • •

Missing some Tweet in this thread? You can try to force a refresh

Share this page!

Enter URL or ID to Unroll

Elliot Glazer

Try unrolling a thread yourself!

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!