Elliot Glazer
Dec 21, 9 tweets
1/9 We’re announcing the development of Tier 4, a new suite of math problems that go beyond the hardest problems in FrontierMath. o3’s performance is remarkable, but there’s still a ways to go before any single AI system nears the collective genius of the math community.
2/9 For context, FrontierMath currently spans three broad tiers:
• T1 (25%) Advanced, near top-tier undergrad/IMO
• T2 (50%) Needs serious grad-level background
• T3 (25%) Research problems demanding relevant research experience
All can take hours—or days—for experts to solve.
3/9 Although o3 solved problems in all three tiers, it likely still struggles on the most formidable Tier 3 tasks—those “exceptionally hard” challenges that Tao and Gowers say can stump even top mathematicians.
4/9 Tier 4 aims to push the boundary even further. We want to assemble problems so challenging that solving them would demonstrate capabilities on par with an entire top mathematics department.
5/9 Each problem will be composed by a team of 1-3 mathematicians specialized in the same field over a 6-week period, with weekly opportunities to discuss ideas with teams in related fields. We seek broad coverage of mathematics and want all major subfields represented in Tier 4.
6/9 Process for a Tier 4 problem:
1. One week crafting a robust problem concept, which “converts” research insights into a closed-answer problem.
2. Three weeks of collaborative research, with presentations among related teams for feedback.
3. Two weeks for the final submission.
7/9 We’re seeking mathematicians who can craft these next-level challenges. If you have research-grade ideas that transcend T3 difficulty, please email elliot@epoch.ai with your CV and a brief note on your interests.
8/9 We’ll also hire some red-teamers, tasked with finding clever ways a model can circumvent a problem’s intended difficulty, and some reviewers to check the mathematical correctness of final submissions. Contact me if you think you’re suited to either role.
9/9 As AI keeps improving, we need benchmarks that reflect genuine mathematical depth. Tier 4 is our next (and possibly final) step in that direction.
