IMO-Bench consists of three benchmarks that judge models on diverse capabilities: IMO-AnswerBench, a large-scale test on getting the right answer; IMO-ProofBench, a next-level evaluation for proof writing; and IMO-GradingBench, a new benchmark to enable further progress in automatic evaluation of long-form answers.
One very important distinction compared to all other IMO results out there is that our model is production ready, not experimental 🙂 This advanced model, powered by Deep Think mode, is quite creative in problem solving. Check out this video of what a mathematician thinks, and see the blog post for more details: blog.google/products/gemin…