Thang Luong
Lead Superhuman Reasoning team @GoogleDeepMind. AI IMO Gold. Co-led #DeepThink, #AlphaGeometry, #Bard (now Gemini) Multimodality. Co-founded #MeenaBot.
Nov 4 9 tweets 6 min read
Continuing our IMO-gold journey, I’m delighted to share our #EMNLP2025 paper “Towards Robust Mathematical Reasoning”, which tells some of the key stories behind the success of our advanced Gemini #DeepThink at this year’s IMO. Finding the right north-star metrics was critical to our IMO effort, and we did it with #IMOBench, a suite of advanced reasoning benchmarks for foundation models. More importantly, we encourage the community to go beyond short answers, and we show that automatic grading of long-form answers is promising! Read on for our project page, paper, and datasets in the thread 🙂

IMO-Bench consists of three benchmarks that judge models on diverse capabilities: IMO-AnswerBench, a large-scale test of getting the right answer; IMO-ProofBench, a next-level evaluation of proof writing; and IMO-GradingBench, a new benchmark to enable further progress in automatic evaluation of long-form answers.

A key highlight of our work is showing that autograders built with Gemini reasoning correlate well with human evaluations on IMO-ProofBench, as illustrated below, across a wide range of foundation models. This was made possible by the grading schemes that accompany IMO-Bench, which are suitable for both human experts and automated systems. Ultimately, we hope to steer the community's focus from final answers to the proofs themselves, enabling a more rigorous assessment of AI reasoning processes.
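One common way to quantify "correlates well" is a Pearson correlation between human and autograder scores. The minimal sketch below assumes hypothetical (model_name, human_score, autograder_score) records, one per graded proof, and averages scores per model before correlating; the record layout and the model-level averaging are illustrative assumptions, not the paper's exact protocol.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / sqrt(var_x * var_y)


def model_level_correlation(grades: Iterable[Tuple[str, float, float]]) -> float:
    """Average human and autograder scores per model, then correlate the averages.

    Each item is a hypothetical (model_name, human_score, autograder_score)
    record for one graded proof.
    """
    human: Dict[str, List[float]] = defaultdict(list)
    auto: Dict[str, List[float]] = defaultdict(list)
    for model, human_score, auto_score in grades:
        human[model].append(human_score)
        auto[model].append(auto_score)
    models = sorted(human)
    human_means = [sum(human[m]) / len(human[m]) for m in models]
    auto_means = [sum(auto[m]) / len(auto[m]) for m in models]
    return pearson(human_means, auto_means)
```

Correlating per-model averages asks whether the autograder ranks models the way human graders do; correlating per-proof scores instead would test agreement on individual answers, which is a stricter question.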

Project: imobench.github.io
Paper: arxiv.org/abs/2511.01846
Datasets: github.com/google-deepmin…
Aug 1 4 tweets 2 min read
Our IMO journey continues: the yolo-run model that we trained a week before #imo2025, despite a high likelihood of failure, magically achieves SOTA across a wide range of reasoning tasks, from maths to coding to challenging knowledge questions. I'm very excited that we have now delivered the IMO 🥇 system into the hands of mathematicians, and a simplified version (results below) to all Google AI Ultra subscribers.

One very important distinction from all other IMO results out there is that our model is production-ready, not experimental 🙂 This advanced model, powered by Deep Think mode, is quite creative in problem solving. Check out this video of a mathematician's take, and see the blog post for more details: blog.google/products/gemin…
Jan 17, 2024 9 tweets 5 min read
Super thrilled to share our latest work, AlphaGeometry from @GoogleDeepMind, the first AI system ever to approach the level of IMO gold medalists in solving Olympiad geometry problems. Published today in Nature under the title “Solving olympiad geometry without human demonstrations”, our work marks an important milestone towards advanced reasoning, which, I believe, is the key prerequisite for AGI.

As someone who did Olympiad maths full time back in high school, I find the results really fascinating. #AlphaGeometry (trained on 100% synthetic data) surpassed the previous state of the art by a large margin. It can solve all geometry problems from the 2000 and 2015 IMOs, as judged by human graders, equivalent to winning real bronze medals in those years!

A few key ideas behind AlphaGeometry: (1) synthetic data generation at scale, with 100M theorems and proofs, allowing AlphaGeometry to learn from scratch without any human demonstrations; (2) a neuro-symbolic architecture that combines a neural language model (System 1, creative) with a symbolic deduction engine (System 2, reliable), allowing AlphaGeometry to reason efficiently and effectively.
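The thread only states that AlphaGeometry learned from 100M synthetic theorems and proofs. A minimal sketch of one way such theorem-proof pairs can be produced without human demonstrations, assuming toy symbolic facts and Horn-style rules in place of a real geometry deduction engine: sample random premises, forward-chain to a deduction closure, then trace a derived fact back to the premises and steps its proof actually uses. The names deduce_closure, traceback, and sample_theorem are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple

# A toy Horn-style rule (assumption): if every fact in the body holds, the head holds.
Rule = Tuple[FrozenSet[str], str]
# A proof step: (facts used, fact derived).
Step = Tuple[Tuple[str, ...], str]


def deduce_closure(premises: Sequence[str], rules: Sequence[Rule]) -> Dict[str, FrozenSet[str]]:
    """Forward-chain until no rule adds a new fact.

    Returns fact -> the facts it was derived from (an empty set marks a premise).
    """
    parents: Dict[str, FrozenSet[str]] = {p: frozenset() for p in premises}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in parents and body.issubset(parents):
                parents[head] = body
                changed = True
    return parents


def traceback(goal: str, parents: Dict[str, FrozenSet[str]]) -> Tuple[Set[str], List[Step]]:
    """Collect the premises and steps that the derivation of `goal` actually uses."""
    needed: Set[str] = set()
    steps: List[Step] = []
    seen: Set[str] = set()

    def visit(fact: str) -> None:
        if fact in seen:
            return
        seen.add(fact)
        body = parents[fact]
        if not body:  # a sampled premise
            needed.add(fact)
            return
        for dep in sorted(body):
            visit(dep)
        steps.append((tuple(sorted(body)), fact))  # dependencies are appended first

    visit(goal)
    return needed, steps


def sample_theorem(facts: Sequence[str], rules: Sequence[Rule], k: int,
                   rng: random.Random) -> Optional[Tuple[List[str], str, List[Step]]]:
    """Sample k premises and return one (premises, conclusion, proof) triple, or None."""
    parents = deduce_closure(rng.sample(list(facts), k), rules)
    derived = sorted(f for f, body in parents.items() if body)
    if not derived:
        return None
    goal = rng.choice(derived)
    needed, steps = traceback(goal, parents)
    return sorted(needed), goal, steps
```

Tracing back keeps only the premises the chosen proof depends on, so each sampled statement avoids carrying irrelevant hypotheses; repeating this with many random premise sets is what lets such a pipeline scale without human input.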

Amazing collaborators: @thtrieu_, @Yuhu_ai_, @quocleix, @hhexiy.
Blog: dpmd.ai/alphageometry
Paper: nature.com/articles/s4158…
Code: github.com/google-deepmin…

This is the neuro-symbolic architecture of #AlphaGeometry. Similar to System 1 and System 2 in the book "Thinking, Fast and Slow", the symbolic engine first takes a crack at the problem mechanically; if it gets stuck, it asks the neural language model to suggest new points and lines (the auxiliary constructions that everyone used to draw in high school!)
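The loop described in this tweet can be sketched as follows, under loudly stated assumptions: facts are opaque strings, the "symbolic engine" is a toy forward-chaining rule applier rather than AlphaGeometry's geometry deduction engine, and `propose` is a hypothetical callback standing in for the language model. `saturate` and `solve` are illustrative names, not the released code's API.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

# A toy rule (assumption): if every fact in the body holds, the head holds.
Rule = Tuple[FrozenSet[str], str]
# Hypothetical stand-in for the language model: given the current facts and the
# goal, return one new auxiliary construction, or None if it has no suggestion.
Proposer = Callable[[Set[str], str], Optional[str]]


def saturate(facts: Set[str], rules: Iterable[Rule]) -> Set[str]:
    """Symbolic step: apply rules mechanically until nothing new is derived."""
    closed = set(facts)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for body, head in rule_list:
            if head not in closed and body.issubset(closed):
                closed.add(head)
                changed = True
    return closed


def solve(premises: Set[str], goal: str, rules: List[Rule], propose: Proposer,
          max_constructions: int = 3) -> Optional[List[str]]:
    """Return the constructions added before the goal was proved, or None on failure."""
    facts = set(premises)
    added: List[str] = []
    for _ in range(max_constructions):
        facts = saturate(facts, rules)  # the symbolic engine takes a crack first
        if goal in facts:
            return added
        suggestion = propose(facts, goal)  # stuck: ask the "neural" side
        if suggestion is None or suggestion in facts:
            return None  # no new idea, so this sketch gives up
        facts.add(suggestion)
        added.append(suggestion)
    facts = saturate(facts, rules)  # final symbolic pass after the last suggestion
    return added if goal in facts else None
```

For example, with rules A, B ⇒ C and C, X ⇒ G, calling `solve({"A", "B"}, "G", rules, propose)` with a proposer that suggests "X" returns `["X"]`: deduction alone stalls at C, and the single auxiliary construction unlocks the goal. The `max_constructions` budget caps how often the model is consulted; a real system would more plausibly explore several candidate constructions per step (for instance with a beam search) instead of following one suggestion greedily.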