We are building a world class AI R&D company in Tokyo. We want to develop AI solutions for Japan’s needs, and democratize AI in Japan. https://t.co/1q07mb3TzE
Jul 1, 2025 • 4 tweets • 4 min read
We’re excited to introduce AB-MCTS!
Our new inference-time scaling algorithm enables collective intelligence for AI by allowing multiple frontier models (like Gemini 2.5 Pro, o4-mini, DeepSeek-R1-0528) to cooperate.
Inspired by the power of human collective intelligence, where the greatest achievements arise from the collaboration of diverse minds, we believe the same principle applies to AI. Individual frontier models like ChatGPT, Gemini, and DeepSeek are remarkably advanced, each possessing unique strengths and biases stemming from their training, which we view as valuable resources for collective problem-solving.
AB-MCTS (Adaptive Branching Monte Carlo Tree Search) harnesses these individualities, allowing multiple models to cooperate and engage in effective trial-and-error, solving challenging problems for any single AI. Our initial results on the ARC-AGI-2 benchmark are promising, with AB-MCTS combining o4-mini + Gemini-2.5-Pro + R1-0528, current frontier AI models, significantly outperforming individual models by a substantial margin.
This research builds on our 2024 work on evolutionary model merging, shifting focus from “mixing to create” to “mixing to use” existing, powerful AIs. At Sakana AI, we remain committed to pioneering novel AI systems by applying nature-inspired principles such as evolution and collective intelligence. We believe this work represents a step toward a future where AI systems collaboratively tackle complex challenges, much like a team of human experts, unlocking new problem-solving capabilities and moving beyond single-model limitations.
Algorithm (TreeQuest): github.com/SakanaAI/treeq…
ARC-AGI Experiments: github.com/SakanaAI/ab-mc…
The AB-MCTS combination of o4-mini + Gemini-2.5-Pro + R1-0528, current frontier AI models, achieves strong performance on the ARC-AGI-2 benchmark, outperforming individual models by a large margin.