Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval)
Gemini 3 Pro:
31.11%, $0.81/task
Gemini 3 Deep Think (Preview):
45.14%, $77.16/task
Also notable: Gemini 3 results on ARC-AGI-1 (Semi-Private Eval)
Gemini 3 Pro:
75.00%, $0.49/task
Gemini 3 Deep Think (Preview):
87.50%, $44.26/task
Frontier AI reasoning systems are now closing the complexity scaling gap between ARC-AGI-1 and ARC-AGI-2.
This is surprising, as these same systems also make obvious mistakes on ARC-AGI-1 tasks that are easy for humans. We're not sure why, and we invite the community to help study this phenomenon.
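
For anyone who wants to poke at the numbers, here is a rough sketch that compares each system's ARC-AGI-1 and ARC-AGI-2 results. The scores and costs are copied from the results above. The three derived quantities (score gap in percentage points, ARC-AGI-2 score as a fraction of ARC-AGI-1 score, and cost ratio) and the names `results` and `gap_summary` are assumptions made for illustration, not an official definition of the "complexity scaling gap".

```python
# Scores (%) and cost per task (USD), copied from the results above.
# Each entry maps a benchmark key to (score_percent, usd_per_task).
results = {
    "Gemini 3 Pro": {"arc1": (75.00, 0.49), "arc2": (31.11, 0.81)},
    "Gemini 3 Deep Think (Preview)": {"arc1": (87.50, 44.26), "arc2": (45.14, 77.16)},
}


def gap_summary(entry):
    """Illustrative (hypothetical) comparison of ARC-AGI-1 vs ARC-AGI-2 for one system."""
    score1, cost1 = entry["arc1"]
    score2, cost2 = entry["arc2"]
    return {
        # Percentage points lost when moving from ARC-AGI-1 to ARC-AGI-2.
        "score_gap_pts": round(score1 - score2, 2),
        # ARC-AGI-2 score as a fraction of the ARC-AGI-1 score.
        "score_ratio": round(score2 / score1, 3),
        # How many times more each ARC-AGI-2 task costs than an ARC-AGI-1 task.
        "cost_ratio": round(cost2 / cost1, 2),
    }


if __name__ == "__main__":
    for name, entry in results.items():
        print(name, gap_summary(entry))
```

These are only two data points per system, so treat any trend read off them as a starting point for investigation, not a conclusion.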