A summary of the results is shown at the end of this THREAD 👇
Before we start, I want to address a couple of things:
- This is by no means a conclusive or thorough study. I did it for fun, testing a handful of small questions just to see how the models would do.
- I didn’t include the questions that all of them got correct, and there were a lot of those.
- Some…
1. Logic/Critical Thinking
Q: I put a diamond in a cup and then place the cup upside down on my bed. Later I came back, took the cup, and put it in the fridge. Where is the diamond?
ChatGPT ❌
Bard ❌
Claude 2 ✅
Llama 2 ❌
2. Logic/Critical Thinking
Q: How many months have 28 days?
ChatGPT ✅
Bard ❌
Claude 2 ✅
Llama 2 ✅
3. Math Question
Q: 100kg of potatoes are 99% water by weight. We dry them until they are 98% water. Can you guess their new weight?
ChatGPT ✅
Bard ✅
Claude 2 ✅
Llama 2 ❌
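For anyone who wants to check the arithmetic on this one: the dry matter stays constant at 1kg, so at 98% water that 1kg has to be 2% of the total. A minimal sketch in Python, assuming only water is lost while drying (the function name `weight_after_drying` is my own, not taken from any model's answer):

```python
def weight_after_drying(total_kg, water_pct_before, water_pct_after):
    """New total weight, assuming the dry matter is unchanged by drying."""
    # Dry matter is whatever is not water at the start.
    dry_kg = total_kg * (100 - water_pct_before) / 100
    # Afterwards, that same dry matter makes up (100 - water_pct_after)% of the total.
    return dry_kg * 100 / (100 - water_pct_after)

print(weight_after_drying(100, 99, 98))  # 50.0
```

So the expected answer is 50kg, not something close to 99kg.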
4. Math Question
Q: What is the sum of the first 10 prime numbers?
ChatGPT ✅
Bard ✅
Claude 2 ✅
Llama 2 ❌
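The expected answer here is 129 (2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29). A quick way to verify it, sketched in Python with simple trial division (the helper `first_primes` is just an illustrative name):

```python
def first_primes(n):
    """Return the first n primes using trial division by the primes found so far."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # A number is prime if no smaller prime divides it.
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

print(sum(first_primes(10)))  # 129
```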
5. Small Coding
Q: Write a Python code to find the first 2 missing numbers in a list.
*All of them got it right when asked to find 1 missing number instead of 2*
ChatGPT ✅
Bard ✅
Claude 2 ❌
Llama 2 ❌
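The question is a bit open-ended, so here is one possible reading rather than the "official" answer: treat the list as integers and report the first 2 values absent between its smallest and largest element. A minimal sketch under that assumption (the function name `first_missing` and the sample list are mine):

```python
def first_missing(numbers, count=2):
    """Return up to `count` integers missing between min(numbers) and max(numbers)."""
    if not numbers:
        return []  # nothing to compare against
    present = set(numbers)  # set gives fast membership checks
    missing = []
    for n in range(min(present), max(present) + 1):
        if n not in present:
            missing.append(n)
            if len(missing) == count:
                break
    return missing

print(first_missing([1, 2, 4, 5, 7, 9]))  # [3, 6]
```

If the list has fewer than 2 gaps, this version simply returns whatever it finds, which may be an empty list.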
6. Riddles
All of them were really good at solving riddles. The only riddle I tried that one of them missed was this 👇
Q: David’s father has three sons: Snap, Crackle, and _____?
ChatGPT ✅
Bard ✅
Claude 2 ✅
Llama 2 ❌
7. Creative Thinking/Language
Q: Write a 5 line poem where all the sentences need to finish on the vowel “e”
ChatGPT ❌
Bard ✅
Claude 2 ❌
Llama 2 ~✅ (technically they end on “e”)
RESULTS
All of them did pretty well.
Please keep in mind that they answered correctly most of the time. I only listed the questions that at least one of the models got wrong.
- Google Genie 3
- Claude Opus 4.1
- ElevenLabs Music
- Grok Video Imagine
- Lindy AI Agent Builder
- ChatGPT OSS and GPT-5
- Alibaba’s New Qwen-Image
- Google’s Gemini Storybooks
Here’s EVERYTHING you need to know:
1. Google DeepMind announces Genie 3, generating interactive environments you can navigate in real time at 24fps.
DeepMind calls it a crucial stepping stone toward artificial general intelligence.
2. Anthropic releases Claude Opus 4.1, hitting 74.5% on SWE-bench Verified coding tests.
The upgrade improves research and data analysis skills with better detail tracking.
- Ideogram Character
- Gemini’s Deep Think
- ChatGPT Study Mode
- Alibaba Wan 2.2 AI Video
- FLUX.1 Krea Photorealistic
- Hunyuan 3D World AI Model
- Microsoft Copilot Mode Browser
- Zai SOTA Open-Source Agentic AI
Here’s EVERYTHING you need to know:
1. Ideogram releases Character, the first character consistency model that works with just one reference image.
Now available free to all users, it renders infinite variations of your characters with striking fidelity.
2. Google releases Gemini 2.5 Deep Think, its first multi-agent model that spawns multiple AI agents to tackle questions in parallel.
Currently rolling out for Google AI Ultra subscribers. It outperforms most models in major benchmarks.