Alvaro Cintas
Jul 20, 2023 · 12 tweets · 9 min read
I just compared ChatGPT, Bard, Claude 2 and Llama 2!

Here is how they did on:

- Critical thinking
- Simple math
- Programming
- Riddles
- Creative writing

The summary of the results is shown at the end of this THREAD 👇
Before we start, I want to address a couple of things:

- This is by no means a conclusive or thorough study. I did it for fun, trying different small questions just to see how the models would do.
- I didn't include the questions that all of them got right, and there were a lot of those.
- Some… twitter.com/i/web/status/1…
1. Logic/Critical Thinking

Q: I put a diamond in a cup and then place the cup upside down on my bed. Later I came back, took the cup, and put it in the fridge. Where is the diamond?

ChatGPT ❌
Bard ❌
Claude 2 ✅
Llama 2 ❌


2. Logic/Critical Thinking

Q: How many months have 28 days?

ChatGPT ✅
Bard ❌
Claude 2 ✅
Llama 2 ✅


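Read literally as "how many months contain a 28th day", the answer is all 12, not just February. A quick check with Python's standard `calendar` module (the helper name `months_with_28th_day` and the year 2023 are arbitrary choices for illustration):

```python
import calendar


def months_with_28th_day(year):
    """Return the names of the months in `year` that have at least 28 days."""
    return [
        calendar.month_name[month]
        for month in range(1, 13)
        # monthrange() returns (weekday of the 1st, number of days in the month)
        if calendar.monthrange(year, month)[1] >= 28
    ]


print(len(months_with_28th_day(2023)))  # 12
```

The result is the same for leap and non-leap years, since even February always reaches day 28.
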
3. Math Question

Q: 100kg of potatoes are 99% water by weight. If you dry them until they are 98% water, can you guess their new weight?

ChatGPT ✅
Bard ✅
Claude 2 ✅
Llama 2 ❌


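The trick is that only the water evaporates. The 1 kg of non-water mass stays fixed, so once it makes up 2% of the total, the total has to be 50 kg. A minimal sketch of that arithmetic (the function name `dried_weight` is just for illustration), using `Fraction` to avoid floating-point rounding:

```python
from fractions import Fraction


def dried_weight(total_kg, water_before, water_after):
    """Weight after drying, assuming only water is lost."""
    dry_matter = total_kg * (1 - water_before)  # the non-water part never changes
    return dry_matter / (1 - water_after)       # it is now a bigger share of the total


print(dried_weight(100, Fraction(99, 100), Fraction(98, 100)))  # 50
```

Halving the weight by removing just 1% of water content is exactly why this question trips people (and models) up.
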
4. Math Question

Q: What is the sum of the first 10 prime numbers?

ChatGPT ✅
Bard ✅
Claude 2 ✅
Llama 2 ❌


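For reference, the first 10 primes are 2, 3, 5, 7, 11, 13, 17, 19, 23 and 29, which add up to 129. A small trial-division sketch (the helper `first_primes` is a hypothetical name) that reproduces the sum:

```python
def first_primes(n):
    """Return the first n primes using simple trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime up to its square root divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


print(sum(first_primes(10)))  # 129
```
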
5. Small Coding

Q: Write a Python code to find the first 2 missing numbers in a list.

*All of them got it right when asked to find just 1 missing number instead of 2*

ChatGPT ✅
Bard ✅
Claude 2 ❌
Llama 2 ❌


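The prompt is a bit ambiguous. One reasonable reading, assumed here, is: given a list of integers, return the first two positive integers (counting up from 1) that don't appear in it. A minimal sketch of that interpretation; `first_missing` and its `count`/`start` parameters are hypothetical names, not the code any of the models produced:

```python
def first_missing(nums, count=2, start=1):
    """Return the first `count` integers >= start that do not appear in nums."""
    present = set(nums)  # O(1) membership checks; input order doesn't matter
    missing = []
    candidate = start
    while len(missing) < count:
        if candidate not in present:
            missing.append(candidate)
        candidate += 1
    return missing


print(first_missing([1, 2, 4, 5, 7, 8]))  # [3, 6]
```

Setting `count=1` gives the easier single-number version that all four models handled.
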
6. Riddles

All of them were really good at solving riddles. The only riddle I tried that one of them missed was this 👇

Q: David’s father has three sons: Snap, Crackle, and _____?

ChatGPT ✅
Bard ✅
Claude 2 ✅
Llama 2 ❌


7. Creative Thinking/Language

Q: Write a 5 line poem where all the sentences need to finish on the vowel “e”

ChatGPT ❌
Bard ✅
Claude 2 ❌
Llama 2 ~✅ (technically they end on “e”)


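This constraint is easy to check mechanically. A rough checker, assuming "finish on the vowel e" means the last letter of each of the 5 lines (ignoring trailing punctuation) is an "e"; `ends_on_e` is a hypothetical helper, and the sample poem is made up for illustration:

```python
import string


def ends_on_e(poem, expected_lines=5):
    """Check that the poem has `expected_lines` non-empty lines and each ends in 'e'."""
    lines = [line.strip() for line in poem.splitlines() if line.strip()]
    if len(lines) != expected_lines:
        return False
    # Drop trailing punctuation/spaces so "tide," and "we." still count
    return all(
        line.rstrip(string.punctuation + " ").lower().endswith("e")
        for line in lines
    )


sample = """I hear the tide,
It sings with pride,
The gulls still glide,
The wind is free,
And so are we."""
print(ends_on_e(sample))  # True
```
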
RESULTS

All of them did pretty well.

Please keep in mind that they answered correctly most of the time. I only included the questions that at least one of the models got wrong.

- ChatGPT: 5/7
- Bard: 5/7
- Claude 2: 5/7
- Llama 2: 2/7

Counting the other 17 questions… https://t.co/Z9XNfWfhTy twitter.com/i/web/status/1…
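The scores above can be reproduced from the per-question marks. A quick tally sketch, assuming Llama 2's partial pass on question 7 (~✅) counts as a full point, which is what makes its score 2/7 rather than 1/7:

```python
# Per-question marks copied from the thread, questions 1 to 7 in order.
RESULTS = {
    "ChatGPT": ["fail", "pass", "pass", "pass", "pass", "pass", "fail"],
    "Bard": ["fail", "fail", "pass", "pass", "pass", "pass", "pass"],
    "Claude 2": ["pass", "pass", "pass", "pass", "fail", "pass", "fail"],
    "Llama 2": ["fail", "pass", "fail", "fail", "fail", "fail", "partial"],
}


def score(marks, count_partial=True):
    """Count passes; optionally treat a 'partial' mark as a pass."""
    accepted = {"pass", "partial"} if count_partial else {"pass"}
    return sum(mark in accepted for mark in marks)


for model, marks in RESULTS.items():
    print(f"{model}: {score(marks)}/{len(marks)}")
```

With `count_partial=False`, Llama 2 drops to 1/7 and the other three stay at 5/7.
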
If you enjoyed this and want to share it, like & retweet the first tweet :)

Also, you can subscribe for free to my newsletter, where I share AI tutorials, news, and tools: https://t.co/bWVBMX17G8 (todaystechtalk.beehiiv.com)
👉 Lastly, I tested ChatGPT with GPT-4 (available to Plus users), and it got all of them CORRECT!

• • •
