Thread by @arena on Thread Reader App

Exciting News from Chatbot Arena!

@GoogleDeepMind's new Gemini 1.5 Pro (Experimental 0801) has been tested in Arena for the past week, gathering over 12K community votes.

For the first time, Google Gemini has claimed the #1 spot, surpassing GPT-4o/Claude-3.5 with an impressive score of 1300 (!), and also achieving #1 on our Vision Leaderboard.

Gemini 1.5 Pro (0801) excels in multi-lingual tasks and delivers robust performance in technical areas like Math, Hard Prompts, and Coding.

Huge congrats to @GoogleDeepMind on this remarkable milestone!

Gemini (0801) Category Rankings:
- Overall: #1
- Math: #1-3
- Instruction-Following: #1-2
- Coding: #3-5
- Hard Prompts (English): #2-5

Come try the model and let us know your feedback!
More analysis below👇

Gemini 1.5 Pro (Experimental 0801) is #1 on the Vision Leaderboard.

Gemini shows strong multilingual capability: #1 performance in Chinese, Japanese, German, Russian.

But in technical domains like Coding/Hard Prompt Arena, Claude 3.5 Sonnet, GPT-4o, Llama 405B are still leading the way.

Overall win-rate heatmap: Gemini 1.5 Pro (0801) wins 54% vs GPT-4o, 59% vs Claude-3.5-Sonnet.
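For a rough sense of scale, the head-to-head win rates above can be translated into rating gaps. This is a minimal sketch assuming the standard Elo logistic curve (base 10, scale 400), ignoring ties, and treating each pairwise win rate in isolation; the function names are illustrative and not part of any Arena tooling. The leaderboard score is fit over all model pairs, so these pairwise gaps need not match the published score differences.

```python
import math

def elo_gap_from_win_rate(p: float) -> float:
    """Rating gap implied by win probability p under the Elo logistic curve (assumed model)."""
    if not 0.0 < p < 1.0:
        raise ValueError("win rate must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))

def win_rate_from_elo_gap(gap: float) -> float:
    """Inverse mapping: expected win probability for a given rating gap."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))

if __name__ == "__main__":
    # Win rates quoted in the thread: 54% vs GPT-4o, 59% vs Claude-3.5-Sonnet.
    for opponent, p in [("GPT-4o", 0.54), ("Claude-3.5-Sonnet", 0.59)]:
        print(f"{opponent}: {p:.0%} win rate ~ {elo_gap_from_win_rate(p):.0f} rating points")
```

Under these assumptions, a 54% win rate corresponds to a gap of roughly 28 points and a 59% win rate to roughly 63 points.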

Check out the full data at leaderboard.lmsys.org and come chat with the model!
