Edsen 🇵🇱
Oct 13
Some observations from several days of testing the different Gemini 3 checkpoints that were or still are being tested on AI Studio:

— ecp is currently the newest checkpoint and also the one you are most likely to get in an A/B test. It is really good at frontend tasks but seems mediocre at other tasks. It can create wonderful pages zero-shot and generate wonderful SVGs, but it doesn't seem to generalize that performance to other areas and is generally worse at logic, reasoning, and complex coding tasks.

— k0/x5 are the current frontier models still present in the A/B test, and they are also currently the rarest pair to catch. These models do absolutely amazingly on most complex tasks, way ahead of what 2.5 Pro has been able to accomplish. The difference between the two isn't that significant: k0 is only a little better than x5, especially on complex tasks involving math, physics, or backend work, while x5 can be a little better in some other areas. For example, k0 is able to compose classical music, create incredible voxel designs, or zero-shot complex backend applications. k0 feels more scientific than x5 and tends to always use LaTeX rendering. Both models, however, produce odd small bugs on complex backend tasks; these are easy to fix, but the models make the same ones over and over.

— 5qa/2ht was another promising pair that has unfortunately already been retired from the test. I can't say too much about them, but I do believe they were even better than the current frontier pair (k0/x5) on most tasks, in both backend and frontend coding. 5qa was often the better of the two. They were consistently zero-shotting complex applications without any major bugs. That is way ahead of what is possible with the current frontier LLMs, and even better than k0/x5 in my opinion.

— There were additional checkpoints, like d17/da9 or e31/f61. However, those are retired as of now, and I'm unable to say much about them because they were on AI Studio for only a short period of time. According to other users' feedback, though, they were all very similar in capabilities, with each excelling only slightly in certain areas.
What can be said for sure: the checkpoints I got to try were steps ahead of what was possible with any other AI to date. Google is definitely cooking here, and I am sure that Gemini 3 won't be a disappointment. I expect the largest gains to be in vision and coding.