
Edsen 🇵🇱

Oct 13 • 10 tweets • 2 min read

Some observations from my multi-day testing of different Gemini 3 checkpoints that were/are tested on aistudio:

— ecp is currently the newest checkpoint and also the one most likely to show up in an A/B test. It is really good at frontend tasks but seems mediocre on other tasks... 🧵

It can create wonderful pages zero-shot and generate wonderful SVGs, but it doesn't seem to generalize its performance to other areas and is generally worse at logic, reasoning, and complex coding tasks.

— k0/x5 are the current frontier models that are still present on the A/B...

meanwhile, they're also currently the rarest pair of models to catch in the test. These models do absolutely amazingly on most complex tasks, way ahead of what 2.5 Pro has been able to accomplish. The difference between the two isn't that significant, with k0 being...

only a little better than x5, especially on complex tasks involving math/physics/backend, while x5 can be a little better in other areas. For example, k0 is able to compose classical music, create incredible voxel designs, or zero-shot complex backend applications...

k0 feels more scientific than x5 and tends to always use LaTeX rendering. Both models, however, make weird small bugs on complex backend tasks; those are easy to fix, but they make the same ones over and over.

— 5qa/2ht was another promising pair...

that has unfortunately already been retired from the test. Can't say too much about them, but I do believe they were even better than the current frontier pair (k0/x5) on most tasks, both backend and frontend coding. 5qa was often the better of the two. They were consistently...

zero-shotting complex applications without making any major bugs. Way ahead of what is possible with the current frontier LLMs and even better than k0/x5, in my opinion.

— There were additional checkpoints, like d17/da9 or e31/f61. However, those are retired as of now...

and I'm unable to say much about them, as they were on aistudio for only a short period of time. According to other users' feedback, though, they were all very similar in capabilities, with each excelling only a little in certain areas.

What can be said for sure is that the ckpts I got to try were steps ahead of what was possible with any other AI to date. Google is definitely cooking right there, and I am sure that Gemini 3 won't be a disappointment. I am expecting the largest gains to be in vision and coding.

