Poetiq
Dec 23, 7 tweets
We finally had a moment to run our system with GPT-5.2 X-High on ARC-AGI-2!

Using the same Poetiq harness as before, we saw results as high as 75% at under $8 / problem using GPT-5.2 X-High on the full PUBLIC-EVAL dataset. This beats the previous SOTA by ~15 percentage points.
There was absolutely no training or model-specific optimization done at Poetiq for GPT-5.2.
In both accuracy and price, this is a remarkable improvement over the earlier models we tested on the PUBLIC-EVAL set, achieved in a very short time.
If the same pattern holds as before between PUBLIC-EVAL and ARC Prize’s official testing on SEMI-PRIVATE, GPT-5.2 X-High with Poetiq is positioned to yield a significant improvement over any existing configuration we’ve tested. Our fingers are crossed.
We are grateful to OpenAI for providing access to their model so that we could run these experiments with GPT-5.2.
See our system description here:
poetiq.ai/posts/arcagi_a…
Stay tuned! We'll post our updated code to support GPT-5.2 after the holidays.

More from @poetiq_ai

Nov 20
Is more intelligence always more expensive? Not necessarily.

Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1.
Read the full analysis and get our code: poetiq.ai/posts/arcagi_a…
How? Using iterative problem solving and self-auditing. Our meta-system autonomously decides on a strategy and determines whether it needs to refine its solution or is ready to submit. This self-improving process incrementally constructs the answer.
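To make the propose, audit, refine-or-submit loop described above concrete, here is a toy sketch in Python. It is not Poetiq's implementation: the names `CANDIDATES`, `audit`, and `solve_iteratively` are hypothetical, and a fixed list of grid transforms stands in for the LLM calls a real system would make to propose and refine solutions.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical candidate strategies. A real system would ask a model to
# propose and revise these; a fixed list keeps the sketch runnable.
CANDIDATES: List[Tuple[str, Transform]] = [
    ("identity", lambda g: [row[:] for row in g]),
    ("flip_horizontal", lambda g: [row[::-1] for row in g]),
    ("flip_vertical", lambda g: [row[:] for row in g[::-1]]),
    ("transpose", lambda g: [list(col) for col in zip(*g)]),
]


def audit(transform: Transform, examples: List[Example]) -> bool:
    """Self-audit: does the candidate reproduce every training output?"""
    return all(transform(inp) == out for inp, out in examples)


def solve_iteratively(
    examples: List[Example], max_steps: int = 10
) -> Tuple[Optional[str], int]:
    """Try candidates in order until one passes the audit or the budget runs out.

    Returns the name of the accepted strategy (or None) and the steps used.
    """
    steps = 0
    for name, transform in CANDIDATES[:max_steps]:
        steps += 1
        if audit(transform, examples):
            return name, steps  # audit passed: ready to submit
        # audit failed: fall through and refine by trying the next candidate
    return None, steps


if __name__ == "__main__":
    train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
    print(solve_iteratively(train))
```

Stopping at the first candidate that passes the audit is what lets a loop like this trade accuracy against cost: easy problems exit early, and only hard ones consume the full step budget.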