Franklyn Wang
Nov 18 · 14 tweets · 4 min read
Doubling o1-preview performance on ARC-AGI with one simple trick 🚀

tldr: by providing human-like representations to o1, we are able to substantially increase performance on @arcprize.
AI is really smart; its scores on math contests are no joke. But ARC Prize should be much easier than math contests, and yet frontier models generally do not score very well.
This is mainly because frontier models cannot really “see” the grid; imagine a genius trying to do ARC Prize with their eyes closed -- seems unlikely to bear fruit!
ARC Prize problems are not strictly well-posed -- many different rules map every example input to its output, but only one of them is evident to people.
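To make the ambiguity concrete, here is a tiny sketch with made-up grids (not an actual ARC task). The two rules, `recolor_rule` and `leftmost_rule`, are hypothetical names chosen for illustration; both fit the single training pair, yet they give different answers on a new input.

```python
def recolor_rule(grid):
    """Rule A: every cell of color 1 becomes color 3."""
    return [[3 if v == 1 else v for v in row] for row in grid]

def leftmost_rule(grid):
    """Rule B: the leftmost cell of each row becomes color 3."""
    return [[3] + row[1:] for row in grid]

# Toy data, invented for this example.
train_in, train_out = [[1, 0]], [[3, 0]]
test_in = [[0, 1]]

# Both rules reproduce the training pair...
print(recolor_rule(train_in) == train_out, leftmost_rule(train_in) == train_out)
# ...but they disagree on a new input.
print(recolor_rule(test_in), leftmost_rule(test_in))
```

A person would probably pick rule A without hesitation; nothing in the raw numbers forces that choice.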
You might ask, how do we get the representations? Do we use tree diffusion or autoencoding? The answer is no! We use an extensive set of handwritten heuristics to capture patterns that are salient to people.
For example, here's a simple case -- the algorithm recognizes the shape being re-colored.
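The thread does not spell out the heuristics themselves, so the following is only a guess at what one of the simplest might look like: group same-colored cells into connected shapes, then flag shapes that occupy the same cells in input and output but with a different color. The function names, the 4-connectivity choice, and the assumption that color 0 is background are all illustrative, not taken from PEACH.

```python
from collections import deque

def find_objects(grid, background=0):
    """Return 4-connected same-colored components as (color, frozenset of cells)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or (r, c) in seen:
                continue
            color = grid[r][c]
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = set()
            while queue:  # breadth-first flood fill over one shape
                y, x = queue.popleft()
                cells.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append((color, frozenset(cells)))
    return objects

def recolored_shapes(grid_in, grid_out):
    """List (old_color, new_color) for shapes that keep their cells but change color."""
    out_by_cells = {cells: color for color, cells in find_objects(grid_out)}
    changes = []
    for color, cells in find_objects(grid_in):
        new_color = out_by_cells.get(cells)
        if new_color is not None and new_color != color:
            changes.append((color, new_color))
    return changes

# Toy grids, invented for this example: the L-shaped object goes from color 1 to 3.
grid_in = [[0, 1, 1], [0, 1, 0], [2, 0, 0]]
grid_out = [[0, 3, 3], [0, 3, 0], [2, 0, 0]]
print(recolored_shapes(grid_in, grid_out))
```

A description like "the shape of color 1 becomes color 3" is much closer to what a person sees than a list of changed cell values.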
We also show a more complicated example -- note our ability to handle augmentations!
We can even handle occlusions!
Because our method essentially represents each grid as an abstract syntax tree, we call our method Pattern Extraction and Abstraction for Cognitive Heuristics (PEACH) -- as peaches grow on trees, unlike other "reasoning" fruits.
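As a rough illustration of "grid as a tree", here is a minimal sketch. The `Node` class, its fields, and the text format produced by `render` are all assumptions made for this example; the thread does not show PEACH's actual node types or serialization.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                                   # hypothetical node type, e.g. "grid" or "object"
    attrs: dict = field(default_factory=dict)   # properties such as color or position
    children: list = field(default_factory=list)

def render(node, depth=0):
    """Serialize the tree as indented text that could be pasted into a prompt."""
    attrs = ", ".join(f"{k}={v}" for k, v in node.attrs.items())
    lines = ["  " * depth + f"{node.kind}({attrs})"]
    for child in node.children:
        lines.extend(render(child, depth + 1).splitlines())
    return "\n".join(lines)

# A made-up 3x3 grid containing two objects.
tree = Node("grid", {"height": 3, "width": 3, "background": 0}, [
    Node("object", {"shape": "L", "color": 1, "top_left": (0, 1)}),
    Node("object", {"shape": "dot", "color": 2, "top_left": (2, 0)}),
])
print(render(tree))
```

The point of a structure like this is that the model reads objects and their properties instead of a flat array of digits.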
Then, we simply ask frontier LLMs like o1 to code the mapping -- and find that a small handful of samples is enough for strong results, far fewer than the thousands used in prior art!
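The thread does not say how a final answer is picked from the sampled programs. One common approach, assumed here purely for illustration, is to keep the first candidate that reproduces every training pair. In this sketch the candidates are plain source strings standing in for LLM samples, and the convention that each one defines a `transform` function is hypothetical.

```python
def first_consistent_program(candidate_sources, train_pairs):
    """Return the first candidate whose transform() reproduces every training output."""
    for source in candidate_sources:
        namespace = {}
        try:
            # Untrusted code: in practice this should run in a sandbox.
            exec(source, namespace)
            transform = namespace["transform"]
            if all(transform(x) == y for x, y in train_pairs):
                return source
        except Exception:
            continue  # a broken sample is simply skipped
    return None

# Stand-ins for sampled programs; a real run would get these from the model.
candidates = [
    "def transform(g):\n    return g",
    "def transform(g):\n    return [[3 if v == 1 else v for v in row] for row in g]",
]
# Toy training pair, invented for this example.
train_pairs = [([[0, 1], [1, 2]], [[0, 3], [3, 2]])]

chosen = first_consistent_program(candidates, train_pairs)
print(chosen)
```

With a filter like this, each extra sample is another chance to hit a program that fits, which is why the number of samples needed is a useful measure of how much the representation helps.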
We find these results encouraging for solving this problem in a “human-like way”. We also emphasize that we do not fine-tune any models, so there's plenty more juice!
All large efforts require a team. I’d like to thank @gopalkgoel1, @kattian_, @minimario1729, Justin Zhang, @yunyu_l, @rahulgs, @cool_cocohearts, @fluorane, and @jacobtpl for all their helpful contributions, both conceptual and practical.
I'd also like to acknowledge Yunyu Lin (@yunyu_l) and David Petersen (@typesfaster) for financial support used to conduct this research.
But this is just the beginning! If interested, please DM me here to discuss potential collaborations.
