🧠 Inspired by Boden’s creativity framework (1998), OMEGA tests:
It's impressive that the model can solve, e.g., 15-digit × 5-digit or 17-digit × 4-digit multiplication with 100% accuracy. I expected this improvement, since the model can now backtrack and correct its reasoning, but it still seems insufficient.
We find that GPT-3, ChatGPT, and GPT-4 cannot fully solve compositional tasks even with in-context learning, fine-tuning, or scratchpads. To understand when models succeed, and the nature of their failures, we represent a model’s reasoning through computation graphs.
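To make the "computation graph" idea concrete, here is a rough sketch of what such a graph could look like for multi-digit multiplication. This is my own illustration, not the authors' code: the `Node` class, `multiplication_graph`, and `depth` are hypothetical names, and I'm assuming schoolbook multiplication on non-negative integers, with one node per digit-by-digit product and one node per running addition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step of the computation (hypothetical structure, for illustration)."""
    op: str                     # "input", "mul", or "add"
    value: int                  # result of this step
    parents: list = field(default_factory=list)


def multiplication_graph(x: int, y: int):
    """Build a schoolbook-multiplication graph for non-negative x and y.

    Returns (all_nodes, output_node).
    """
    nodes = []

    def make(op, value, parents=()):
        node = Node(op, value, list(parents))
        nodes.append(node)
        return node

    # Leaf nodes: the individual digits, least significant first.
    xs = [make("input", int(d)) for d in reversed(str(x))]
    ys = [make("input", int(d)) for d in reversed(str(y))]

    total = None
    for i, xd in enumerate(xs):
        for j, yd in enumerate(ys):
            # Digit product, shifted to its place value (the shift is folded
            # into the "mul" node to keep the sketch small).
            prod = make("mul", xd.value * yd.value * 10 ** (i + j), [xd, yd])
            # Chain the partial products together with running additions.
            if total is None:
                total = prod
            else:
                total = make("add", total.value + prod.value, [total, prod])
    return nodes, total


def depth(node: Node) -> int:
    """Longest path from any input digit to this node."""
    if not node.parents:
        return 0
    return 1 + max(depth(p) for p in node.parents)


nodes, out = multiplication_graph(123, 45)
print(out.value, len(nodes), depth(out))
```

With this naive chaining, both the node count and the depth grow with the product of the two digit counts, which gives one way to quantify how much harder a 15-digit × 5-digit problem is than a 3-digit × 2-digit one.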