🚨 Paper 🚨
What if LLMs could tell you they’re going to fail before they finish reasoning?
We trained models to predict their own future: whether they’ll succeed and how long it will take. At every token, in real time, with no extra compute.
We used this to develop an adaptive sampling algorithm for test-time compute. 👇🧵
ZIP repurposes unused logits to predict auxiliary quantities during next-token prediction. No architecture changes, no extra forward passes.
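A minimal sketch of that idea, assuming the auxiliary signals live in reserved/unused vocab slots (the vocab size, slot indices, and masking below are illustrative assumptions, not the paper's exact recipe):

```python
import torch

VOCAB_SIZE = 32000                        # assumed
UNUSED_IDS = torch.arange(31944, 32000)   # assumed: 56 reserved slots

def split_logits(logits: torch.Tensor):
    """logits: [batch, seq, vocab] from one ordinary forward pass."""
    # Read auxiliary predictions off the reserved logit positions.
    aux = logits[..., UNUSED_IDS]          # [batch, seq, 56]
    # Mask the reserved slots so next-token sampling never emits them.
    lm = logits.clone()
    lm[..., UNUSED_IDS] = float("-inf")
    return lm, aux
```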
For ZIP-RC, we use 56 logits to parameterize an 8×7 grid: a joint distribution over expected reward and remaining generation length.
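Turning those 56 logits into the grid could look roughly like this; the bin centers and which axis holds reward vs. length are my assumptions:

```python
import torch
import torch.nn.functional as F

REWARD_BINS = torch.linspace(0.0, 1.0, 8)                # assumed bin centers
LENGTH_BINS = torch.tensor([64., 128., 256., 512., 1024., 2048., 4096.])  # assumed

def joint_reward_length(aux_logits: torch.Tensor):
    """aux_logits: [..., 56] -> joint p(reward, remaining length) on an 8x7 grid."""
    probs = F.softmax(aux_logits, dim=-1).reshape(*aux_logits.shape[:-1], 8, 7)
    p_reward = probs.sum(dim=-1)                          # marginal over length, [..., 8]
    p_length = probs.sum(dim=-2)                          # marginal over reward, [..., 7]
    exp_reward = (p_reward * REWARD_BINS).sum(-1)         # expected final reward
    exp_length = (p_length * LENGTH_BINS).sum(-1)         # expected remaining tokens
    return probs, exp_reward, exp_length
```

Because this is just a softmax over logits the model already emits, the joint distribution is available at every decoding step for free, which is what an adaptive sampler can act on.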