working towards continually self-improving AI, reasoning and alignment @ IBM Research AI, prev @CIS_Penn, @Wharton, founded @pennmlr
Apr 27 • 9 tweets • 4 min read
What if your language model could reason efficiently in an entirely new language?
We introduce Abstract Chain-of-Thought, a new mechanism that lets language models reason through a short sequence of reserved "abstract" tokens, trained via reinforcement learning. It matches verbalized CoT in performance at a fraction of the cost, achieving major gains in inference-time efficiency.
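To make the idea concrete, here is a toy sketch (not the paper's code) of the decoding side: a few "abstract" tokens are reserved in the vocabulary, and the reasoning phase is constrained to sample only from that reserved slice before emitting the answer. The vocabulary, the token names `<abs_i>`, the chain budget of 4, and the random stand-in logits are all illustrative assumptions; in the actual method, reinforcement learning shapes which abstract tokens the model emits.

```python
import math
import random

random.seed(0)

# Toy vocabulary: a few text tokens plus reserved "abstract" tokens.
# The "<abs_i>" names and their count are illustrative, not the paper's.
VOCAB = ["the", "answer", "is", "408"] + [f"<abs_{i}>" for i in range(8)]
ABS_IDS = {i for i, tok in enumerate(VOCAB) if tok.startswith("<abs_")}

def sample_abstract_token(logits):
    """Sample a token id, masking out everything except the reserved abstract slice."""
    masked = [x if i in ABS_IDS else float("-inf") for i, x in enumerate(logits)]
    m = max(masked)
    weights = [math.exp(x - m) for x in masked]  # exp(-inf) == 0, so text tokens get weight 0
    r, acc = random.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc > r:
            return i
    return max(ABS_IDS)

# Decode a short, fixed-budget abstract chain (budget of 4 is arbitrary),
# then the final answer. Random logits stand in for a real model's outputs.
chain = [sample_abstract_token([random.gauss(0, 1) for _ in VOCAB]) for _ in range(4)]
print("abstract chain:", [VOCAB[i] for i in chain])
print("answer:", VOCAB[3])
```

The point of the sketch is only that the "reasoning" segment costs 4 reserved tokens instead of a long verbalized chain; everything about how those tokens acquire meaning comes from RL training, which this toy does not model.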
As we move toward harder tasks, we have sought to imbue LLMs with the ability to generate long CoTs. However, verbalized reasoning chains are often needlessly expensive, and can be unfaithful to the underlying reasoning process. This makes alternative reasoning pathways worth exploring, which is where latent reasoning comes in.
But rather than mixing text and latent CoTs, or reasoning purely in embedding space, what if models could produce a much shorter chain that balances performance and cost?