Yuntian Deng
Assistant Professor @UWaterloo | Associate @Harvard | Faculty Affiliate @VectorInst | Former Postdoc @ai2_mosaic | PhD @Harvard
Sep 17, 2024
Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication: o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4

Interestingly, the number of private reasoning tokens grows sublinearly with problem size, but exceeds what human-written CoT requires. For example, on 20x20, o1 uses ~3600 reasoning tokens, while human CoT needs ~400 for the partial products and ~400 for the sums, ~800 in total. 2/4
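A minimal sketch of how such a multiplication benchmark might be scored (not the authors' actual harness; `make_problem` and `grade` are hypothetical helpers, and the model call itself is omitted):

```python
import random
import re

def make_problem(n_digits, rng):
    """Draw an n-digit by n-digit multiplication problem."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return rng.randint(lo, hi), rng.randint(lo, hi)

def grade(answer_text, a, b):
    """Check whether the last integer in the model's reply equals a*b."""
    nums = re.findall(r"-?\d+", answer_text.replace(",", ""))
    return bool(nums) and int(nums[-1]) == a * b

rng = random.Random(0)
a, b = make_problem(9, rng)
print(grade(f"The product is {a * b}.", a, b))  # True
```

Accuracy at each size NxN would then be the fraction of graded replies that come back `True`.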
May 29, 2024
Can we teach LMs to internalize chain-of-thought (CoT) reasoning steps? We found a simple method: start with an LM trained with CoT, gradually remove CoT steps and finetune, forcing the LM to internalize reasoning.

Paper: bit.ly/internalize_st…
Done w/ @YejinChoinka @pmphlt 1/5
Approach: training proceeds in stages.
-Stage 0: the model is trained to predict the full CoT and the answer.
-Stage 1: the first CoT token is removed, and the model is finetuned to predict the remaining CoT and the answer.
-This continues, one stage at a time, until all CoT tokens are removed and the model predicts the answer directly. 2/5
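The staged curriculum above can be sketched as follows. `stage_example` is an illustrative helper (not the paper's code) showing what the finetuning target looks like at each stage, with tokens represented as strings:

```python
def stage_example(question, cot, answer, stage):
    """At stage s, drop the first s CoT tokens from the target.

    Stage 0 keeps the full CoT; the final stage keeps none, so the
    model must map the question directly to the answer.
    """
    return question, cot[stage:] + answer

question, cot, answer = ["Q"], ["c1", "c2", "c3"], ["A"]
for s in range(len(cot) + 1):
    print(s, stage_example(question, cot, answer, s))
```

At each stage the model is finetuned on these shortened targets before the next token is removed, which is what gradually forces the reasoning to move into the model's hidden states.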