Is OpenAI's o1 a good calculator? We tested it on up to 20x20 multiplication—o1 solves up to 9x9 multiplication with decent accuracy, while gpt-4o struggles beyond 4x4. For context, this task is solvable by a small LM using implicit CoT with stepwise internalization. 1/4
Interestingly, the number of private reasoning tokens grows sublinearly with problem size, but remains well above what human-written CoT requires. For example, for 20x20, o1 uses ~3600 reasoning tokens, but human CoT needs ~400 for partial products and ~400 for sums, totaling ~800. 2/4
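The ~400 figure for partial products is easy to sanity-check: schoolbook multiplication of an n-digit by m-digit number writes out roughly n*m partial-product digits (≈400 for 20x20). A back-of-envelope digit count, using the operands from the example prompt later in the thread (a rough proxy for CoT tokens, not o1's actual tokenizer):

```python
def schoolbook_digit_count(a: int, b: int) -> tuple[int, int]:
    """Digits a human-written schoolbook CoT writes out:
    one partial product per digit of b, plus each running sum.
    A rough proxy for CoT tokens; real tokenizers differ."""
    partial_digits = sum_digits = 0
    running = 0
    for i, d in enumerate(int(c) for c in reversed(str(b))):
        partial = a * d
        partial_digits += len(str(partial))
        running += partial * 10 ** i
        sum_digits += len(str(running))
    assert running == a * b  # the written-out steps reproduce the product
    return partial_digits, sum_digits

p, s = schoolbook_digit_count(15580146, 550624703)
print(p, s)
```

For 20-digit operands the partial products alone contribute ≈ 20x20 = 400 digits, consistent with the ~800 total above.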
o1-preview has similar accuracy to o1-mini despite being more expensive and slower. Both still perform much better than gpt-4o (o1-preview was tested with a small sample size of 7 per cell due to inference speed and cost). 3/4
Lastly, this task is solvable even by a small language model: Implicit CoT with Stepwise Internalization can solve up to 20x20 multiplication with 99.5% accuracy, using a gpt-2 small architecture (117M parameters). 4/4
o1-mini mostly produces the answer directly, while gpt-4o and o1-preview mostly use CoT. Since o1-mini has accuracy similar to o1-preview, maybe private reasoning tokens are all it needs?
Also, adding "think step by step" to the prompt didn't seem to help (tested on a tiny sample size).
For those interested, an example prompt used was:
"Calculate the product of 15580146 and 550624703. Please provide the final answer in the format: Final Answer: [result]"
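A minimal sketch of how such prompts can be generated and graded (only the prompt format above is from our runs; `make_prompt`, `grade`, and the model call are illustrative placeholders):

```python
import random
import re

def make_prompt(n_digits_a: int, n_digits_b: int) -> tuple[str, int]:
    """Generate a multiplication prompt in the format shown above."""
    a = random.randrange(10 ** (n_digits_a - 1), 10 ** n_digits_a)
    b = random.randrange(10 ** (n_digits_b - 1), 10 ** n_digits_b)
    prompt = (f"Calculate the product of {a} and {b}. "
              f"Please provide the final answer in the format: Final Answer: [result]")
    return prompt, a * b

def grade(response: str, expected: int) -> bool:
    """Extract the number after 'Final Answer:' (brackets/commas optional)."""
    m = re.search(r"Final Answer:\s*\[?([\d,]+)\]?", response)
    return bool(m) and int(m.group(1).replace(",", "")) == expected

prompt, expected = make_prompt(8, 9)
# response = call_model(prompt)  # hypothetical model call goes here
print(grade(f"Final Answer: [{expected}]", expected))  # True
```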
Can we teach LMs to internalize chain-of-thought (CoT) reasoning steps? We found a simple method: start with an LM trained with CoT, gradually remove CoT steps and finetune, forcing the LM to internalize reasoning. 1/5
Approach: Training has multiple stages.
- Stage 0: the model is trained to predict the full CoT and the answer.
- Stage 1: the first CoT token is removed, and the model is finetuned to predict the remaining CoT and the answer.
- This continues until all CoT tokens are removed. 2/5
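In data terms, each stage simply drops a longer prefix of the CoT from the training targets while always keeping the answer (a toy sketch of the schedule; the real recipe removes tokens gradually and finetunes at every stage):

```python
def stage_example(cot_tokens: list[str], answer_tokens: list[str], stage: int) -> list[str]:
    """Target sequence at a given stage of stepwise internalization:
    the first `stage` CoT tokens are removed; the answer is always kept.
    At stage == len(cot_tokens), the model must emit the answer directly.
    (Each list element stands in for a 'token' for illustration.)"""
    return cot_tokens[stage:] + answer_tokens

# toy example: CoT for 12 * 34 via partial products
cot = ["12*4=48", "12*30=360", "48+360=408"]
ans = ["408"]
for s in range(len(cot) + 1):
    print(s, stage_example(cot, ans, s))
```

At the final stage the training target is just the answer, which is exactly the implicit-CoT behavior described above.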
Results: We finetuned GPT-2 Small to solve 9x9 multiplication with 99% accuracy. This simple method can be applied to any task involving CoT. For example, we finetuned Mistral 7B to achieve 51% accuracy on GSM8K without producing any intermediate steps. 3/5