Latest Twitter Threads by @ikristoph on Thread Reader App

Oct 6, 2024 • 4 tweets • 2 min read

There is much excitement about this prompt with claims that it helps Claude 3.5 Sonnet outperform o1 in reasoning.

I benchmarked this prompt to find out whether this claim is true (thanks to @ai_for_success for the heads-up on this last night) 🧵

https://twitter.com/_philschmid/status/1842846050320544016

The TLDR is that this prompt does not improve Claude 3.5 Sonnet to o1 levels in reasoning, but it does tangibly improve its performance in reasoning-focused benchmarks.

However, this does come at the expense of performance on 'knowledge'-focused benchmarks, where the model is more directly generating text it has been trained on.
