Kristoph
e/acc. a passionate advocate of ai. a builder of many things. currently VP at WBD. formerly head of VOD at AWS, CTO / founder of many startups.
Oct 6 · 4 tweets · 2 min read
There is much excitement about this prompt with claims that it helps Claude 3.5 Sonnet outperform o1 in reasoning.

I benchmarked this prompt to find out whether this claim is true (thanks to @ai_for_success for the heads-up on this last night) 🧵

The TL;DR is that this prompt does not lift Claude 3.5 Sonnet to o1 levels in reasoning, but it does tangibly improve its performance on reasoning-focused benchmarks.

However, this comes at the expense of 'knowledge'-focused benchmarks, where the model is more directly generating text it has been trained on.