Jan 22 • 7 tweets
Can models understand each other's reasoning? 🤔
When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way?
Our new preprint with @davidbau and @csinva explores CoT generalizability 🧵👇
(1/7)
Why does this matter?
Faithfulness research (such as @AnthropicAI's "Reasoning Models Don't Always Say What They Think" and @Zidi's work) shows that CoT doesn't always reflect a model's internal reasoning. Models can omit hints they actually relied on and are only selectively faithful to their intermediate steps.
What if explanations serve a different purpose?
If Model B follows Model A's reasoning to the same conclusion, maybe these explanations capture something generalizable (even if they are not perfectly "faithful" to either model's internals).
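To make the idea concrete, here is a minimal sketch of the kind of cross-model check this suggests. The `generate` helper is a hypothetical stand-in for whatever LLM API you use, and the prompt wording is illustrative, not the preprint's actual evaluation setup:

```python
# Minimal sketch of a cross-model CoT transfer check (not the preprint's code).
# `generate(model_name, prompt) -> str` is a hypothetical stand-in for any LLM API.

def cot_transfer_agreement(question, cot_from_a, answer_from_a, reader_models, generate):
    """Fraction of reader models that reach Model A's answer when given its CoT."""
    hits = 0
    for model_name in reader_models:
        prompt = (
            f"Question: {question}\n"
            f"Here is one line of reasoning:\n{cot_from_a}\n"
            "Following this reasoning, reply with only the final answer."
        )
        prediction = generate(model_name, prompt).strip().lower()
        hits += prediction == answer_from_a.strip().lower()
    return hits / len(reader_models)
```

High agreement across reader models would suggest the explanation carries reasoning other models can follow, independent of how "faithful" it is to Model A's internals.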