Koyena Pal
Ph.D. Student @KhouryCollege | Data Scientist Intern @Fidelity | Interpretable AI + Data Science with @davidbau and Renée Miller | BS/MS @BrownCSDept
Jan 22
Can models understand each other's reasoning? 🤔

When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way?

Our new preprint with @davidbau and @csinva explores CoT generalizability 🧵👇

(1/7)

Why does this matter?

Faithfulness research (such as @AnthropicAI's "Reasoning Models Don't Always Say What They Think" and @Zidi's work) shows that CoT doesn't always reflect a model's internal reasoning. Models can use hints without disclosing them and are only selectively faithful to their intermediate steps.

What if explanations serve a different purpose?

If Model B follows Model A's reasoning to the same conclusion, maybe these explanations capture something generalizable (even if they are not perfectly "faithful" to either model's internals).

🧵(2/7)
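To make the question concrete, here is a minimal sketch of the kind of cross-model check this suggests: Model A writes a chain of thought, Model B is shown that reasoning, and we test whether B reaches A's conclusion. The `gen_a`, `gen_b`, and `extract_answer` callables are hypothetical stand-ins for whatever generation backend and answer parser you use; this illustrates the setup, not the paper's exact protocol.

```python
from typing import Callable

Generate = Callable[[str], str]  # prompt -> completion (any backend)


def cot_transfer_agrees(question: str,
                        gen_a: Generate,
                        gen_b: Generate,
                        extract_answer: Callable[[str], str]) -> bool:
    """Does Model B, given Model A's chain of thought, reach A's conclusion?"""
    # 1. Model A produces a chain of thought ending in a final answer.
    cot_a = gen_a(f"{question}\nThink step by step, then give a final answer.")
    answer_a = extract_answer(cot_a)

    # 2. Model B sees A's reasoning and is asked only for the conclusion.
    prompt_b = (f"{question}\n\nHere is one line of reasoning:\n{cot_a}\n\n"
                "Based on this reasoning, what is the final answer?")
    answer_b = extract_answer(gen_b(prompt_b))

    # 3. Agreement on the conclusion is the signal of generalizability,
    #    independent of whether the CoT is "faithful" to either model.
    return answer_a.strip().lower() == answer_b.strip().lower()
```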