Jan 22 • 7 tweets
Can models understand each other's reasoning? 🤔
When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way?
Our new preprint with @davidbau and @csinva explores CoT generalizability 🧵👇
(1/7)
Why does this matter?
Faithfulness research (such as @AnthropicAI's "Reasoning Models Don't Always Say What They Think" and @Zidi's work) shows that CoT doesn't always reflect a model's internal reasoning. Models can omit hints they actually relied on and are only selectively faithful to their intermediate steps.
What if explanations serve a different purpose?
If Model B follows Model A's reasoning to the same conclusion, maybe these explanations capture something generalizable (even if they are not perfectly "faithful" to either model's internals).
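To make the idea concrete, here is a minimal sketch of the kind of cross-model check this suggests. The `generate` helper is a hypothetical stand-in for whatever LLM API you use, and the prompt wording is illustrative, not the preprint's actual evaluation setup:

```python
# Minimal sketch of a cross-model CoT transfer check (not the preprint's code).
# `generate(model_name, prompt) -> str` is a hypothetical stand-in for any LLM API.

def cot_transfer_agreement(question, cot_from_a, answer_from_a, reader_models, generate):
    """Fraction of reader models that reach Model A's answer when given its CoT."""
    hits = 0
    for model_name in reader_models:
        prompt = (
            f"Question: {question}\n"
            f"Here is one line of reasoning:\n{cot_from_a}\n"
            "Following this reasoning, reply with only the final answer."
        )
        prediction = generate(model_name, prompt).strip().lower()
        hits += prediction == answer_from_a.strip().lower()
    return hits / len(reader_models)
```

High agreement across reader models would suggest the explanation carries reasoning other models can follow, independent of how "faithful" it is to Model A's internals.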