Latest Twitter Threads by @nikhil07prakash on Thread Reader App

Jun 24 • 15 tweets • 6 min read

How do language models track the mental states of each character in a story, an ability often referred to as Theory of Mind?

Our recent work takes a step toward demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief-tracking task. Surprisingly, we found that it relies heavily on concepts similar to pointer variables in C programming!

Since Theory of Mind (ToM) is fundamental to social intelligence, numerous works have benchmarked this capability of LMs. However, the internal mechanics responsible for solving (or failing to solve) such tasks remain unexplored...
