Latest Twitter Threads by @ravfogel on Thread Reader App

May 28 • 14 tweets • 4 min read

1/ Can LLMs introspect, i.e., reason about their internal states? Recent work claims LLMs notice when their "thoughts" get tampered with, and can report their content. We looked closely, and we think it's too early to say that. Work led by @shashwat_s19, with @tallinzen and me.

2/ There are different notions of introspection in cognitive science and philosophy. We target a strong notion (consistent with the framing in some recent work), under which LLMs can access their own activations in a second-order computation distinct from a simple forward pass.
