Shauli Ravfogel
https://t.co/wtyGNDXwxW Faculty fellow at NYU CDS. Previously: PhD @ BIU NLP
May 28 · 14 tweets · 4 min read
1/ Can LLMs introspect, i.e., reason about their internal states? Recent work claims LLMs notice when their "thoughts" get tampered with, and can report their content. We looked closely and we think it's too early to say that. Work led by @shashwat_s19, with @tallinzen and me.

2/ There are different notions of introspection in cognitive science and philosophy. We target a strong notion (consistent with the framing in some recent work), under which LLMs can access their own activations in a second-order computation distinct from a simple forward pass.