Uzay Macar Profile picture
Researcher and entrepreneur
Apr 14 17 tweets 6 min read
🧵New Anthropic Fellows research: We studied mechanisms of "introspective awareness" in LLMs.

LLMs can sometimes detect steering vectors injected into their residual stream. But is this worthy of being called introspection, or attributable to some uninteresting confound?👇 Image We use the setup from Lindsey (2025): inject a steering vector, then ask the model: "Do you detect an injected thought? [detection] If so, what is the injected thought about? [identification]"

Our experiments are on open-source 🤖: Gemma3-27B, OLMo-3.1-32B, and Qwen3-235B.