Andrew Lampinen
Interested in cognition and artificial intelligence. Research Scientist @DeepMind. Previously cognitive science @StanfordPsych. Tweets are mine.
Jan 29 10 tweets 3 min read
New paper studying how language models' representations of things like factuality evolve over a conversation. We find that in edge-case conversations, e.g. about model consciousness or delusional content, model representations can change dramatically! 1/
We identify dimensions that separate factual from non-factual answers via regression, using questions designed to deconfound factuality from answer biases or behavior. We test on held-out questions, both generic ones and conversation-relevant ones, throughout a conversation. 2/
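For readers curious what the probe step might look like in practice, here is a minimal sketch of a regression-based "factuality direction": fit a linear probe on representations of factual vs. non-factual answers, then track how held-out questions project onto that direction across conversation turns. The data, dimensions, and variable names are all made up for illustration; this is a sketch of the general recipe, not the paper's code.

```python
# Minimal sketch (not the paper's code): fit a linear probe that separates
# factual from non-factual answers in representation space, then track how
# held-out questions project onto that direction across conversation turns.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: hidden states (n_examples x d_model) for probe training,
# with labels 1 = factual answer, 0 = non-factual answer.
d_model = 512
train_reps = rng.normal(size=(200, d_model))
train_labels = rng.integers(0, 2, size=200)

probe = LogisticRegression(max_iter=1000).fit(train_reps, train_labels)
factuality_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Held-out probe questions, re-encoded at each conversation turn
# (shape: n_turns x n_questions x d_model in a real setup).
heldout_reps_per_turn = rng.normal(size=(10, 20, d_model))

for turn, reps in enumerate(heldout_reps_per_turn):
    projection = reps @ factuality_direction  # scalar factuality score per question
    print(f"turn {turn}: mean factuality projection = {projection.mean():.3f}")
```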
Sep 22, 2025 9 tweets 3 min read
Why does AI sometimes fail to generalize, and what might help? In a new paper, we highlight the latent learning gap — which unifies findings from language model weaknesses to agent navigation — and suggest that episodic memory complements parametric learning to bridge it. Thread:
Paper header: "Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences" by Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, Arslan Chaudhry and James L. McClelland; Google DeepMind
We take inspiration from classic experiments on latent learning in animals, where the animals learn about information that is not useful at present, but that might be useful later — for example, learning the location of useful resources in passing. By contrast, 2/
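To make the "episodic memory complements parametric learning" idea concrete, here is a toy sketch of an episodic store with similarity-based retrieval: raw experiences are kept verbatim and pulled back into context when they become relevant, even if they were useless when first stored. The class, embeddings, and episodes are all hypothetical, and this is not the paper's implementation.

```python
# Toy sketch (hypothetical, not the paper's implementation): an episodic store
# that keeps raw past experiences and retrieves the most relevant one at
# decision time, complementing whatever the parametric model has absorbed.
import numpy as np

class EpisodicStore:
    def __init__(self):
        self.keys = []      # embeddings of past experiences
        self.episodes = []  # the raw experiences themselves

    def add(self, embedding, episode):
        self.keys.append(embedding)
        self.episodes.append(episode)

    def retrieve(self, query, k=1):
        keys = np.stack(self.keys)
        # cosine similarity between the query and stored keys
        sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8)
        top = np.argsort(-sims)[:k]
        return [self.episodes[i] for i in top]

# Usage: experiences that were not useful when stored ("latent") can be
# reused later by pulling them back into the model's context.
store = EpisodicStore()
store.add(np.array([1.0, 0.0]), "passed a water source at location (3, 7)")
store.add(np.array([0.0, 1.0]), "saw a locked door in the east corridor")

query = np.array([0.9, 0.1])  # hypothetical embedding of "I'm thirsty now"
print(store.retrieve(query))  # -> the water-source episode
```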
May 2, 2025 11 tweets 4 min read
How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
Paper header: "On the generalization of language models from in-context learning and finetuning: a controlled study" by Andrew K. Lampinen*, Arslan Chaudhry*, Stephanie C.Y. Chan*, Cody Wild, Diane Wan, Alex Ku, Jörg Bornschein, Razvan Pascanu, Murray Shanahan and James L. McClelland (*equal contributions; Google DeepMind, Stanford University)
We use controlled experiments to explore the generalization of ICL and finetuning in data-matched settings; if we have some documents containing new knowledge, does the LM generalize better from finetuning on them, or from just putting all of them in context? 2/
Figure overview: If we have a language model and a new dataset, we can incorporate it in two ways: either finetuning the model on the dataset, or just putting the dataset in the context of the original model. Which generalizes better to held-out test questions?
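As a concrete picture of the data-matched comparison, here is a rough sketch using a small open model (gpt2 as a stand-in) and an invented "femp" fact: score the same held-out question with the documents in context vs. after a few finetuning steps on those documents. The documents, question, and hyperparameters are assumptions, not the paper's setup.

```python
# Minimal sketch (assumed model, data, and hyperparameters, not the paper's setup):
# compare "put the documents in context" vs "finetune on the documents", scoring
# the same held-out test question in both conditions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper uses larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

documents = ["A femp is a small nocturnal animal that eats glowing moss."]
question = "Q: What does a femp eat?\nA:"
answer = " glowing moss"

def answer_loss(m, prompt, answer):
    """Negative log-likelihood of the answer tokens given the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    answer_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, answer_ids], dim=1)
    labels = ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # only score the answer tokens
    return m(ids, labels=labels).loss.item()

# Condition 1: in-context learning. Prepend the documents to the question.
icl_loss = answer_loss(model, "\n".join(documents) + "\n" + question, answer)

# Condition 2: finetuning. A few gradient steps on the documents alone.
ft_model = AutoModelForCausalLM.from_pretrained(model_name)
ft_model.train()
opt = torch.optim.AdamW(ft_model.parameters(), lr=1e-5)
for _ in range(10):
    ids = tok("\n".join(documents), return_tensors="pt").input_ids
    loss = ft_model(ids, labels=ids).loss
    loss.backward()
    opt.step()
    opt.zero_grad()

ft_loss = answer_loss(ft_model, question, answer)  # no documents in context
print(f"ICL answer loss: {icl_loss:.3f}  finetuned answer loss: {ft_loss:.3f}")
```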
Feb 28, 2025 8 tweets 3 min read
New preprint! In "Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior" we synthesize AI and cognitive science work into a perspective on pursuing generalizable understanding of cognition. Thread:
Paper header: "Naturalistic Computational Cognitive Science: Towards generalizable models and theories that capture the full range of natural behavior" by Wilka Carvalho & Andrew Lampinen
In it we lay out the "what, why, and how" of designing experimental paradigms, models, and theories that engage with more of the full range of naturalistic inputs, tasks, and behaviors over which a cognitive theory should generalize. arxiv.org/abs/2502.20349
Figure: Naturalistic computational cognitive science: the what, why, and the how. What (Section 2): develop learning-based models of intelligence that can predict human behavior on both simplified and naturalistic stimuli and tasks. Why (Sections 3 and 4): why increase the naturalism of our experimental paradigms, and why learn models of intelligence? How (Sections 5 and 6): how do we build generalizable models that scale to naturalistic settings, and how do we build and test theories with naturalistic tasks and models?
Dec 22, 2023 12 tweets 5 min read
Research in mechanistic interpretability and neuroscience often relies on interpreting internal representations to understand systems, or manipulating representations to improve models. I gave a talk at @unireps at NeurIPS on a few challenges for this area, summary thread: 1/
Slide: exciting recent results in representational alignment... but what does it all *mean*? (Figure from a recent survey paper, https://arxiv.org/abs/2310.13018, showing a 3 x 3 grid of illustrations from papers in cognitive science, neuroscience, and machine learning that used methods of measuring, bridging, or increasing representational alignment between different systems.)
Specifically, our goal (or at least mine) is to understand or improve a system’s computations; thus, these methods depend on the complex relationship between representation and computation. In the talk I highlighted a few complexities of this relationship: 2/
Slide: So what is the relationship between representation and computation? (An array of papers on the topic, including Churchland & Sejnowski's "Neural Representation and Neural Computation" and Brooks's "Intelligence Without Representation". Relationship status: it's complicated.)
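For context on what "measuring representational alignment" can mean operationally, here is a minimal sketch of one widely used measure, linear CKA, with random stand-in activations; the talk discusses this family of methods in general, not this particular code.

```python
# Minimal sketch of one common representational-alignment measure (linear CKA);
# an illustration of the family of methods, not code from the talk or survey.
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two sets of representations.

    X: (n_stimuli, d1) activations from system 1
    Y: (n_stimuli, d2) activations from system 2 (same stimuli, any width)
    Returns a similarity in [0, 1].
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
system_a = rng.normal(size=(100, 64))
system_b = system_a @ rng.normal(size=(64, 300))  # same information, different basis
unrelated = rng.normal(size=(100, 300))

print(linear_cka(system_a, system_b))   # high alignment
print(linear_cka(system_a, unrelated))  # low alignment
```

The talk's caution applies here too: a high alignment score says the representations carry similar information, not that the two systems necessarily compute with that information in the same way.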
Aug 13, 2023 16 tweets 4 min read
The debate over AI capabilities often hinges on testing abilities that humans are presumed to have — reasoning, logic, systematic and compositional generalization, grammar, etc. But how reliably good are humans actually at these skills? 1/
Take logical reasoning. Many studies show that humans have surprising difficulty reasoning about simple logical rules; for example, <20% of subjects correctly find which examples would falsify a simple if-then rule in the Wason task (journals.sagepub.com/doi/abs/10.108…). 2/
Jun 17, 2023 15 tweets 6 min read
Why does asking language models to respond as an expert or think step-by-step improve answers? What does it have to do with conditional BC and role play? Thread on the power of conditional sequence modeling, and why evaluating LM capabilities or safely deploying them is hard: 1/
First, conditional BC (behavioral cloning): vanilla BC cannot learn from suboptimal data; adding suboptimal data to training hurts performance. This is one reason (though not the only one) that RL generally outperforms BC: it can learn from mistakes by getting lower rewards. But there's a solution! 2/
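To illustrate the conditional-BC idea the thread builds on, here is a toy sketch: prefix each trajectory with a quality token, train by ordinary next-token prediction over all data (good and bad), and steer at test time by conditioning on the "expert" token, analogous to asking an LM to answer as an expert. The environment, vocabulary, and architecture are invented for illustration and are not from the thread.

```python
# Toy sketch of conditional BC (an invented illustration, not code from the thread):
# prepend a quality token to each trajectory, so the model can learn from
# suboptimal data and be steered at test time by conditioning on <expert>.
import torch
import torch.nn as nn

VOCAB = {"<expert>": 0, "<novice>": 1, "left": 2, "right": 3, "goal": 4, "wall": 5}

class ConditionalPolicy(nn.Module):
    def __init__(self, vocab_size=len(VOCAB), dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)

# Mixed-quality demonstrations, each prefixed by its quality label.
trajectories = [
    ["<expert>", "right", "right", "goal"],
    ["<novice>", "left", "wall", "left"],
]
data = [torch.tensor([VOCAB[t] for t in traj]) for traj in trajectories]

model = ConditionalPolicy()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):  # next-token prediction over all data, good and bad alike
    for seq in data:
        logits = model(seq[:-1].unsqueeze(0))
        loss = loss_fn(logits.squeeze(0), seq[1:])
        loss.backward()
        opt.step()
        opt.zero_grad()

# At test time, condition on the <expert> prefix to elicit the good behavior.
with torch.no_grad():
    logits = model(torch.tensor([[VOCAB["<expert>"]]]))
    print("first action given <expert>:", logits[0, -1].argmax().item())  # expect "right" (2) after training
```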
May 26, 2023 11 tweets 5 min read
What can be learned about causality and experimentation from passive data? What could language models learn from simply passively imitating text? We explore these questions in our new paper: “Passive learning of active causal strategies in agents and language models”
Thread: 1/
We show formally and empirically that agents (such as LMs) trained solely via passive imitation can acquire generalizable strategies for discovering and exploiting causal structures, as long as they can intervene at test time. arxiv.org/abs/2305.16183 2/
Mar 25, 2023 9 tweets 4 min read
Some reflections on our Symbolic Behaviour in AI paper, after two more years of rapid progress in the field:
AI has continued to develop along the themes of the paper. For example, we highlighted the key role of emergence in intelligence, and there has been a rapidly growing emphasis on emergence in AI, e.g. arxiv.org/abs/2206.07682. The idea that aspects of intelligence are emergent is not new.
Mar 19, 2023 18 tweets 5 min read
The recent discussions of what language models can and can't accomplish highlight some important issues in how cognitive science, linguistics, etc. think about human capabilities or competencies, and thus how to test them in models. Thread: 1/
What does it mean to have a capability? Sometimes we think of it as something someone can do robustly (so it's not just luck doing it once), but of course everyone can make mistakes, so we wouldn't want to dismiss a capability from a single mistake. 2/
Feb 25, 2023 16 tweets 6 min read
What is emergence, and why is it of recent interest in AI, and long-standing interest in cognitive science? And why is this an exciting time for considering emergence across these fields? A thread: 1/
Emergence is the idea that a large system composed of many small parts can have fundamentally different properties than those parts do — or that "more is different", as Anderson described it (science.org/doi/10.1126/sc…). 2/
Feb 11, 2023 9 tweets 4 min read
Ted Chiang is a great writer, but this is not a great take and I'm disappointed to see it getting heavily praised. It's not in keeping with our scientific understanding of LMs or deep learning more generally. Thread: 1/n One important approach to the scientific study of complex phenomena like human intelligence or the behavior of language models is to create a simplified model which captures the key elements, while maintaining full control over the system, and study its behavior. 2/
Oct 28, 2022 12 tweets 5 min read
How should we compare the capabilities of language models and humans? Is the answer different for LMs than for cognitive models? In arxiv.org/abs/2210.15303 I offer some thoughts, focusing on a case study of LM processing of recursively nested grammatical structures. Thread: 1/11
Center-embedded (recursively-nested) syntactic structures are tough, and central to classical theories of human syntactic processing and long-standing debates about neural networks & innateness. A recent work argues that LMs cannot handle these structures as well as humans. 2/11
Image: an example center-embedded sentence, "The actors that the mother near the stude…"
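As a rough illustration of how such comparisons are typically run, here is a sketch that scores a grammatical center-embedded sentence against an ungrammatical variant with a small language model; the example sentences and the choice of gpt2 are assumptions, not the paper's materials or models.

```python
# Minimal sketch (assumed sentences and model, not the paper's setup): compare
# an LM's total log-probability for a grammatical center-embedded sentence vs
# a matched ungrammatical variant with a doubled verb.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

grammatical = "The actors that the director praised left the stage."
ungrammatical = "The actors that the director praised left left the stage."  # doubled verb

def sentence_logprob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean NLL per predicted token; convert to a total log-probability
    return -out.loss.item() * (ids.shape[1] - 1)

print("grammatical:  ", sentence_logprob(grammatical))
print("ungrammatical:", sentence_logprob(ungrammatical))
```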
Sep 29, 2022 9 tweets 4 min read
I'm not a scaling maximalist, but it's surprising to me how many people are 1) interested in differences between human and artificial intelligence and 2) think scaling to improve performance means deep learning is doing something fundamentally wrong. 1/n Why? Because evolution made some pretty hefty tradeoffs in energy cost, soft/fractured skulls at birth, etc. in order to scale the human brain. As a consequence, we have perhaps ~1 quadrillion synapses (frontiersin.org/articles/10.33…). 2/
Jun 2, 2022 13 tweets 4 min read
It's often claimed that learning language alone can't lead to understanding because understanding requires relating language to external meaning. I'm all for grounding and social learning of language, but I think that argument is wrong, or at least uninteresting. 1/n
Consider a robot (or brain) learning through interacting in the world. An analogous argument would suggest that the robot actually cannot understand anything outside of the electrical signals it processes. This is tautologically true in some sense, because I could overwrite 2/n
Jan 25, 2022 5 tweets 2 min read
I enjoyed this paper, and I think it provides nice context for our recent work on explanations, relations, and causality in RL. Short thread:
Explanations inherently highlight generalizable causal structure (see e.g.), and so they are explicitly forward-looking. Correspondingly, we show that explanations can allow agents to generalize out-of-distribution from ambiguous, causally confounded experiences.
Dec 8, 2021 10 tweets 4 min read
Explanations play a critical role in human learning, particularly in challenging areas—abstractions, relations and causality. We show they can also help RL agents in "Tell me why!—Explanations support learning of relational and causal structure" (arxiv.org/abs/2112.03753). Thread: Human explanations directly highlight the abstract, causal structure of the world, and how it relates to a particular situation. Thus, explanations can help us to learn efficiently and generalize appropriately from limited experience. Could explanations also support RL? 2/9
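One way to picture how explanations can enter an agent's objective is as an auxiliary prediction target alongside the usual policy loss, so the explanation shapes the learned representation. The sketch below is a schematic illustration with dummy data and an invented architecture, not the paper's agent or training setup.

```python
# Schematic sketch (invented illustration, not the paper's agent): an auxiliary
# head predicts an explanation of the current situation, and its loss is added
# to a stand-in policy loss.
import torch
import torch.nn as nn

class ExplainedAgent(nn.Module):
    def __init__(self, obs_dim=16, n_actions=4, expl_vocab=10, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, dim), nn.ReLU())
        self.policy_head = nn.Linear(dim, n_actions)
        self.explanation_head = nn.Linear(dim, expl_vocab)  # predicts explanation tokens

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.explanation_head(h)

agent = ExplainedAgent()
opt = torch.optim.Adam(agent.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# One hypothetical training step with dummy data.
obs = torch.randn(8, 16)                        # batch of observations
chosen_action = torch.randint(0, 4, (8,))       # actions taken (stand-in for an RL target)
explanation_token = torch.randint(0, 10, (8,))  # e.g. "because it was the odd one out"

policy_logits, expl_logits = agent(obs)
policy_loss = ce(policy_logits, chosen_action)          # stand-in for the RL loss
explanation_loss = ce(expl_logits, explanation_token)   # auxiliary explanation loss
loss = policy_loss + 0.5 * explanation_loss
loss.backward()
opt.step()
```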
Sep 13, 2021 4 tweets 2 min read
I'm pleased to share that we've open-sourced two environments and the hierarchical attention mechanism for our "Towards mental time travel: A hierarchical memory for RL" paper: github.com/deepmind/deepm…

Paper summary thread/links: The repository linked above contains 1) a JAX/Haiku implementation of the hierarchical attention module, and 2) an implementation of the Ballet environment, which requires recalling spatio-temporal events, and is surprisingly challenging.
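For intuition, here is a heavily simplified sketch of the hierarchical attention idea: coarse attention over chunk summaries to select relevant stretches of the past, then fine-grained attention only within those chunks. The real module is the JAX/Haiku implementation in the repo above; this torch version with mean-pooled summaries and made-up shapes is only an illustration.

```python
# Simplified sketch of the hierarchical attention idea (the actual JAX/Haiku
# module lives in the repo above; this is a rough illustration, not it):
# attend over chunk summaries to pick the most relevant chunks of the past,
# then attend in detail only within those chunks.
import torch
import torch.nn.functional as F

def hierarchical_read(query, memory, chunk_size=8, top_k=2):
    """query: (d,); memory: (T, d) of past step embeddings."""
    T, d = memory.shape
    chunks = memory[: T - T % chunk_size].reshape(-1, chunk_size, d)

    # Coarse level: one summary per chunk (a mean here; learned in general).
    summaries = chunks.mean(dim=1)                   # (n_chunks, d)
    chunk_scores = summaries @ query / d ** 0.5      # relevance of each chunk
    top_chunks = chunk_scores.topk(min(top_k, len(chunks))).indices

    # Fine level: detailed attention only within the selected chunks.
    selected = chunks[top_chunks].reshape(-1, d)     # (top_k * chunk_size, d)
    weights = F.softmax(selected @ query / d ** 0.5, dim=0)
    return weights @ selected                        # retrieved memory vector

memory = torch.randn(64, 32)   # 64 past timesteps, 32-dim embeddings
query = torch.randn(32)
print(hierarchical_read(query, memory).shape)  # torch.Size([32])
```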
Jun 1, 2021 10 tweets 3 min read
How can RL agents recall the past in detail, in order to behave appropriately in the present? In our new preprint "Towards mental time travel: A hierarchical memory for RL agents" (arxiv.org/abs/2105.14039) we propose a memory architecture that steps in this direction. We draw inspiration from the idea that human memory is like "mental time travel"—we can recall a specific event in the past, and relive it in some sequential detail, with relatively little interference from other events. This ability is key to our goal-directed use of memory.
Feb 9, 2021 7 tweets 2 min read
What are symbols? Where do symbols come from? What behaviors demonstrate the ability to engage with symbols? How do the answers to these questions impact AI research? We argue for a new perspective on these issues in our preprint: arxiv.org/abs/2102.03406 Summary in thread: 1/6
We interpret symbols as entities whose meaning is established by convention. We therefore argue that an entity is a symbol *only* to a system that demonstrates active participation in a system of meaning by convention; that is, a system that exhibits symbolic behavior. 2/6
May 12, 2020 9 tweets 2 min read
How can deep learning models flexibly reuse their knowledge? How can they adapt to new tasks zero-shot, as humans can? In our new preprint (arxiv.org/pdf/2005.04318), we propose a new approach based on learning to transform task representations: meta-mapping. Preview in thread:
Our approach can make drastic adaptations zero-shot, like switching from winning at (simplified) poker to trying to lose. It can allow a visual classification system to recognize new concepts, and can adapt a model-free reinforcement learning agent to new tasks, without data from them.
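As a cartoon of the meta-mapping idea: tasks are represented as embeddings, and a separate network learns to transform one task embedding into another (e.g. "win at X" into "lose at X"), which can then be applied to a new task zero-shot. The dimensions, losses, and training pairs below are invented for illustration; this is not the paper's architecture.

```python
# Rough illustration of the meta-mapping idea (invented shapes and training,
# not the paper's architecture): a network that transforms task embeddings,
# applied zero-shot to a new task's embedding.
import torch
import torch.nn as nn

dim = 32

task_policy = nn.Sequential(   # maps (state, task embedding) -> action logits
    nn.Linear(8 + dim, 64), nn.ReLU(), nn.Linear(64, 4)
)
meta_mapping = nn.Sequential(  # maps a task embedding -> a transformed task embedding
    nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim)
)

# Suppose we have learned embeddings for "win at poker" and "lose at poker";
# pairs like this (across many games) would train the meta-mapping.
win_poker = torch.randn(dim)
lose_poker = torch.randn(dim)
opt = torch.optim.Adam(meta_mapping.parameters(), lr=1e-3)
for _ in range(100):  # train the mapping on known (win, lose) pairs
    loss = ((meta_mapping(win_poker) - lose_poker) ** 2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()

# Zero-shot transfer: apply the learned mapping to a *new* game's "win" embedding
# to get a "lose" embedding without ever training on losing at that game.
win_new_game = torch.randn(dim)
lose_new_game = meta_mapping(win_new_game)
state = torch.randn(8)
action_logits = task_policy(torch.cat([state, lose_new_game]))
print(action_logits.shape)  # torch.Size([4])
```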