Ekdeep Singh
Postdoc at CBS-NTT Program on Physics of Intelligence, Harvard University.
Jun 28
🚨New paper! We know models learn distinct in-context learning (ICL) strategies, but *why*? Why generalize instead of memorizing to lower loss? And why is generalization transient?

Our work explains this & *predicts Transformer behavior throughout training* without access to its weights! 🧵

1/ We first define Bayesian predictors for ICL settings that involve learning a finite mixture of tasks:

🔴 Memorizing (M): discrete prior on seen tasks
🔵 Generalizing (G): continuous prior matching the true task distribution

These match known strategies from prior work!
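A minimal sketch of the two predictors in a toy linear-regression ICL setting (the task values, noise level, and prior width below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1                            # observation noise (assumed)
seen_w = np.array([-1.0, 0.5, 2.0])    # hypothetical finite set of seen tasks

# In-context examples drawn from one of the seen tasks, w = 0.5
x = rng.normal(size=8)
y = 0.5 * x + sigma * rng.normal(size=8)

# 🔴 Memorizing (M): discrete prior over seen tasks -> posterior-weighted prediction
log_lik = np.array([-0.5 * np.sum((y - w * x) ** 2) / sigma**2 for w in seen_w])
post = np.exp(log_lik - log_lik.max())
post /= post.sum()
w_M = post @ seen_w

# 🔵 Generalizing (G): continuous Gaussian prior w ~ N(0, tau^2)
# -> closed-form (ridge-style) posterior mean over all possible tasks
tau = 1.0
w_G = (x @ y) / (x @ x + sigma**2 / tau**2)

x_query = 1.0
print(w_M * x_query, w_G * x_query)    # both land near the true task w = 0.5
```

On a seen task both predictors agree; they come apart on tasks outside the discrete support, which is what makes the two strategies distinguishable in the first place.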

Jun 6
🚨 New paper alert!

The linear representation hypothesis (LRH) argues that concepts are encoded as a **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔

1/11 We propose to stress-test SAEs by formalizing the LRH and a specific concept structure that lies outside this interpretation: hierarchical concepts that are not linearly accessible!
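For intuition on the LRH picture being stress-tested, here is a toy sketch (the dictionary size and the choice of active concepts are invented for illustration): when concepts really are a sparse sum of orthogonal directions, a simple projection recovers each coefficient, which is exactly the structure SAEs are built to exploit.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # activation dimension (assumed)

# Orthonormal concept directions (assumed dictionary), via QR decomposition
D, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Under the LRH, an activation is a sparse sum of active concept directions
active = [2, 7]                   # hypothetical active concepts
coeffs = np.array([1.5, -0.8])
h = D[:, active] @ coeffs

# With orthogonal directions, projecting onto the dictionary recovers
# the sparse coefficients exactly; all other entries are ~0
recovered = D.T @ h
print(np.round(recovered[active], 3))
```

Hierarchical concepts that are *not* linearly accessible break this recovery-by-projection story, which is what makes them a useful stress test.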

2/11
Nov 10, 2024
Paper alert—accepted as a NeurIPS *Spotlight*!🧵👇

We build on our past work relating emergence to task compositionality and analyze the *learning dynamics* of such tasks: we find there exist latent interventions that can elicit them much before input prompting works! 🤯 We first define “concept space”, a coordinate space whose axes denote specific concepts (e.g., color, size). We train diffusion models on a set of concept combinations & map generations of seen / unseen combinations back to the concept space, yielding *concept learning dynamics*! Image