Postdoc at CBS-NTT Program on Physics of Intelligence, Harvard University.
Jun 28 • 16 tweets • 5 min read
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorizing to lower loss? And why is generalization transient?
Our work explains this & *predicts a Transformer's behavior throughout training* without access to its weights! 🧵
1/
We first define Bayesian predictors for ICL settings that involve learning a finite mixture of tasks:
🔴 Memorizing (M): discrete prior on seen tasks
🔵 Generalizing (G): continuous prior matching the true task distribution
These match known strategies from prior work! (Both predictors are sketched right below.)
2/
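Roughly, for a context 𝒟 and query x, the two predictors differ only in their prior over tasks τ. The notation below is schematic (ours, not necessarily the paper's):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Memorizing predictor (M): discrete prior over the K tasks seen during pretraining.
\[
  p_{\mathrm{M}}(y \mid x, \mathcal{D}) \;\propto\;
  \sum_{k=1}^{K} p(y \mid x, \tau_k)\, p(\mathcal{D} \mid \tau_k)
\]
% Generalizing predictor (G): continuous prior matching the true task distribution p(tau).
\[
  p_{\mathrm{G}}(y \mid x, \mathcal{D}) \;\propto\;
  \int p(y \mid x, \tau)\, p(\mathcal{D} \mid \tau)\, p(\tau)\, d\tau
\]
\end{document}
```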
Jun 6 • 11 tweets • 4 min read
🚨 New paper alert!
The linear representation hypothesis (LRH) posits that concepts are encoded as a **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs still capture them? 🤔
1/11
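For context, a vanilla ReLU + L1 sparse autoencoder (the standard recipe, not necessarily the exact variant studied here) bakes in exactly this assumption: decoder columns play the role of concept directions, and each activation is reconstructed as a sparse sum of them.

```python
# Minimal ReLU + L1 SAE sketch (standard recipe; illustrative, not the paper's code).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)   # maps activations to feature coefficients
        self.dec = nn.Linear(d_dict, d_model)   # columns act as candidate concept directions

    def forward(self, h: torch.Tensor):
        z = torch.relu(self.enc(h))             # sparse, non-negative feature activations
        h_hat = self.dec(z)                     # reconstruction as a sum of directions
        return h_hat, z

sae = SparseAutoencoder(d_model=512, d_dict=4096)
h = torch.randn(8, 512)                         # a batch of model activations (dummy data)
h_hat, z = sae(h)
loss = ((h - h_hat) ** 2).mean() + 1e-3 * z.abs().mean()  # reconstruction + L1 sparsity
```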
We propose to stress-test SAEs by formalizing the LRH and a specific concept structure that lies outside this interpretation: hierarchical concepts that are not linearly accessible!
2/11
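A generic toy illustration of what "not linearly accessible" means (just an illustration, not the paper's hierarchical construction): a concept that is the XOR of two represented bits defeats any linear probe, while a small nonlinear probe reads it off easily.

```python
# Toy example: a concept with no linear read-out direction (XOR of two latent bits).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(2000, 2))         # two binary "parent" concepts
reps = bits + 0.1 * rng.normal(size=bits.shape)   # noisy 2-D representations
label = bits[:, 0] ^ bits[:, 1]                   # XOR concept: nonlinear in the representation

linear_probe = LogisticRegression().fit(reps, label)
mlp_probe = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(reps, label)
print("linear probe acc:", linear_probe.score(reps, label))    # ~0.5 (chance)
print("nonlinear probe acc:", mlp_probe.score(reps, label))    # ~1.0
```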
Nov 10, 2024 • 11 tweets • 5 min read
Paper alert—accepted as a NeurIPS *Spotlight*!🧵👇
We build on our past work relating emergence to task compositionality and analyze the *learning dynamics* of such tasks: we find there exist latent interventions that can elicit these capabilities long before input prompting works! 🤯
We first define “concept space”, a coordinate space whose axes denote specific concepts (e.g., color, size). We train diffusion models on a set of concept combinations & map generations of seen / unseen combinations back to the concept space, yielding *concept learning dynamics*!
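A hypothetical sketch of that bookkeeping (names and shapes are illustrative, not the paper's code): each concept axis gets a probe, generations from a checkpoint are mapped to coordinates, and repeating this across checkpoints traces out the concept learning dynamics.

```python
# Illustrative "concept space" mapping: probes read each concept off generated samples.
import numpy as np

CONCEPT_AXES = ["color", "size"]
seen_combos = [(0, 0), (0, 1), (1, 0)]    # concept combinations present in training data
unseen_combos = [(1, 1)]                  # held-out combination probing compositional generalization

def to_concept_space(generations, probes):
    """Map a batch of generations to concept-space coordinates (one probe per axis)."""
    return np.stack([probes[axis](generations) for axis in CONCEPT_AXES], axis=-1)

# Dummy stand-ins: in practice these would be classifiers trained to read each
# concept off images sampled from the diffusion model at a given checkpoint.
probes = {
    "color": lambda gens: gens.mean(axis=(1, 2, 3)),          # placeholder "color" score
    "size":  lambda gens: (gens > 0.5).mean(axis=(1, 2, 3)),  # placeholder "size" score
}
generations = np.random.rand(4, 32, 32, 3)        # pretend samples from one checkpoint
coords = to_concept_space(generations, probes)    # shape (4, 2): points in concept space
print(coords.shape)
```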