Goodfire Profile picture
Using interpretability to understand, learn from, and design AI.

May 7, 8 tweets

Neural networks might speak English, but they think in shapes.

Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision.

Starting today, we’re releasing a series of posts on this research agenda. 🧵

Just as the real world is highly structured, neural networks are full of rich geometric structure: time, space, numbers, color, the tree of life, new biomarkers, and more are represented along curved paths and surfaces.

This is true across models, modalities, and domains! (2/8)

New methods to understand this “neural geometry” are a crucial frontier in understanding, improving, and controlling models. (3/8)

Why? Just as you couldn’t understand a computer without understanding its data structures, you can't understand a neural network without knowing how its representations are shaped.

Representations underlie internal algorithms and model behavior! (4/8)

A simple example: days of the week, which lie on a circular path in models’ activations.

Steering linearly from Monday to Friday gets you incoherent outputs in between. Steering along the circular manifold means you cleanly shift from Mon → Tues → Wed → Thurs → Fri. (5/8)

Another example: an image-action world model of the “mountain car”.

Position turns out to be represented by a spaghetti-like path in activations. While steering along the manifold moves the car neatly (left), linear steering smears and teleports it incoherently (middle). (6/8)

In contrast to this view, popular interpretability methods like SAEs tend to “shatter” concept manifolds into many small and apparently-unrelated pieces, obscuring the overarching semantic structure that becomes clear when a manifold is viewed as a whole. (7/8)

Read the first 2 posts in the series:

Forthcoming posts will go into more detail on:
- an example mechanism that operates on manifolds
- unsupervised discovery of manifolds + the connection to SAE features
- in-context geometrygoodfire.ai/research/the-w…

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Keep scrolling