Goodfire
May 7 · 8 tweets · 3 min read
Neural networks might speak English, but they think in shapes.

Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision.

Starting today, we’re releasing a series of posts on this research agenda. 🧵
Just as the real world is highly structured, neural networks are full of rich geometric structure: time, space, numbers, color, the tree of life, new biomarkers, and more are represented along curved paths and surfaces.

This is true across models, modalities, and domains! (2/8)
New methods to understand this “neural geometry” are a crucial frontier in understanding, improving, and controlling models. (3/8)
Why? Just as you can't understand a computer without understanding its data structures, you can't understand a neural network without knowing how its representations are shaped.

Representations underlie internal algorithms and model behavior! (4/8)
A simple example: days of the week, which lie on a circular path in models’ activations.

Steering linearly from Monday to Friday gets you incoherent outputs in between. Steering along the circular manifold means you cleanly shift from Mon → Tues → Wed → Thurs → Fri. (5/8)
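The chord-vs-arc difference is easy to see in a toy model. Here is a minimal numpy sketch (purely illustrative — real day-of-week representations live in a higher-dimensional activation space): place the 7 days evenly on a unit circle, then compare the linear midpoint of Mon and Fri against the angular (on-manifold) midpoint.

```python
import numpy as np

# Hypothetical toy setup: 7 days embedded evenly on a unit circle in a
# 2-D activation subspace (real models use many more dimensions).
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7
embed = np.stack([np.cos(angles), np.sin(angles)], axis=1)

mon, fri = embed[0], embed[4]

# Linear steering: interpolate straight through the circle's interior.
midpoint_linear = (mon + fri) / 2
print(np.linalg.norm(midpoint_linear))    # ≈ 0.22: well inside the circle, off-manifold

# Manifold steering: interpolate the angle, staying on the circle.
theta = (angles[0] + angles[4]) / 2
midpoint_circular = np.array([np.cos(theta), np.sin(theta)])
print(np.linalg.norm(midpoint_circular))  # 1.0: still on the manifold
```

With this spacing the angular midpoint between Mon and Fri lands exactly on Wed (`embed[2]`), matching the clean Mon → Tue → Wed → Thu → Fri traversal described above, while the linear midpoint is an off-manifold point no day corresponds to.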
Another example: an image-action world model of the “mountain car”.

Position turns out to be represented by a spaghetti-like path in activations. While steering along the manifold moves the car neatly (left), linear steering smears and teleports it incoherently (middle). (6/8)
In contrast to this view, popular interpretability methods like SAEs tend to “shatter” concept manifolds into many small and apparently unrelated pieces, obscuring the overarching semantic structure that becomes clear when a manifold is viewed as a whole. (7/8)
Read the first 2 posts in the series: goodfire.ai/research/the-w…

Forthcoming posts will go into more detail on:
- an example mechanism that operates on manifolds
- unsupervised discovery of manifolds + the connection to SAE features
- in-context geometry


More from @GoodfireAI

Apr 30
Introducing Silico: the platform for building AI models with the precision of written software.

Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up.

Early access is open now. 🧵(1/10)
We’ve used interpretability to discover a novel class of Alzheimer’s biomarkers, teach a language model to correct its own hallucinations, and diagnose performance bottlenecks in a robotics model.

Silico brings those frontier techniques to everyone. (2/10)
Silico introduces our model neuroscientist: an autonomous agent that plans and runs concurrent experiments on your model.

It works with your team in our model design environment, where you can organize research threads, replicate and extend papers, and collaborate on findings.

Here are 5 things you can do with Silico:
(3/10)
Apr 14
We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic.

We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵(1/8)
The core challenge of genomic medicine is figuring out what genetic variants actually do - so we can diagnose and treat resulting diseases. But many of the millions of variants in clinical databases are still “variants of uncertain significance” (VUS). (2/8)
By training our new covariance probes on @arcinstitute’s Evo 2 - a genomic foundation model trained on massive DNA data - we achieve state-of-the-art prediction of whether variants cause disease, with strong generalization across variant types. (3/8)
Jan 28
We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente.

How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)
Bio foundation models (e.g. AlphaFold) can achieve superhuman performance, so they must contain novel scientific knowledge. @PrimaMente's Pleiades epigenetics model is one such case - it's SOTA on early Alzheimer's detection.

But that knowledge is locked inside a black box. (2/6)
Interpretability is the key to unlocking that knowledge, extracting what the Pleiades model knows about epigenetics and Alzheimer's (or anything else!)

It's the missing step between black-box predictive power and true scientific understanding. (3/6)
Nov 11, 2025
New research: are prompting and activation steering just two sides of the same coin?

@EricBigelow @danielwurgaft @EkdeepL and coauthors argue they are: in-context learning (ICL) and steering have formally equivalent effects. (1/4)
The paper formalizes a Bayesian framework for model control: altering a model's "beliefs" over which persona or data source it's emulating.

Context (prompting) and internal representations (steering) offer dual mechanisms to alter those "beliefs". (2/4)
This explains many-shot jailbreaking - sufficient context will overcome even a strong prior, and we can predict exactly when that will happen!

It also lets us:
- understand the additive effect of ICL & steering
- estimate a model's prior for any given persona (3/4)
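The many-shot prediction follows directly from the Bayesian picture: each in-context example contributes a likelihood term, so log-posterior odds grow linearly with the number of shots and eventually swamp any fixed prior. A toy numpy sketch (illustrative only — the numbers and the additive-steering model are assumptions, not the paper's exact formalism):

```python
import numpy as np

# Two candidate personas the model might be emulating: "safe" and
# "jailbroken". The model starts with a strong prior on "safe".
log_prior = np.log(np.array([0.999, 0.001]))

# Each in-context example is assumed 4x more likely under "jailbroken".
log_lik = np.log(np.array([0.2, 0.8]))

def posterior(n_shots, steer=0.0):
    # Prompting = accumulating n_shots likelihood terms; steering = an
    # additive shift to the "jailbroken" log-score. Both act on the
    # same log-posterior, which is the claimed duality.
    logits = log_prior + n_shots * log_lik + np.array([0.0, steer])
    p = np.exp(logits - logits.max())
    return p / p.sum()

for n in [0, 2, 5, 10]:
    print(n, posterior(n)[1])   # P(jailbroken) rises from ~0.001 toward 1
```

In this toy version the posterior flips around 5 shots (log-prior odds of about -6.9 against per-shot evidence of log 4 ≈ 1.39 each), and a steering shift has the same effect as the equivalent number of in-context examples — which is how one could, in principle, read a persona's prior off the steering strength needed to flip it.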
Nov 6, 2025
LLMs memorize a lot of training data, but memorization is poorly understood.

Where does it live inside models? How is it stored? How much is it involved in different tasks?

@jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
The method is like PCA, but for loss curvature instead of variance: it decomposes weight matrices into components ordered by curvature, and removes the long tail of low-curvature ones.

What's left are the weights that most affect loss across the training set. (2/7)
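The "PCA, but for curvature" idea can be sketched in a few lines of numpy. Everything below is a hypothetical stand-in for the paper's actual method: a random weight matrix, and an empirical-Fisher-style curvature proxy built from fake per-example gradients. The PCA-like step is eigendecomposing the curvature matrix instead of a covariance matrix, then projecting the weights onto the high-curvature components and dropping the long low-curvature tail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a weight matrix W, and a curvature proxy H
# over its input space, e.g. an empirical Fisher H ≈ E[g gᵀ] built
# from per-example gradients g. Both are synthetic here.
d = 16
W = rng.normal(size=(d, d))
G = rng.normal(size=(100, d))        # fake per-example gradients
H = G.T @ G / len(G)                 # empirical-Fisher-style curvature

# PCA-like step: eigendecompose curvature (not covariance), sort the
# components by curvature, and keep only the top-k of them.
eigvals, eigvecs = np.linalg.eigh(H)
order = np.argsort(eigvals)[::-1]    # descending curvature
k = 4
top = eigvecs[:, order[:k]]

# Project W onto the high-curvature subspace; the low-curvature tail
# (the other d - k components) is removed.
W_pruned = W @ top @ top.T
print(np.linalg.matrix_rank(W_pruned))   # k = 4
```

The pruned matrix keeps only the directions along which the loss is most sensitive — in the paper's framing, the weights that most affect loss across the training set survive, and the low-curvature tail (where the verbatim memorization reportedly lives) is what gets cut.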
Applying the method significantly reduces verbatim recitation while keeping outputs coherent, without needing a targeted "forget set".

The result holds across architectures & modalities (LLM & ViT)! (3/7)
Oct 29, 2025
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper?

Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost

(1/6)
PII detection in production AI systems requires methods which are very lightweight, have high recall, and perform well after training on only synthetic data (can't train on customer PII!)

These constraints mean many approaches don't work well. (2/6)
But probes on a small sidecar model are a perfect candidate, and we tested several probe variants trained on synthetic data.

Surprisingly, a random forest SAE probe performs the best by far on prod test data - leading Rakuten to deploy it to their agent platform. (3/6)
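The shape of an SAE-probe pipeline is simple to sketch. Every component below is a made-up stand-in: a random "SAE encoder" instead of a pretrained one, synthetic activations instead of real PII data, and a plain logistic probe where the Rakuten work used a random forest (swapped in here only to keep the sketch dependency-free).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SAE encoder: sparse feature codes via ReLU(W x).
d_model, d_sae, n = 32, 128, 400
W_enc = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_model)

def sae_features(acts):
    return np.maximum(acts @ W_enc.T, 0.0)   # non-negative, sparse-ish

# Synthetic "PII" vs "non-PII" activations (stand-ins for real data):
# the label is a linear function of the raw activations.
direction = rng.normal(size=d_model)
X = rng.normal(size=(n, d_model))
y = (X @ direction > 0).astype(float)
F = sae_features(X)                          # probe inputs: SAE features

# Lightweight probe on SAE features, trained by plain gradient descent.
w = np.zeros(d_sae)
for _ in range(500):
    p = 1 / (1 + np.exp(-F @ w))             # sigmoid predictions
    w -= 0.1 * F.T @ (p - y) / n             # logistic-loss gradient step

acc = ((1 / (1 + np.exp(-F @ w)) > 0.5) == y).mean()
print(acc)
```

The design point this illustrates: the expensive model runs once to produce activations, the SAE encode is a single matrix multiply, and the probe itself is tiny — which is where the claimed 15–500x cost advantage over LLM-as-a-judge comes from.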