Goodfire
May 7 · 8 tweets · 3 min read
Neural networks might speak English, but they think in shapes.

Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision.

Starting today, we’re releasing a series of posts on this research agenda. 🧵
Just as the real world is highly structured, neural networks are full of rich geometric structure: time, space, numbers, color, the tree of life, new biomarkers, and more are represented along curved paths and surfaces.

This is true across models, modalities, and domains! (2/8)
New methods to understand this “neural geometry” are a crucial frontier in understanding, improving, and controlling models. (3/8)
Why? Just as you can't understand a computer without understanding its data structures, you can't understand a neural network without knowing how its representations are shaped.

Representations underlie internal algorithms and model behavior! (4/8)
A simple example: days of the week, which lie on a circular path in models’ activations.

Steering linearly from Monday to Friday gets you incoherent outputs in between. Steering along the circular manifold means you cleanly shift from Mon → Tues → Wed → Thurs → Fri. (5/8)
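The chord-vs-arc difference is easy to see in a toy model. Here is a minimal numpy sketch (purely illustrative — real day-of-week representations live in a higher-dimensional activation space): place the 7 days evenly on a unit circle, then compare the linear midpoint of Mon and Fri against the angular (on-manifold) midpoint.

```python
import numpy as np

# Hypothetical toy setup: 7 days embedded evenly on a unit circle in a
# 2-D activation subspace (real models use many more dimensions).
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
angles = 2 * np.pi * np.arange(7) / 7
embed = np.stack([np.cos(angles), np.sin(angles)], axis=1)

mon, fri = embed[0], embed[4]

# Linear steering: interpolate straight through the circle's interior.
midpoint_linear = (mon + fri) / 2
print(np.linalg.norm(midpoint_linear))    # ≈ 0.22: well inside the circle, off-manifold

# Manifold steering: interpolate the angle, staying on the circle.
theta = (angles[0] + angles[4]) / 2
midpoint_circular = np.array([np.cos(theta), np.sin(theta)])
print(np.linalg.norm(midpoint_circular))  # 1.0: still on the manifold
```

With this spacing the angular midpoint between Mon and Fri lands exactly on Wed (`embed[2]`), matching the clean Mon → Tue → Wed → Thu → Fri traversal described above, while the linear midpoint is an off-manifold point no day corresponds to.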
Another example: an image-action world model of the “mountain car”.

Position turns out to be represented by a spaghetti-like path in activations. While steering along the manifold moves the car neatly (left), linear steering smears and teleports it incoherently (middle). (6/8)
In contrast to this view, popular interpretability methods like SAEs tend to “shatter” concept manifolds into many small and apparently unrelated pieces, obscuring the overarching semantic structure that becomes clear when a manifold is viewed as a whole. (7/8)
Read the first 2 posts in the series: goodfire.ai/research/the-w…

Forthcoming posts will go into more detail on:
- an example mechanism that operates on manifolds
- unsupervised discovery of manifolds + the connection to SAE features
- in-context geometry


More from @GoodfireAI

Apr 30
Introducing Silico: the platform for building AI models with the precision of written software.

Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up.

Early access is open now. 🧵(1/10)
We’ve used interpretability to discover a novel class of Alzheimer’s biomarkers, teach a language model to correct its own hallucinations, and diagnose performance bottlenecks in a robotics model.

Silico brings those frontier techniques to everyone. (2/10)
Silico introduces our model neuroscientist: an autonomous agent that plans and runs concurrent experiments on your model.

It works with your team in our model design environment, where you can organize research threads, replicate and extend papers, and collaborate on findings.

Here are 5 things you can do with Silico:
(3/10)
Apr 14
We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic.

We're now releasing an open-source database for all variants in the NIH's ClinVar database. 🧵(1/8)
The core challenge of genomic medicine is figuring out what genetic variants actually do - so we can diagnose and treat resulting diseases. But many of the millions of variants in clinical databases are still “variants of uncertain significance” (VUS). (2/8)
By training our new covariance probes on @arcinstitute’s Evo 2 - a genomic foundation model trained on massive DNA data - we achieve state-of-the-art prediction of whether variants cause disease, with strong generalization across variant types. (3/8)
Jan 28
We've identified a novel class of biomarkers for Alzheimer's detection - using interpretability - with @PrimaMente.

How we did it, and how interpretability can power scientific discovery in the age of digital biology: (1/6)
Bio foundation models (e.g. AlphaFold) can achieve superhuman performance, so they must contain novel scientific knowledge. @PrimaMente's Pleiades epigenetics model is one such case - it's SOTA on early Alzheimer's detection.

But that knowledge is locked inside a black box. (2/6)
Interpretability is the key to unlocking that knowledge, extracting what the Pleiades model knows about epigenetics and Alzheimer's (or anything else!)

It's the missing step between black-box predictive power and true scientific understanding. (3/6)
Nov 11, 2025
New research: are prompting and activation steering just two sides of the same coin?

@EricBigelow @danielwurgaft @EkdeepL and coauthors argue they are: in-context learning (ICL) and steering have formally equivalent effects. (1/4)
The paper formalizes a Bayesian framework for model control: altering a model's "beliefs" over which persona or data source it's emulating.

Context (prompting) and internal representations (steering) offer dual mechanisms to alter those "beliefs". (2/4)
This explains many-shot jailbreaking - sufficient context will overcome even a strong prior, and we can predict exactly when that will happen!

It also lets us:
- understand the additive effect of ICL & steering
- estimate a model's prior for any given persona (3/4)
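The many-shot prediction follows directly from the Bayesian picture: each in-context example contributes a likelihood term, so log-posterior odds grow linearly with the number of shots and eventually swamp any fixed prior. A toy numpy sketch (illustrative only — the numbers and the additive-steering model are assumptions, not the paper's exact formalism):

```python
import numpy as np

# Two candidate personas the model might be emulating: "safe" and
# "jailbroken". The model starts with a strong prior on "safe".
log_prior = np.log(np.array([0.999, 0.001]))

# Each in-context example is assumed 4x more likely under "jailbroken".
log_lik = np.log(np.array([0.2, 0.8]))

def posterior(n_shots, steer=0.0):
    # Prompting = accumulating n_shots likelihood terms; steering = an
    # additive shift to the "jailbroken" log-score. Both act on the
    # same log-posterior, which is the claimed duality.
    logits = log_prior + n_shots * log_lik + np.array([0.0, steer])
    p = np.exp(logits - logits.max())
    return p / p.sum()

for n in [0, 2, 5, 10]:
    print(n, posterior(n)[1])   # P(jailbroken) rises from ~0.001 toward 1
```

In this toy version the posterior flips around 5 shots (log-prior odds of about -6.9 against per-shot evidence of log 4 ≈ 1.39 each), and a steering shift has the same effect as the equivalent number of in-context examples — which is how one could, in principle, read a persona's prior off the steering strength needed to flip it.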
Nov 6, 2025
LLMs memorize a lot of training data, but memorization is poorly understood.

Where does it live inside models? How is it stored? How much is it involved in different tasks?

@jack_merullo_ & @srihita_raju's new paper examines all of these questions using loss curvature! (1/7)
The method is like PCA, but for loss curvature instead of variance: it decomposes weight matrices into components ordered by curvature, and removes the long tail of low-curvature ones.

What's left are the weights that most affect loss across the training set. (2/7)
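The "PCA, but for curvature" idea can be sketched in a few lines of numpy. Everything below is a hypothetical stand-in for the paper's actual method: a random weight matrix, and an empirical-Fisher-style curvature proxy built from fake per-example gradients. The PCA-like step is eigendecomposing the curvature matrix instead of a covariance matrix, then projecting the weights onto the high-curvature components and dropping the long low-curvature tail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a weight matrix W, and a curvature proxy H
# over its input space, e.g. an empirical Fisher H ≈ E[g gᵀ] built
# from per-example gradients g. Both are synthetic here.
d = 16
W = rng.normal(size=(d, d))
G = rng.normal(size=(100, d))        # fake per-example gradients
H = G.T @ G / len(G)                 # empirical-Fisher-style curvature

# PCA-like step: eigendecompose curvature (not covariance), sort the
# components by curvature, and keep only the top-k of them.
eigvals, eigvecs = np.linalg.eigh(H)
order = np.argsort(eigvals)[::-1]    # descending curvature
k = 4
top = eigvecs[:, order[:k]]

# Project W onto the high-curvature subspace; the low-curvature tail
# (the other d - k components) is removed.
W_pruned = W @ top @ top.T
print(np.linalg.matrix_rank(W_pruned))   # k = 4
```

The pruned matrix keeps only the directions along which the loss is most sensitive — in the paper's framing, the weights that most affect loss across the training set survive, and the low-curvature tail (where the verbatim memorization reportedly lives) is what gets cut.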
Applying the method significantly reduces verbatim recitation while keeping outputs coherent, without needing a targeted "forget set".

The result holds across architectures & modalities (LLM & ViT)! (3/7)
Oct 29, 2025
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper?

Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost

(1/6)
PII detection in production AI systems requires methods which are very lightweight, have high recall, and perform well after training on only synthetic data (can't train on customer PII!)

These constraints mean many approaches don't work well. (2/6)
But probes on a small sidecar model are a perfect candidate, and we tested several probe variants trained on synthetic data.

Surprisingly, a random forest SAE probe performs the best by far on prod test data - leading Rakuten to deploy it to their agent platform. (3/6)
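The shape of an SAE-probe pipeline is simple to sketch. Every component below is a made-up stand-in: a random "SAE encoder" instead of a pretrained one, synthetic activations instead of real PII data, and a plain logistic probe where the Rakuten work used a random forest (swapped in here only to keep the sketch dependency-free).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SAE encoder: sparse feature codes via ReLU(W x).
d_model, d_sae, n = 32, 128, 400
W_enc = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_model)

def sae_features(acts):
    return np.maximum(acts @ W_enc.T, 0.0)   # non-negative, sparse-ish

# Synthetic "PII" vs "non-PII" activations (stand-ins for real data):
# the label is a linear function of the raw activations.
direction = rng.normal(size=d_model)
X = rng.normal(size=(n, d_model))
y = (X @ direction > 0).astype(float)
F = sae_features(X)                          # probe inputs: SAE features

# Lightweight probe on SAE features, trained by plain gradient descent.
w = np.zeros(d_sae)
for _ in range(500):
    p = 1 / (1 + np.exp(-F @ w))             # sigmoid predictions
    w -= 0.1 * F.T @ (p - y) / n             # logistic-loss gradient step

acc = ((1 / (1 + np.exp(-F @ w)) > 0.5) == y).mean()
print(acc)
```

The design point this illustrates: the expensive model runs once to produce activations, the SAE encode is a single matrix multiply, and the probe itself is tiny — which is where the claimed 15–500x cost advantage over LLM-as-a-judge comes from.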