How well can you describe the feature selectivity of a vision neuron … with words? Interpretability has long borrowed from neuroscience — and maybe it can give back too! 🧵
1/ We study “digital twins” of macaque V1/V4 -- vision models trained to predict the activity of biological neurons in the primate visual cortex -- and use their outputs to study how the brain structures the world.
Jun 28, 2024 • 8 tweets • 3 min read
1/7 Wondered what happens when you permute the layers of a language model? In our recent paper with @tegmark, we swap and delete entire layers to understand how models perform inference - in doing so we see signs of four universal stages of inference!
2/7 🤯 Surprise finding: Models are extremely robust and retain 72-95% of the original model's prediction accuracy WITHOUT fine-tuning.
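To make the "swap and delete entire layers" idea concrete, here is a minimal toy sketch. It is not the paper's code: the "model" is just a list of functions applied with a residual update, and the names `run`, `delete_layer`, `swap_adjacent` and the three toy layers are all hypothetical, chosen for illustration only.

```python
from typing import Callable, List

# A "layer" here is any function of the running state (an assumption for this toy).
Layer = Callable[[int], int]


def run(layers: List[Layer], x: int) -> int:
    """Apply each layer in order with a residual update: x <- x + layer(x)."""
    for layer in layers:
        x = x + layer(x)
    return x


def delete_layer(layers: List[Layer], i: int) -> List[Layer]:
    """Return a copy of the stack with layer i removed."""
    return [layer for j, layer in enumerate(layers) if j != i]


def swap_adjacent(layers: List[Layer], i: int) -> List[Layer]:
    """Return a copy of the stack with layers i and i + 1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Hypothetical toy layers, not taken from any real model.
layers: List[Layer] = [
    lambda x: x,      # layer 0
    lambda x: 1,      # layer 1
    lambda x: 2 * x,  # layer 2
]

baseline = run(layers, 1)
without_layer_1 = run(delete_layer(layers, 1), 1)
swapped_0_and_1 = run(swap_adjacent(layers, 0), 1)
print(baseline, without_layer_1, swapped_0_and_1)
```

In a real experiment the same two interventions would be applied to a trained model's layer stack, and the modified model's predictions would be compared against the original's without any fine-tuning.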