Latest Twitter Threads by @vedanglad on Thread Reader App

Jun 16 • 9 tweets • 3 min read

How well can you describe the feature selectivity of a vision neuron … with words? Interpretability has long borrowed from neuroscience — and maybe it can give back too! 🧵

1/ We study “digital twins” of macaque V1/V4 -- vision models trained to predict the activity of biological neurons in the primate visual cortex -- and use their outputs to study how the brain structures the world.

Jun 28, 2024 • 8 tweets • 3 min read

1/7 Wondered what happens when you permute the layers of a language model? In our recent paper with @tegmark, we swap and delete entire layers to understand how models perform inference - in doing so we see signs of four universal stages of inference!

2/7 🤯 Surprise finding: Models are extremely robust and retain 72-95% of the original model's prediction accuracy WITHOUT fine-tuning.
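To make "swap and delete entire layers" concrete, here is a minimal sketch on a toy residual stack. It is not the paper's code and not a language model: `make_layers`, `forward`, `delete_layer`, `swap_layers`, and `agreement` are hypothetical names, the weights are random, and "prediction" is simply the argmax of the final hidden vector. It only shows the shape of the intervention: edit the layer list, rerun the forward pass, and compare predictions with the unmodified stack.

```python
import math
import random


def make_layers(n_layers, dim, seed=0, scale=0.1):
    """Build a toy stack of random dim x dim weight matrices (assumption: no training)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]
        for _ in range(n_layers)
    ]


def forward(layers, x):
    """Residual forward pass: each layer adds tanh(h @ W) to the running state h."""
    h = list(x)
    for w in layers:
        update = [
            math.tanh(sum(h[j] * w[j][k] for j in range(len(h))))
            for k in range(len(h))
        ]
        h = [a + b for a, b in zip(h, update)]
    return h


def predict(layers, x):
    """Toy 'prediction': index of the largest entry of the final state."""
    h = forward(layers, x)
    return max(range(len(h)), key=h.__getitem__)


def delete_layer(layers, i):
    """Return a new stack with layer i removed; the input list is left untouched."""
    return layers[:i] + layers[i + 1:]


def swap_layers(layers, i):
    """Return a new stack with adjacent layers i and i+1 exchanged."""
    out = list(layers)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def agreement(layers_a, layers_b, inputs):
    """Fraction of inputs on which two stacks make the same toy prediction."""
    same = sum(predict(layers_a, x) == predict(layers_b, x) for x in inputs)
    return same / len(inputs)


if __name__ == "__main__":
    rng = random.Random(1)
    layers = make_layers(n_layers=8, dim=16)
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
    print("delete layer 3:", agreement(layers, delete_layer(layers, 3), inputs))
    print("swap layers 3 and 4:", agreement(layers, swap_layers(layers, 3), inputs))
```

The agreement numbers from this toy say nothing about the 72-95% figure quoted in the thread; reproducing that would require applying the same kind of edits to a real pretrained model and its evaluation data.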
