Surya Ganguli
Associate Prof of Applied Physics @Stanford, and departments of Computer Science, Electrical Engineering and Neurobiology. Venture Partner @a16z
Dec 31, 2024 5 tweets 2 min read
Our new paper! "Analytic theory of creativity in convolutional diffusion models" led expertly by @MasonKamb
Our closed-form theory needs no training, is mechanistically interpretable & accurately predicts diffusion model outputs with high median r^2~0.9 arxiv.org/abs/2412.20292
Moreover, it explains how creative new diffusion model outputs, far from the training data, are constructed by mixing and matching different local training set image patches at different locations in the new output, yielding a local patch mosaic model of creativity.
Aug 14, 2023 10 tweets 4 min read
1/ Our paper @NeuroCellPress "Interpreting the retinal code for natural scenes" develops explainable AI (#XAI) to derive a SOTA deep network model of the retina and *understand* how this net captures natural scenes plus 8 seminal experiments over >2 decades https://t.co/4Hy1tfNsHt sciencedirect.com/science/articl…
2/ #XAI will become increasingly important in #neuroscience as deep learning allows us to derive highly accurate but complex models of biological circuits. But will we just be replacing something we don't understand (the brain) with something else we don't understand (our model of it)?
Jul 17, 2023 5 tweets 2 min read
1/ Our new paper led by @AllanRaventos, @mansiege, @FCHEN_AI asks when in-context learning of regression can solve fundamentally *new* problems *not* seen during pre-training, and reveals it as an emergent capability arising from a phase transition... arxiv.org/abs/2306.15063 2/ between two computational phases as one increases the diversity of the pre-training tasks. At low task diversity, transformers learn in-context like a Bayesian that memorizes only tasks seen during pre-training and cannot solve new tasks....
Nov 16, 2022 11 tweets 4 min read
1/ Our new preprint biorxiv.org/content/10.110… on when grid cells appear in trained path integrators w/ Sorscher @meldefon @aran_nayebi @lisa_giocomo @dyamins critically assesses claims made in a #NeurIPS2022 paper described below. Several corrections in our thread -> 2/ Our prior theory authors.elsevier.com/c/1f~Ze3BtfH1Z… quantitatively explains why few hexagonal grid cells were found in the work; many choices were made which prior theory proved don’t lead to hexagonal grids; when 2 well understood choices are made grids appear robustly ~100% of the time
Jun 30, 2022 8 tweets 4 min read
1/ Is scale all you need for AGI? (unlikely). But our new paper "Beyond neural scaling laws: beating power law scaling via data pruning" shows how to achieve much superior exponential decay of error with dataset size rather than slow power law neural scaling arxiv.org/abs/2206.14486 2/ In joint work @MetaAI w/ Ben Sorscher, Robert Geirhos, Shashank Shekhar & @arimorcos we show both in theory (via statistical mechanics) and practice how to achieve exponential scaling by only learning on selected data subsets of difficult nonredundant examples (defined properly)
Jul 16, 2021 6 tweets 3 min read
1/ Our new work: "How many degrees of freedom do we need to train deep networks: a loss landscape perspective." arxiv.org/abs/2107.05802 We present a geometric theory that connects to lottery tickets and a new method: lottery subspaces. w/ @_BrettLarsen @caenopy @stanislavfort 2/ Many methods can train to low loss using very few degrees of freedom (DoF). But why? We show that to train to a small loss L using a small number of random DoF, the number of DoF + the Gaussian width of the loss sublevel set projected onto a sphere around initialization...
Feb 4, 2021 6 tweets 3 min read
1/ Super excited to share our work with @drfeifei and @silviocinguetta, led by the mastermind @agrimgupta92 on Deep Evolutionary Reinforcement Learning (DERL): arxiv.org/abs/2102.02202 which leverages large-scale simulations of evolution and learning to... 2/ generate diverse morphologies with embodied intelligence that can exploit the passive physical dynamics of agent-environment interactions to rapidly learn complex tasks in an energy-efficient manner
Mar 20, 2020 18 tweets 6 min read
1/ New paper in @Nature: “Fundamental bounds on the fidelity of sensory cortical coding” with amazing colleagues: Oleg Rumyantsev, Jérôme Lecoq, Oscar Hernandez, Yanping Zhang, Joan Savall, Radosław Chrapkiewicz, Jane Li, Hongkui Zheng, Mark Schnitzer: nature.com/articles/s4158… 2/ See also here for a free version: rdcu.be/b26wp and tweeprint below ->
Jul 18, 2019 6 tweets 4 min read
1/ New in @sciencemagazine w/ @KarlDeisseroth lab: science.sciencemag.org/content/early/…: new opsin + multi-photon holography to image ~4000 cells in 3D volumes over 5 cortical layers while also stimulating ~50 neurons to directly drive visual percepts; data analysis and theory reveal… 2/ that visual cortex operates in a highly sensitive critically excitable regime in which stimulating a tiny subset of ~20 cells with similar orientation tuning is sufficient to both selectively recruit a large fraction of similarly responding cells and drive a specific percept
Nov 29, 2018 11 tweets 6 min read
1/ Our new #neuroscience paper, "Emergent elasticity in the neural code for space" just appeared in @PNASNews: pnas.org/content/early/… Awesome work led by @SamOcko, with @kiahhardcastle and @lisa_giocomo. Take home messages... 2/ We ask: how do we learn where we are? Two info sources are needed: 1) our recent history of velocity; 2) what landmarks we have encountered. How can neurons/synapses fuse these two sources to build a consistent spatial map as we explore a new place we have never seen before?
Oct 28, 2018 11 tweets 3 min read
1/ New #deeplearning paper at the intersection of #AI #mathematics #psychology and #neuroscience: A mathematical theory of semantic development in deep neural networks: arxiv.org/abs/1810.10531 Thanks to awesome collaborators Andrew Saxe and Jay McClelland! 2/ We study how many phenomena in human semantic cognition arise in deep neural networks, and how these phenomena can be understood analytically in a simple deep linear network. Such phenomena include…