1/ Is scale all you need for AGI? (Unlikely.) But our new paper "Beyond neural scaling laws: beating power law scaling via data pruning" shows how to achieve far superior exponential decay of error with dataset size, rather than slow power-law neural scaling arxiv.org/abs/2206.14486
2/ In joint work @MetaAI w/ Ben Sorscher, Robert Geirhos, Shashank Shekhar & @arimorcos, we show both in theory (via statistical mechanics) and in practice how to achieve exponential scaling by learning only on selected data subsets of difficult, non-redundant examples (defined properly)
3/ Our statistical mechanics theory of data pruning makes several predictions, including the ability to beat power-law scaling, which we confirm with ResNets on various tasks (SVHN, CIFAR10, ImageNet) and Vision Transformers fine-tuned on CIFAR10
4/ Then, focusing on ImageNet, we performed a large-scale benchmarking study of 10 different data-pruning metrics that rank examples from easiest to hardest, and tested their efficacy in pruning data to create small subsets of only the hardest examples to train on
5/ We additionally developed a new unsupervised data-pruning metric that does not even require labels, is easy to compute given a pre-trained foundation model, and outperforms all previous metrics on ImageNet, allowing us to train on ~75% of ImageNet without accuracy loss
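The unsupervised metric in the paper scores examples by clustering foundation-model embeddings and measuring each example's distance to its nearest cluster prototype; prototypical examples are redundant, distant ones are hard. A minimal sketch of that idea, using random vectors as stand-in embeddings and a hand-rolled k-means (all sizes and constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for foundation-model embeddings of a dataset
# (the paper computes these with a self-supervised model).
embeddings = rng.normal(size=(1000, 32))

# A few Lloyd iterations of k-means (minimal stand-in for a library call).
k = 10
centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)].copy()
for _ in range(20):
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    for j in range(k):
        members = labels == j
        if members.any():
            centroids[j] = embeddings[members].mean(axis=0)

# Unsupervised difficulty score: distance to the nearest prototype.
# Easy/redundant examples score low; hard examples score high.
dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
difficulty = dists.min(axis=1)

# Keep the hardest ~75% of examples, discarding the most redundant quarter.
keep_frac = 0.75
kept_idx = np.argsort(difficulty)[-int(keep_frac * len(embeddings)):]
```

No labels are needed anywhere in this computation, which is what makes the metric cheap to apply to raw data.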
6/ Overall this work suggests that our current ML practice of collecting large amounts of random data is highly inefficient, leading to huge redundancy in the data, which we show mathematically is the origin of the very slow, unsustainable power-law scaling of error with dataset size
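To make the power-law vs. exponential contrast concrete, here is a toy numerical comparison (the exponent and scale constants are made up for illustration, not fit to any experiment):

```python
import numpy as np

# Illustrative error curves vs dataset size N:
# slow power-law decay (random data) vs exponential decay (pruned data).
N = np.logspace(2, 6, 200)
power_law_err = N ** -0.3
exponential_err = np.exp(-N / 2e4)

# The exponential curve eventually drops below any power law;
# find the first dataset size where it does.
crossover_N = N[np.argmax(exponential_err < power_law_err)]
```

Past the crossover, every extra example buys exponentially more error reduction than it would under power-law scaling, which is the sense in which random data collection is unsustainable.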
7/ A better way forward might be the creation of foundation datasets: carefully curated small subsets of data that are capable of training highly accurate models using far less data than our current large, randomly selected datasets (see discussion in paper)
8/ Indeed, the initial computational cost of creating a foundation dataset through data pruning can be amortized across efficiency gains in training many downstream models, just as the initial cost of training foundation models is amortized across faster fine-tuning on many tasks
1/ Our paper @NeuroCellPress "Interpreting the retinal code for natural scenes" develops explainable AI (#XAI) to derive a SOTA deep network model of the retina and *understand* how this net captures natural scenes plus 8 seminal experiments over >2 decades https://t.co/4Hy1tfNsHt sciencedirect.com/science/articl…
2/ #XAI will become increasingly important in #neuroscience as deep learning allows us to derive highly accurate but complex models of biological circuits. But will we just be replacing something we don't understand (the brain) with something else we don't understand (our model of it)?
3/ We addressed this issue in our retinal model, which not only successfully predicted responses to natural scenes, but also had hidden units that behaved like retinal interneurons, and captured 8 different classes of foundational experiments in vision science, including...
1/ Our new paper led by @AllanRaventos, @mansiege & @FCHEN_AI asks when in-context learning of regression can solve fundamentally *new* problems *not* seen during pre-training, and reveals it as an emergent capability arising from a phase transition... arxiv.org/abs/2306.15063
2/ between two computational phases as one increases the diversity of the pre-training tasks. At low task diversity, transformers learn in-context like a Bayesian that memorizes only tasks seen during pre-training and cannot solve new tasks....
3/ But just above a pre-training task diversity threshold, transformers can suddenly solve fundamentally new tasks in-context that are *not* seen during pre-training. This task diversity threshold is remarkably low and scales mildly with dimension such that...
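A minimal numerical caricature of the two phases for in-context linear regression (this is not the paper's transformer experiment, just an illustration with made-up sizes): below threshold, the model behaves like a predictor that can only select among its K memorized pretraining tasks; above threshold, it actually solves the regression posed in the prompt.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 8, 4                               # task dimension; pretraining task count
tasks = rng.normal(size=(K, d))           # the finite pool of pretraining tasks
w_new = rng.normal(size=d)                # a genuinely new task, outside the pool

# An in-context prompt of (x, y) pairs drawn from the new task.
X = rng.normal(size=(16, d))
y = X @ w_new

# Low-diversity phase caricature: a "Bayesian memorizer" that can only
# pick whichever pretraining task best explains the prompt.
w_mem = tasks[((X @ tasks.T - y[:, None]) ** 2).mean(axis=0).argmin()]

# Above-threshold phase caricature: actually solve the regression in-context
# (ridge with a tiny regularizer, standing in for the learned computation).
w_icl = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)

mem_err = np.linalg.norm(w_mem - w_new)   # stays O(1): new task unreachable
icl_err = np.linalg.norm(w_icl - w_new)   # near zero: new task solved
```

The memorizer's error cannot shrink no matter how long the prompt is, because the new task simply is not in its pool; the in-context solver's error vanishes with enough prompt examples.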
2/ Our prior theory authors.elsevier.com/c/1f~Ze3BtfH1Z… quantitatively explains why few hexagonal grid cells were found in that work; many choices were made which prior theory proved don't lead to hexagonal grids; when 2 well-understood choices are made, grids appear robustly ~100% of the time
3/ Also, corrections: (1) difference-of-Gaussian place cells do lead to hexagonal grids; (2) multi-bump place cells at a single scale do as well; (3) hexagonal grids are robust to place cell scale; (4) Gaussian interactions can yield periodic patterns;
1/ Our new work: "How many degrees of freedom do we need to train deep networks: a loss landscape perspective" arxiv.org/abs/2107.05802 We present a geometric theory that connects to lottery tickets, and a new method: lottery subspaces. w/ @_BrettLarsen @caenopy @stanislavfort
2/ Many methods can train to low loss using very few degrees of freedom (DoF). But why? We show that to train to a small loss L using a small number of random DoF, the number of DoF + the Gaussian width of the loss sublevel set projected onto a sphere around initialization...
3/ must exceed the total number of parameters. This leads to phase transitions in trainability and suggests why pruning weights at init is harder than pruning later. We also provide methods to measure the high-dimensional geometry of loss landscapes through tomographic slicing...
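A toy version of the random-subspace picture, on a quadratic loss rather than a neural network (dimensions and the loss itself are illustrative): restrict training to θ = θ₀ + Az for a random D×d matrix A, and watch the best reachable loss fall as the number of random degrees of freedom d grows.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 200                                   # total number of parameters
M = rng.normal(size=(D, D))
H = M @ M.T / D                           # random PSD Hessian; loss L(θ) = ½ θᵀHθ
theta0 = rng.normal(size=D)               # random initialization

def best_loss_with_dof(d):
    """Minimize L over the random affine subspace θ = θ0 + A z, z in R^d."""
    A = rng.normal(size=(D, d)) / np.sqrt(D)
    z = -np.linalg.solve(A.T @ H @ A, A.T @ H @ theta0)
    theta = theta0 + A @ z
    return 0.5 * theta @ H @ theta

init_loss = 0.5 * theta0 @ H @ theta0
losses = {d: best_loss_with_dof(d) for d in (5, 50, 150)}
# Best reachable loss shrinks as d grows; at d = D the global minimum
# (zero, for this loss) becomes reachable.
```

Whether a given loss level L is reachable depends on how the d-dimensional random subspace intersects the sublevel set of L, which is where the Gaussian-width condition in the tweet above comes from.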
1/ Super excited to share our work with @drfeifei and @silviocinguetta, led by the mastermind @agrimgupta92, on Deep Evolutionary Reinforcement Learning (DERL): arxiv.org/abs/2102.02202 which leverages large-scale simulations of evolution and learning to...
2/ generate diverse morphologies with embodied intelligence that can exploit the passive physical dynamics of agent-environment interactions to rapidly learn complex tasks in an energy-efficient manner
3/ We also obtain insights into the dynamics of morphological evolution - here is a lineage tree showing how our evolutionary dynamics can generate multiple diverse morphologies without sacrificing fitness
1/ New paper in @Nature: "Fundamental bounds on the fidelity of sensory cortical coding" with amazing colleagues: Oleg Rumyantsev, Jérôme Lecoq, Oscar Hernandez, Yanping Zhang, Joan Savall, Radosław Chrapkiewicz, Jane Li, Hongkui Zeng, Mark Schnitzer: nature.com/articles/s4158…
2/ See also here for a free version: rdcu.be/b26wp and tweeprint below ->
3/ We address an old puzzle: namely that when an animal has to discriminate between two visual stimuli, it often can’t do much better than the performance of an ideal observer that only has access to a small number of neurons in the relevant brain region processing those stimuli
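To see why this is a puzzle, consider the textbook independent-neuron calculation (the numbers below are illustrative, and real cortical noise correlations are exactly what breaks this reasoning): if n neurons carry independent noise, discriminability d′ grows like √n, so pooling thousands of neurons should push an ideal observer to near-perfect performance.

```python
from math import erf, sqrt

def percent_correct(n_neurons, dmu=0.1, sigma=1.0):
    """2AFC accuracy of an ideal observer pooling n independent neurons,
    each with per-neuron signal dmu and Gaussian noise sigma (toy values)."""
    dprime = sqrt(n_neurons) * dmu / sigma
    return 0.5 * (1.0 + erf(dprime / 2.0))   # closed form for Phi(d'/sqrt(2))

# With independent noise, accuracy climbs steadily toward ceiling as n grows,
# which is why behavior matching only a handful of neurons is surprising.
accuracies = {n: percent_correct(n) for n in (1, 100, 10000)}
```

Since whole visual areas contain vastly more neurons than the small number an ideal observer needs to match behavior, something must cap the pooled signal, motivating the paper's focus on fundamental bounds.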