@AI_for_Science@KhonaMikail@FieteGroup The central promise of DL-based models of the brain is that they (1) shed light on the brain’s fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using DL models of grid cells in the MEC-HPC circuit, that one often gets neither 2/13
@AI_for_Science@KhonaMikail@FieteGroup Prior work claims that training artificial networks (ANNs) on a path integration task generically creates grid cells (a). We empirically show and analytically explain why grid cells only emerge in a small subset of hyperparameter space chosen post-hoc by the programmer (b). 3/13
@AI_for_Science@KhonaMikail@FieteGroup Result 1: Of the >3500 networks we trained, 60% learned to accurately encode position but only 7% exhibited **possible** grid-like cells (and the hyperparameter sweep was already biased towards settings that create grid cells) 4/13
@AI_for_Science@KhonaMikail@FieteGroup Result 2: Grid cell emergence requires a highly specific supervised target encoding: simple Cartesian and radial spatial readouts never yielded grid cells, nor did Gaussian-shaped place cell-like readouts. Grid cell emergence required difference-of-softmaxed-Gaussian readouts 5/13
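For concreteness, here is a minimal numpy sketch of what a difference-of-softmaxed-Gaussians (DoS) readout target can look like; the widths, arena size, and cell count below are illustrative assumptions, not the exact settings used in the trained networks.

```python
import numpy as np

def dos_readout_targets(pos, centers, sigma_narrow=0.12, sigma_wide=0.24):
    """Toy difference-of-softmaxed-Gaussians (DoS) readout target (illustrative values only)."""
    d2 = np.sum((centers - pos) ** 2, axis=1)        # squared distance from position to each center
    narrow = np.exp(-d2 / (2 * sigma_narrow ** 2))
    narrow /= narrow.sum()                           # softmax of the narrow Gaussian
    wide = np.exp(-d2 / (2 * sigma_wide ** 2))
    wide /= wide.sum()                               # softmax of the wide Gaussian
    return narrow - wide                             # center-surround (DoS) supervised target vector

centers = np.random.default_rng(0).uniform(0, 2.2, size=(512, 2))   # hypothetical 2.2 m box, 512 cells
targets = dos_readout_targets(np.array([1.1, 1.1]), centers)
```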
@AI_for_Science@KhonaMikail@FieteGroup Result 3: Artificial grid periods are set by a hyperparameter choice and so do not provide a fundamental prediction; multiple modules do not emerge. Over a wide sweep producing ideal grid units, the grid period distribution is unimodal, in contrast with the multiple discrete periods found in the brain 6/13
@AI_for_Science@KhonaMikail@FieteGroup Result 4: We can analytically explain why we observe these empirical results, using Fourier analysis of a Turing instability similar to that in first-principles continuous attractor models 7/13
@AI_for_Science@KhonaMikail@FieteGroup Result 5: Grid unit emergence is highly sensitive to one hyperparameter -- the readout receptive field width -- and does not occur if the hyperparameter is changed by a tiny amount, e.g. 12 cm yields grid units, 11 cm and 13 cm do not 8/13
@AI_for_Science@KhonaMikail@FieteGroup Result 7: Grid cell emergence in prev publications also relies on a *critical but unstated* implementation detail. We use Fourier analysis and numerical simulations to explain why this particular and unusual implementation choice is necessary. 9/13
@AI_for_Science@KhonaMikail@FieteGroup Result 8: Artificial grid units disappear with more biologically realistic place cells: adding a small amount of heterogeneity to place cell receptive fields causes grid cells to disappear 10/13
@AI_for_Science@KhonaMikail@FieteGroup Takeaway: It is highly improbable that a path integration objective for ANNs would have produced grid cells as a novel prediction, had grid cells not been known to exist. Thus, our results challenge the notion that DL offers a free lunch for Neuroscience 11/13
@AI_for_Science@KhonaMikail@FieteGroup Prospective Puzzle: ANN grid models have been claimed to explain variance in mouse MEC activity almost as well as variance explained by other mice. How are these networks able to predict mouse MEC neural activity so well? 12/13
@AI_for_Science@KhonaMikail@FieteGroup Prospective Answer: Deep networks may appear to be better models of biological networks because they provide higher-dimensional bases than alternative models, and thus trivially achieve higher correlation scores for linear regression-based comparisons. 13/13
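A minimal sketch of why dimensionality alone can inflate regression-based comparison scores, assuming a toy setup with purely random "model" features and pure-noise "neural" responses (not the cross-validated pipelines used in the actual comparisons):

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_neurons = 200, 50
neural = rng.normal(size=(n_stimuli, n_neurons))      # stand-in "biological" responses: pure noise

for dim in (10, 50, 100, 190):                        # dimensionality of the candidate model's basis
    features = rng.normal(size=(n_stimuli, dim))      # random features with no brain-relevant content
    coef = np.linalg.lstsq(features, neural, rcond=None)[0]
    pred = features @ coef
    r2 = 1 - ((neural - pred) ** 2).sum() / ((neural - neural.mean(0)) ** 2).sum()
    print(f"basis dim = {dim:3d}   in-sample R^2 = {r2:.2f}")   # R^2 climbs purely with dimensionality
```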
The increasing presence of AI-generated content on the internet raises a critical question:
What happens when #GenerativeAI is pretrained on web-scale datasets containing data created by earlier models?
Many have prophesied that such models will progressively degrade - Model Collapse!
(fig. from @NaturePortfolio)
2/9
Contribution #1: The model collapse phenomenon studied by the @NaturePortfolio 2024 paper is attributable to deleting data en masse between model-fitting iterations (left).
If data instead accumulate over time, then model collapse is avoided
Our story begins in 2014: An influential methodology in #neuroscience is pioneered by @dyamins & Jim DiCarlo, arguing that task-optimized deep networks should be considered good models of the brain if (linear) regressions predict biological population responses well
2/12
This neural regressions methodology becomes wildly popular in vision, audition, language
A NeurIPS 2021 Spotlight extends this regressions methodology to spatial navigation in medial entorhinal cortex (MEC)
They find certain deep networks are ✨amazing ✨ models of MEC
Model collapse arose from asking: what happens when synthetic data from previous generative models enters the pretraining data supply used to train new generative models?
I like Shumailov et al.'s phrasing:
"What happens to GPT generations GPT-{n} as n increases?"
2/N
Let's identify realistic pretraining conditions for frontier AI models to make sure we study the correct setting
1. Amount of data: 📈 Llama went from 1.4T tokens to 2T tokens to 15T tokens
2. Amount of chips: 📈 Llama went from 2k to 4k to 16k GPUs
Predictable behavior from scaling AI systems is desirable. While scaling laws are well established, how *specific* downstream capabilities scale is significantly muddier, e.g. @sy_gadre @lschmidt3 @ZhengxiaoD @jietang
@sy_gadre @lschmidt3 @ZhengxiaoD @jietang We identify a new factor for widely-used multiple choice QA benchmarks e.g. MMLU:
Downstream performance is computed from negative log likelihoods via a sequence of transformations that progressively deteriorate the statistical relationship between performance and scale
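A hedged sketch of that chain of transformations for a multiple-choice benchmark; the function and variable names are illustrative, not any benchmark harness's actual code.

```python
import numpy as np

def mcqa_accuracy_from_nlls(nll_choices, correct_idx):
    """Chain of transformations from per-choice negative log-likelihoods to accuracy (illustrative)."""
    log_p = -nll_choices                          # 1. NLL -> log-likelihood (varies smoothly with scale)
    p = np.exp(log_p)                             # 2. exponentiate to probabilities
    p_rel = p / p.sum(axis=1, keepdims=True)      # 3. renormalize against the incorrect choices
    picked = p_rel.argmax(axis=1)                 # 4. argmax discards all margin information
    return (picked == correct_idx).mean()         # 5. mean 0/1 correctness = reported benchmark accuracy

# toy example: 3 questions, 4 answer choices each
nlls = np.array([[1.2, 0.4, 2.0, 1.7], [0.9, 1.1, 1.0, 1.3], [2.1, 2.0, 0.3, 1.8]])
print(mcqa_accuracy_from_nlls(nlls, correct_idx=np.array([1, 0, 2])))   # -> 1.0
```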
What happens when generative models are trained on their own outputs?
Prior works foretold a catastrophic feedback loop, a curse of recursion, in which models descend into madness as they consume their own outputs. Are we poisoning the very data necessary to train future models?
1/N
Excited to announce our newest preprint!
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
w/ @MGerstgrasser @ApratimDey2 @rm_rafailov @sanmikoyejo @danintheory @Andr3yGR @Diyi_Yang, and David Donoho
@MGerstgrasser @ApratimDey2 @rm_rafailov @sanmikoyejo @danintheory @Andr3yGR @Diyi_Yang Many prior works consider training models solely on data generated by the preceding model i.e. data are replaced at each model-fitting iteration. Replacing data leads to collapse, but isn’t done in practice.
What happens if data instead accumulate across each iteration?
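A toy illustration of the two settings, using a one-dimensional Gaussian as a stand-in for a generative model (sizes and seed are arbitrary assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_generations = 100, 500
real = rng.normal(0.0, 1.0, size=n)                    # the original real data

def final_std(accumulate):
    data = real.copy()
    for _ in range(n_generations):
        mu, sigma = data.mean(), data.std()            # "fit" a Gaussian model to the current data
        synthetic = rng.normal(mu, sigma, size=n)      # sample a new synthetic dataset from the model
        data = np.concatenate([data, synthetic]) if accumulate else synthetic
    return data.std()

print("replace data each generation:    std =", round(final_std(False), 3))  # typically collapses toward 0
print("accumulate data each generation: std =", round(final_std(True), 3))   # stays near the real scale (~1)
```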
A few weeks ago, Stanford AI Alignment @SAIA_Alignment read @AnthropicAI 's "Superposition, Memorization, and Double Descent." Double descent is relatively easy to describe, but **why** does double descent occur?
@SAIA_Alignment @AnthropicAI Prior work answers why double descent occurs, but we wanted an intuitive explanation that doesn’t require random matrix theory (RMT) or statistical mechanics. Our new preprint identifies and interprets the **3** necessary ingredients for double descent, using ordinary linear regression!
@SAIA_Alignment @AnthropicAI Using intro linear algebra, we show the difference between the best possible predictions and the fit model’s predictions, in both the underparameterized & overparameterized regimes, revealing an interaction between **3 quantities that are necessary to produce double descent**
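Not the preprint's derivation, but a minimal numerical sketch of the phenomenon it dissects, using minimum-norm ordinary least squares with illustrative sizes; test error typically spikes near the interpolation threshold (d = n_train) and falls again past it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d_total = 40, 2000, 200
beta = rng.normal(size=d_total) / np.sqrt(d_total)     # true coefficients of the data-generating process

X_tr = rng.normal(size=(n_train, d_total))
X_te = rng.normal(size=(n_test, d_total))
y_tr = X_tr @ beta + 0.1 * rng.normal(size=n_train)
y_te = X_te @ beta + 0.1 * rng.normal(size=n_test)

for d in (5, 20, 35, 40, 45, 80, 200):                 # number of features the model is allowed to use
    w = np.linalg.pinv(X_tr[:, :d]) @ y_tr             # minimum-norm least-squares solution
    mse = np.mean((X_te[:, :d] @ w - y_te) ** 2)
    print(f"d = {d:3d}   test MSE = {mse:.3f}")        # peak expected near d = n_train
```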