Rylan Schaeffer
CS PhD student with @sanmikoyejo at @stai_research @StanfordAILab. Currently: Intern on Gemini @GoogleAI . Previously: Intern on Llama @AIatMeta
Oct 23 9 tweets 4 min read
📢New preprint📢

🔄Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄

A deeper dive into the effects of self-generated synthetic data on model-data feedback loops

w/ @JoshuaK92829 @ApratimDey2 @MGerstgrasser @rm_rafailov @sanmikoyejo

1/9

The increasing presence of AI-generated content on the internet raises a critical question:

What happens when #GenerativeAI is pretrained on web-scale datasets containing data created by earlier models?

Many have prophesied that such models will progressively degrade - Model Collapse!

(fig. from @NaturePortfolio)

2/9
Oct 14 12 tweets 4 min read
My second-to-last #neuroscience paper will appear at @unireps!!

🧠🧠 Maximizing Neural Regression Scores May Not Identify Good Models of the Brain 🧠🧠

w/ @KhonaMikail @neurostrow @BrandoHablando @sanmikoyejo

Answering a puzzle 2 years in the making



1/12 openreview.net/forum?id=vbtj0…

Our story begins in 2014: an influential methodology in #neuroscience was pioneered by @dyamins & Jim DiCarlo, arguing that task-optimized deep networks should be considered good models of the brain if (linear) regressions predict biological population responses well

2/12
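The regression methodology above can be sketched minimally: fit a (ridge-regularized) linear map from network activations to recorded responses, then score held-out predictions. This is my own illustration with synthetic stand-in data, not the paper's code; all names and sizes are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: model activations (n_stimuli x n_units) and
# recorded neural responses (n_stimuli x n_neurons).
n_stimuli, n_units, n_neurons = 500, 100, 20
activations = rng.normal(size=(n_stimuli, n_units))
true_map = rng.normal(size=(n_units, n_neurons))
responses = activations @ true_map + 0.5 * rng.normal(size=(n_stimuli, n_neurons))

X_tr, X_te, Y_tr, Y_te = train_test_split(activations, responses, random_state=0)

# "Neural predictivity": held-out R^2 of a linear readout from
# model features to the neural responses.
reg = Ridge(alpha=1.0).fit(X_tr, Y_tr)
score = r2_score(Y_te, reg.predict(X_te), multioutput="uniform_average")
print(f"held-out neural predictivity (R^2): {score:.2f}")
```

The paper's point is precisely that maximizing a score like this one may not identify good models of the brain, since very different feature sets can all regress well onto the same responses.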
Jul 26 15 tweets 5 min read
Yesterday, I tweeted that model collapse appears when researchers intentionally induce it in ways that don't match what is done in practice

Let me explain using the Shumailov et al. @Nature 2024 paper's methodology as an example

Paper:

🧵⬇️

1/N nature.com/articles/s4158…
Model collapse arose from asking: what happens when synthetic data from previous generative models enters the pretraining data supply used to train new generative models?

I like Shumailov et al.'s phrasing:

"What happens to GPT generations GPT-{n} as n increases?"

2/N
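The model-fits-model recursion can be caricatured with a one-dimensional toy (my own sketch, not Shumailov et al.'s code): each generation fits a Gaussian to samples drawn from the previous generation's fitted Gaussian, then the next generation sees only those samples. The fitted variance contracts over generations, i.e. the tails of the distribution vanish.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data drawn from a standard Gaussian.
n_samples, n_generations = 100, 1_000
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)

stds = []
for gen in range(n_generations):
    mu, sigma = data.mean(), data.std()           # "train" generation n on its data
    stds.append(sigma)
    data = rng.normal(mu, sigma, size=n_samples)  # generation n+1 sees only samples

print(f"fitted std at generation 0: {stds[0]:.3f}")
print(f"fitted std at generation {n_generations - 1}: {stds[-1]:.2e}")
```

Each refit multiplies the variance by a random factor slightly below 1 on average, so the distribution drifts toward a point mass: a toy version of "GPT-{n} as n increases."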
Jun 10 11 tweets 5 min read
❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥

**Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?**

w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo



1/N arxiv.org/abs/2406.04391
Predictable behavior from scaling AI systems is desirable. While scaling laws are well established, how *specific* downstream capabilities scale is significantly muddier, e.g. @sy_gadre @lschmidt3 @ZhengxiaoD @jietang




Why?

2/N arxiv.org/abs/2403.08540
arxiv.org/abs/2403.15796

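One way to build intuition for why downstream metrics are muddier than loss (my own toy illustration, with made-up numbers, not the paper's experiment): even a perfectly smooth power-law decay in per-token loss can look abrupt and uneven after being pushed through the nonlinear, thresholded transformation into a multiple-choice accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A perfectly smooth power law for per-token loss as compute grows.
compute = np.logspace(0, 6, 20)
loss = 3.0 * compute ** -0.1

# Hypothetical 4-way multiple-choice benchmark: the model picks the answer
# with the highest score. Correct-answer score = -loss + a fixed per-question
# offset; distractor scores are fixed draws. All numbers are illustrative.
n_questions = 200
question_offset = rng.normal(scale=1.0, size=n_questions)
distractor_scores = rng.normal(loc=-2.0, scale=1.0, size=(n_questions, 3))

accuracy = []
for L in loss:
    correct = -L + question_offset
    wins = (correct[:, None] > distractor_scores).all(axis=1)  # beats all 3
    accuracy.append(wins.mean())
accuracy = np.array(accuracy)

print("loss (smooth):", np.round(loss[::5], 2))
print("accuracy:     ", np.round(accuracy[::5], 2))
```

The loss curve is monotone and featureless, while the accuracy curve concentrates most of its improvement in a narrow band of compute, which is much harder to extrapolate from small scales.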
May 1 13 tweets 5 min read
What happens when generative models are trained on their own outputs?

Prior works foretold of a catastrophic feedback loop, a curse of recursion, descending into madness as models consume their own outputs. Are we poisoning the very data necessary to train future models?

1/N

Excited to announce our newest preprint!

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

w/ @MGerstgrasser @ApratimDey2 @rm_rafailov @sanmikoyejo @danintheory @Andr3yGR @Diyi_Yang David Donoho



2/N arxiv.org/abs/2404.01413
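The paper's central contrast, replacing old data with synthetic data versus accumulating real and synthetic data together, can be sketched with a Gaussian toy (my own minimal sketch; all sizes and generation counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def final_std(n_generations=1_000, n=100, accumulate=False):
    """Repeatedly fit a Gaussian and sample from it; optionally keep all past data."""
    pool = rng.normal(size=n)                    # generation 0: real data
    for _ in range(n_generations):
        mu, sigma = pool.mean(), pool.std()
        new = rng.normal(mu, sigma, size=n)
        pool = np.concatenate([pool, new]) if accumulate else new
    return float(pool.std())

rep = final_std(accumulate=False)
acc = final_std(accumulate=True)
print(f"replace-data std:    {rep:.2e}")  # collapses toward 0
print(f"accumulate-data std: {acc:.3f}")  # stays near the original 1.0
```

Replacing data each generation lets the variance decay multiplicatively, while accumulating keeps the original real data in the pool forever, anchoring every later fit. That is the "breaking the curse of recursion" in the title.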
Mar 28, 2023 8 tweets 4 min read
A few weeks ago, Stanford AI Alignment @SAIA_Alignment read @AnthropicAI 's "Superposition, Memorization, and Double Descent." Double descent is relatively easy to describe, but **why** does double descent occur?



1/8 transformer-circuits.pub/2023/toy-doubl…
Prior work answers why double descent occurs, but we wanted an intuitive explanation that doesn't require random matrix theory or statistical mechanics. Our new preprint identifies and interprets the **3** necessary ingredients for double descent, using ordinary linear regression!



2/8 arxiv.org/abs/2303.14151
Nov 1, 2022 16 tweets 14 min read
Very excited to announce our #NeurIPS2022 paper No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit.

It's a story about NeuroAI, told through a story about grid & place cells.

Joint w/ @KhonaMikail @FieteGroup

1/15

The promises of deep learning-based models of the brain are that they (1) shed light on the brain's fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using deep network models of the MEC-HPC circuit, that one may get neither!

2/15
Jul 23, 2022 15 tweets 15 min read
If you’re interested in deep learning (DL) and neuroscience, come to our poster at @AI_for_Science’s #ICML2022 workshop

**No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit**

Joint w/ @KhonaMikail @FieteGroup

1/13

The central promise of DL-based models of the brain is that they (1) shed light on the brain's fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using DL models of grid cells in the MEC-HPC circuit, that one often gets neither.

2/13