Rylan Schaeffer
CS PhD student with @sanmikoyejo at @stai_research @StanfordAILab. Currently: Intern on Gemini @GoogleAI. Previously: Intern on Llama @AIatMeta
Apr 4
Interested in test time / inference scaling laws?

Then check out our newest preprint!!

📉 How Do Large Language Monkeys Get Their Power (Laws)? 📉



w/ @JoshuaK92829 @sanmikoyejo @Azaliamirh @jplhughes @jordanjuravsky @sprice354_ @aengus_lynch1 @_robertkirk

arxiv.org/abs/2502.17578

Large Language Monkeys & Best of N Jailbreaking discovered a striking finding:

when language models tackle a suite of tasks with multiple attempts per task, they exhibit power law scaling with the number of attempts!

But from where does this power law scaling emerge?
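For intuition, here's a minimal simulation of the mechanism the preprint studies (the Beta prior, sample counts, and parameter values are illustrative assumptions, not the paper's exact setup): each task's failure probability decays exponentially in the number of attempts, but averaging over tasks whose single-attempt success probabilities are heavy-tailed near zero produces an aggregate power law.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each task has a fixed single-attempt success
# probability p. A Beta prior with small first parameter alpha puts
# power-law-like mass near p = 0 (a heavy left tail).
alpha = 0.3
p = rng.beta(alpha, 3.0, size=200_000)

ks = np.logspace(1, 4, 15)  # number of independent attempts per task

# Per task, failure decays EXPONENTIALLY in k: (1 - p)^k.
# Averaging over tasks mixes many exponential rates together.
failure = np.array([np.mean((1.0 - p) ** k) for k in ks])

# The aggregate failure rate is a straight line on log-log axes,
# i.e. a power law k^(-alpha) inherited from the tail of p's prior.
slope = np.polyfit(np.log(ks), np.log(failure), 1)[0]
print(f"fitted exponent: {slope:.3f} (tail exponent: -{alpha})")
```

The fitted exponent lands near -0.3: the aggregate power law's slope is set by the distribution of per-task difficulty, not by any single task.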
Oct 23, 2024
📢New preprint📢

🔄Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄

A deeper dive into the effects of self-generated synthetic data on model-data feedback loops

w/ @JoshuaK92829 @ApratimDey2 @MGerstgrasser @rm_rafailov @sanmikoyejo

1/9

The increasing presence of AI-generated content on the internet raises a critical question:

What happens when #GenerativeAI is pretrained on web-scale datasets containing data created by earlier models?

Many have prophesied that such models will progressively degrade - Model Collapse!

(fig. from @NaturePortfolio)

2/9
Oct 14, 2024
My 2nd-to-last #neuroscience paper will appear at @unireps!!

🧠🧠 Maximizing Neural Regression Scores May Not Identify Good Models of the Brain 🧠🧠

w/ @KhonaMikail @neurostrow @BrandoHablando @sanmikoyejo

Answering a puzzle 2 years in the making



1/12

openreview.net/forum?id=vbtj0…

Our story begins in 2014: An influential methodology in #neuroscience is pioneered by @dyamins & Jim DiCarlo, arguing that task-optimized deep networks should be considered good models of the brain if (linear) regressions predict biological population responses well.

2/12
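As a rough sketch of that regression-based scoring pipeline (everything here is a hypothetical stand-in: the shapes, the synthetic "recordings", and the ridge penalties are arbitrary choices, not any specific study's settings):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-ins for network activations and neural recordings to the
# same stimuli: 500 stimuli, 256 model units, 40 recorded neurons.
n_stimuli, n_units, n_neurons = 500, 256, 40
model_features = rng.normal(size=(n_stimuli, n_units))
neural_responses = (
    0.1 * model_features @ rng.normal(size=(n_units, n_neurons))
    + rng.normal(size=(n_stimuli, n_neurons))  # recording noise
)

# The 2014-style methodology: linearly regress neural responses on
# model features; a high cross-validated score is then read as
# evidence that the network is a good model of the circuit --
# exactly the inference our paper interrogates.
reg = RidgeCV(alphas=np.logspace(-2, 4, 13))
score = cross_val_score(reg, model_features, neural_responses, cv=5).mean()
print(f"cross-validated regression score (R^2): {score:.2f}")
```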
Jul 26, 2024
Yesterday, I tweeted that model collapse appears when researchers intentionally induce it in ways that don't match what is done in practice

Let me explain using the Shumailov et al. @Nature 2024 paper's methodology as an example

Paper: nature.com/articles/s4158…

🧵⬇️

1/N
Model collapse arose from asking: what happens when synthetic data from previous generative models enters the pretraining data supply used to train new generative models?

I like Shumailov et al.'s phrasing:

"What happens to GPT generations GPT-{n} as n increases?"

2/N
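Here's a toy numerical version of that replace-style loop, with fitting a 1-D Gaussian standing in for "training a model" (the sample size and generation count are arbitrary illustration choices, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_generations = 50, 2_000

# "Real" data, used only to fit generation 0.
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)

for _ in range(n_generations):
    # Fit the current generation's "model" ...
    mu, sigma = data.mean(), data.std()
    # ... then REPLACE the dataset with its samples, discarding all
    # earlier data -- the key design choice in this methodology.
    data = rng.normal(mu, sigma, size=n_samples)

print(f"std after {n_generations} generations: {data.std():.2e}")
```

The fitted spread collapses toward zero, because each generation's estimation error compounds and nothing anchors the loop to the original data.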
Jun 10, 2024
❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥

**Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?**

w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo



1/N arxiv.org/abs/2406.04391
Predictable behavior from scaling AI systems is desirable. While scaling laws are well established, how *specific* downstream capabilities scale is significantly muddier, e.g. @sy_gadre @lschmidt3 @ZhengxiaoD @jietang




Why?

2/N arxiv.org/abs/2403.08540
arxiv.org/abs/2403.15796

May 1, 2024
What happens when generative models are trained on their own outputs?

Prior works foretold a catastrophic feedback loop, a curse of recursion, descending into madness as models consume their own outputs. Are we poisoning the very data necessary to train future models?

1/N

Excited to announce our newest preprint!

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

w/ @MGerstgrasser @ApratimDey2 @rm_rafailov @sanmikoyejo @danintheory @Andr3yGR @Diyi_Yang David Donoho



2/N

arxiv.org/abs/2404.01413
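A toy contrast with the replace-style loop from the thread above (again, fitting a 1-D Gaussian is a stand-in for training, and all constants are illustrative): accumulating data across generations instead of replacing it keeps the fitted distribution stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_generations = 50, 2_000

# Start from "real" data, as in the replace-style toy loop.
pool = rng.normal(loc=0.0, scale=1.0, size=n_samples)

for _ in range(n_generations):
    mu, sigma = pool.mean(), pool.std()
    # ACCUMULATE: keep all earlier real + synthetic data and append
    # fresh samples from the newest fitted model.
    pool = np.concatenate([pool, rng.normal(mu, sigma, size=n_samples)])

print(f"std after {n_generations} generations: {pool.std():.3f}")
```

Because each new synthetic batch is an ever-smaller fraction of the pool, the estimate stays anchored near the real data's statistics.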
Mar 28, 2023
A few weeks ago, Stanford AI Alignment @SAIA_Alignment read @AnthropicAI's "Superposition, Memorization, and Double Descent." Double descent is relatively easy to describe, but **why** does double descent occur?



1/8 transformer-circuits.pub/2023/toy-doubl…
Prior work answers why double descent occurs, but we wanted an intuitive explanation that doesn't require random matrix theory or statistical mechanics. Our new preprint identifies and interprets the **3** necessary ingredients for double descent, using ordinary linear regression!



2/8 arxiv.org/abs/2303.14151
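To see the phenomenon itself, a minimal sketch in ordinary linear regression (dimensions and noise level are arbitrary; this is not the preprint's exact construction): test error of the minimum-norm least-squares fit spikes as the number of training samples crosses the number of parameters, then falls again.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_test, noise = 40, 2_000, 0.5   # parameters, test points, label noise
w_true = rng.normal(size=d)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true + noise * rng.normal(size=n_test)

for n in [10, 20, 30, 38, 40, 42, 60, 100, 200]:
    errs = []
    for _ in range(50):  # average over draws of the training set
        X = rng.normal(size=(n, d))
        y = X @ w_true + noise * rng.normal(size=n)
        # lstsq returns the minimum-norm solution when n < d
        w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        errs.append(np.mean((X_test @ w_hat - y_test) ** 2))
    print(f"n = {n:3d}   test MSE = {np.mean(errs):10.2f}")
# Test error peaks near the interpolation threshold n = d = 40,
# then descends a second time as n grows past d.
```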
Nov 1, 2022
Very excited to announce our #NeurIPS2022 paper No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit.

It's a story about NeuroAI, told through a story about grid & place cells.

Joint w/ @KhonaMikail @FieteGroup

1/15

The promises of deep learning-based models of the brain are that they (1) shed light on the brain's fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using deep network models of the MEC-HPC circuit, that one may get neither!

2/15
Jul 23, 2022
If you’re interested in deep learning (DL) and neuroscience, come to our poster at @AI_for_Science’s #ICML2022 workshop

**No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit**

Joint w/ @KhonaMikail @FieteGroup

1/13

The central promises of DL-based models of the brain are that they (1) shed light on the brain's fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using DL models of grid cells in the MEC-HPC circuit, that one often gets neither.

2/13