CS PhD student with @sanmikoyejo at @stai_research @StanfordAILab
May 1 • 13 tweets • 5 min read
What happens when generative models are trained on their own outputs?
Prior work foretold a catastrophic feedback loop, a "curse of recursion," in which models descend into madness as they consume their own outputs. Are we poisoning the very data necessary to train future models?
1/N
Excited to announce our newest preprint!
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
w/ @MGerstgrasser @ApratimDey2 @rm_rafailov @sanmikoyejo @danintheory @Andr3yGR @Diyi_Yang David Donoho
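The "accumulate vs. replace" distinction at the heart of the question can be seen in a toy simulation (my own illustrative sketch, not the paper's experimental setup): fit a Gaussian, sample from it, and refit each generation. Replacing the data each generation lets estimation noise compound; accumulating keeps the real data in the pool.

```python
import numpy as np

def simulate(n_generations=100, n_samples=50, accumulate=False, seed=0):
    """Toy collapse loop: fit a Gaussian, sample from it, and refit on the
    samples each generation. Returns the fitted variance per generation."""
    rng = np.random.default_rng(seed)
    pool = rng.normal(0.0, 1.0, size=n_samples)  # the original "real" data
    variances = []
    for _ in range(n_generations):
        mu, sigma = pool.mean(), pool.std()
        variances.append(sigma ** 2)
        synthetic = rng.normal(mu, sigma, size=n_samples)
        # replace: the next model sees only the latest synthetic samples;
        # accumulate: the real data and every past generation stay in the pool
        pool = np.concatenate([pool, synthetic]) if accumulate else synthetic
    return variances

# Replacing data lets estimation noise compound multiplicatively, so the
# fitted variance tends to drift toward zero over generations; accumulating
# anchors the estimate near the true value of 1.
print("replace:   ", round(simulate(accumulate=False)[-1], 4))
print("accumulate:", round(simulate(accumulate=True)[-1], 4))
```

The Gaussian stands in for any generative model here; the same replace-vs-accumulate contrast is what the preprint studies at scale.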
A few weeks ago, Stanford AI Alignment @SAIA_Alignment read @AnthropicAI 's "Superposition, Memorization, and Double Descent." Double descent is relatively easy to describe, but **why** does double descent occur?
1/8 transformer-circuits.pub/2023/toy-doubl…
@SAIA_Alignment @AnthropicAI Prior work answers why double descent occurs, but we wanted an intuitive explanation that doesn’t require random matrix theory (RMT) or statistical mechanics. Our new preprint identifies and interprets the **3** necessary ingredients for double descent, using ordinary linear regression!
Very excited to announce our #NeurIPS2022 paper No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit.
It's a story about NeuroAI, told through a story about grid & place cells.
Joint w/ @KhonaMikail @FieteGroup 1/15
The promises of deep learning-based models of the brain are that they (1) shed light on the brain’s fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using deep network models of the MEC-HPC circuit, that one may get neither! 2/15
Jul 23, 2022 • 15 tweets • 15 min read
If you’re interested in deep learning (DL) and neuroscience, come to our poster at @AI_for_Science’s #ICML2022 workshop
**No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit**
Joint w/ @KhonaMikail @FieteGroup 1/13
The central promise of DL-based models of the brain is that they (1) shed light on the brain’s fundamental optimization problems/solutions, and/or (2) make novel predictions. We show, using DL models of grid cells in the MEC-HPC circuit, that one often gets neither. 2/13