Kevin Lu
Jul 9
Why you should stop working on RL research and instead work on product //
The technology that unlocked the big scaling shift in AI is the internet, not transformers

I think it's well known that data is the most important thing in AI, and also that researchers choose not to work on it anyway. ... What does it mean to work on data (in a scalable way)?

The internet provided a rich source of abundant data that was diverse, offered a natural curriculum, represented the competencies people actually care about, and was economically viable to deploy at scale -- it became the perfect complement to next-token prediction and the primordial soup for AI to take off.
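To make "complement to next-token prediction" concrete, here is a deliberately tiny sketch: a bigram count model that "pretrains" by counting which token follows which. This is my toy illustration, not any lab's actual setup -- real pretraining uses transformers over internet-scale text, but the objective is the same shape: predict the next token from what came before.

```python
# Toy next-token prediction: a bigram count model over a tiny corpus.
# Illustrative only -- stands in for transformer pretraining on web text.
from collections import Counter, defaultdict

corpus = "the internet is a rich source of diverse data".split()

# "Train": count which token follows each token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most frequently observed token after `token`, or None."""
    counts = follows[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "internet" -- the only continuation seen
```

The point of the toy: the model is only as good as the distribution it counts over, which is why the internet (diverse, abundant, human-curated) mattered more than the particular architecture doing the counting.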

Without transformers, any number of approaches could have taken off; we could probably have CNNs or state-space models at the level of GPT-4.5. But there hasn't been a dramatic improvement in base models since GPT-4. Reasoning models are great in narrow domains, but not as huge a leap as GPT-4 was in March 2023 (over two years ago...).

We have something great with reinforcement learning, but my deep fear is that we will repeat the mistakes of the past (2015-2020 era RL) and do RL research that doesn't matter.

In the same way the internet was the dual of supervised pretraining, what will be the dual of RL that leads to a massive advancement like GPT-1 -> GPT-4? I think it looks like research-product co-design.
I really like this diagram from @_jasonwei and @hwchung27 about how to view the bitter lesson:

It's a mistake not to add structure now, and a mistake not to remove that structure later.

We're on the cusp of setting up a huge, powerful RL training run that will define the next five years.

It's a mistake to think that, just because manipulating little mathematical pieces of the RL algorithm got us to where we are now, continuing to play with the RL objective will get us to where we need to be next.

x.com/_jasonwei/stat…
Ultimately, we want AGI that benefits and interacts with humans, not something that lives in a toy cage (like AlphaZero for chess, or reasoning models in math). Unlike many other researchers, I therefore think it is imperative to work on product.

Are researchers who say "we'll have AGI if we can do AIME, Codeforces, Pokemon" right? Once we've solved Humanity's Last Exam (probably in 6 months), will that be AGI?

Or is it more correct to say that we'll have AGI once consumers feel that there is a technology whereby economical, ready access to intelligence changes their lives, and they can't live without it? ... If that's what we care about, shouldn't we optimize for it?
The internet is incredibly diverse, and it is sourced from data on topics which... humans actually cared enough about to engage with in the first place. There are low-resource languages and niche fanbases that will be forever immortalized in AGI because someone cared enough to document them. We should scale the data collection pipeline!

I like this paper from @ke_li_2021 about how diversity is harder to reason about than you might think.

More broadly, I think we will soon be at a place where we can reconsider what "bad data" is -- the models will be smart enough to post-process the data themselves before training on it, so we should upweight the value of new information rather than "pristine" data in the old-school sense of imitation learning.

x.com/ke_li_2021/sta…
The internet sets up a natural curriculum of skills -- people gradually add new ideas on top of the old ones -- ensuring the models have a smooth difficulty ramp to learn from that covers the skill space.

Curriculum will be important for RL -- you need to learn sub-skills before you can discover more complex results -- and in fact the internet has already been used for RL, in self-play video games where agents learn by playing against humans in zero-sum games like StarCraft or Rocket League.
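One simple way to picture an automatic difficulty ramp is frontier-weighted task sampling: prefer tasks the agent sometimes solves, since saturated or impossible tasks carry little learning signal. This is my hypothetical sketch -- the task names and success rates are made up for illustration, not from the thread.

```python
import random

# Hypothetical curriculum sampler: weight RL tasks by how close they sit
# to the agent's current frontier of ability. All numbers are invented.
tasks = {"AIME": 0.90, "USAMO": 0.50, "IMO": 0.20, "FrontierMath": 0.02}  # est. success rates

def frontier_weight(success_rate):
    """Peaks at 50% success: sometimes-solved tasks carry the most signal;
    already-mastered or hopeless tasks contribute near-zero weight."""
    return success_rate * (1.0 - success_rate)

def sample_task(rng=random):
    names = list(tasks)
    weights = [frontier_weight(tasks[n]) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Under these made-up numbers, USAMO (at 50% success) dominates the sampling, while FrontierMath is nearly ignored until the agent improves -- a crude stand-in for the smooth ramp the internet provided for free.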

But creating the curriculum is incredibly hard. We go from AIME, to USAMO, to IMO, to FrontierMath, ... and for what? We have a model that is good at FrontierMath now? Researchers handcrafting these curriculums and tasks for RL doesn't scale.
I dream of having a rich set of economically valuable RL tasks to train on, as wide and beautiful as the internet.

Does this pipeline for task creation look like... robotics? trading? enterprise metrics? research? coding? recommendation? video games?

I think it's the absolute highest leverage thing you could be working on (and it requires next to no experience in RL theory).

link to blog: kevinlu.ai/the-only-impor…

More from @_kevinlu

Jun 30
So I think something else that doesn't get discussed much is the extrapolation of this inference-to-training compute trend:

- 2015: back in the day, we would train one model per dataset and run inference once (to obtain the eval result for our paper)
- 2020: with ChatGPT, multi-task learning, and the productionisation of modern LLMs, we spend far more compute sampling from the model than training it
- 2025: as we continue this trend, we are seeing a world where training is trivially cheap -- players like DeepSeek training frontier models for ~$5M

I think an immediate read of this is that we are no longer interested in exploration in the weight space (with SGD optimization) and instead, we care about exploration in the semantic space of solutions (with chain of thought)

We used to be interested in things like: how do we develop a faster SGD optimizer? How does diffusion affect training dynamics? Instead, we should think about how diffusion affects inference dynamics.
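The simplest instance of "exploration in the semantic space of solutions" is best-of-n sampling: draw many candidate solutions at inference time and keep the best under a verifier, instead of taking one more gradient step in weight space. The sketch below is mine, with `sample_solution` and `score` as illustrative stand-ins for "sample a chain of thought" and "verifier reward".

```python
import random

def sample_solution(rng):
    # Stand-in for sampling one chain of thought; returns a candidate answer.
    return rng.gauss(0.0, 1.0)

def score(candidate):
    # Stand-in verifier: closer to the (pretend) true answer 0.7 is better.
    return -abs(candidate - 0.7)

def best_of_n(n, seed=0):
    """Explore n points in solution space; keep the verifier's favorite."""
    rng = random.Random(seed)
    candidates = [sample_solution(rng) for _ in range(n)]
    return max(candidates, key=score)
```

The compute knob here is `n`, spent entirely at inference: more samples means wider exploration of solutions with zero weight updates, which is exactly the axis the trend above is shifting spend toward.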

a lot of other interesting questions like:

1. how do we build infrastructure to prepare for this future? @PrimeIntellect is doing super cool things here
2. what will the LLM world look like when you don't need billions of $ to enter the model training game?
3. what do we (the AI community) need to do in order to continue this trend?
4. what product surface and usecases maximally benefit from this world?
5. what are the moats and flywheels that set you up for success here? (ChatGPT is one of them)
more of a fun meme, but i think most people are also talking about inference compute, but not enough about inference time

generally in life, we pay extra for things to be fast. but right now the small-model-tier (flash, mini, nano, haiku) are both cheap and fast

it's easy to spend $10B to increase compute (largely parallel), but all the world's billionaires want to spend money to get more time, and we don't know how

i think there's going to be more thought about parallel vs sequential compute. if you have a latency-sensitive usecase (e.g. voice mode, Cursor tab), you are going to get your greatest returns by reducing the non-parallel part of the pipeline (HFT cares a lot about this). and you are going to spend a lot of money (not less) to optimize sub-millisecond latencies
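The "money buys compute but not time" point is just Amdahl's law in disguise: the sequential fraction of a pipeline caps how much wall-clock speedup parallel spend can buy. A one-function framing of that (my illustration, not from the thread):

```python
# Amdahl's-law view of parallel vs sequential compute: speedup from
# scaling only the parallelizable part of a pipeline.

def speedup(sequential_fraction, parallel_workers):
    """Overall speedup when the parallel fraction is split across workers."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / parallel_workers)

# With 10% of the pipeline inherently sequential, even a million workers
# cannot push past a 10x speedup -- hence the focus on the serial part.
print(speedup(0.10, 1_000_000))  # approaches the 10x ceiling
```

This is why shaving the non-parallel path (the HFT instinct) beats buying more parallel compute once you are latency-bound: the ceiling moves only when `sequential_fraction` shrinks.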

setting a model to find a cure for cancer and then having it return to me in 100 years isn't going to help me / my friends / my family.
ok enough tweeting from me, but if you are interested in some thoughts about how we can think of inference in ways that are not simply "make the cot longer", i am advertising my blog 🙂

kevinlu.ai/spending-infer…