Luisa Zintgraf
Senior Research Scientist in the RL team @googledeepmind. PhD from @UniofOxford.
Nov 4
Excited to share our new paper, "DataRater: Meta-Learned Dataset Curation"!

We explore a fundamental question: How can we *automatically* learn which data is most valuable for training foundation models?

Paper: to appear @NeurIPSConf

Thread 👇
arxiv.org/pdf/2505.17895

Foundation models are trained on large datasets, but not all data is created equal. Dataset curation often relies on manual, coarse-grained filtering and hand-crafted rules. This is becoming a major challenge, especially with the rise of synthetic data.