Latest Twitter Threads by @luisa_zintgraf on Thread Reader App

Nov 4, 2025 • 7 tweets • 2 min read

Excited to share our new paper, "DataRater: Meta-Learned Dataset Curation"!

We explore a fundamental question: How can we *automatically* learn which data is most valuable for training foundation models?

Paper: to appear @NeurIPSConf

Thread 👇 arxiv.org/pdf/2505.17895

Foundation models are trained on large datasets, but not all data is created equal. Dataset curation often relies on manual, coarse-grained filtering and hand-crafted rules. This is becoming a major challenge, especially with the rise of synthetic data.

