Jinjie Ni
AI researcher building foundation models
Aug 9
Token crisis: solved. ✅

We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs.
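A back-of-the-envelope reading of those numbers (an assumption, since the tweet doesn't spell it out): 480B total tokens over 480 epochs means the runs repeat roughly 1B unique tokens.

```python
# Back-of-the-envelope for the setup above. Assumes total tokens processed
# = unique tokens x number of epochs, which the thread implies but doesn't state.
total_tokens = 480e9                           # 480B tokens seen during training
epochs = 480                                   # passes over the same data
unique_tokens = total_tokens / epochs
print(f"unique tokens ~ {unique_tokens:.0e}")  # 1e+09, i.e. about 1B unique tokens
```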

Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B unique tokens hits 56% on HellaSwag & 33% on MMLU — no tricks, no cherry-picks.
> No saturation: more repeats = more gains.
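For the mechanics behind that comparison, here is a minimal sketch of the two pre-training objectives. It assumes a masked-diffusion objective of the MDLM family (mask tokens at a random rate t, predict them with bidirectional attention, reweight by 1/t); the thread doesn't pin down the exact formulation, and every name below is illustrative rather than our actual training code.

```python
# Minimal sketch: AR next-token loss vs. a masked-diffusion LM loss.
# Assumes `ar_model` is a causal Transformer and `model` a bidirectional one,
# each returning logits of shape (B, L, V). MASK_ID is a hypothetical token id.
import torch
import torch.nn.functional as F

MASK_ID = 0  # id of the special [MASK] token (placeholder)

def ar_loss(ar_model, tokens):
    """Standard next-token cross-entropy under causal attention."""
    logits = ar_model(tokens[:, :-1])  # predict token i+1 from tokens <= i
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )

def masked_diffusion_loss(model, tokens):
    """MDLM-style continuous-time objective: mask each token independently
    at a random rate t, predict the originals, reweight the loss by 1/t."""
    B, L = tokens.shape
    t = torch.rand(B, 1, device=tokens.device).clamp(min=1e-3)  # noise level
    masked = torch.rand(B, L, device=tokens.device) < t         # per-token mask
    noisy = torch.where(masked, torch.full_like(tokens, MASK_ID), tokens)
    logits = model(noisy)                                       # bidirectional
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens.reshape(-1), reduction="none"
    ).view(B, L)
    # Only masked positions contribute; the 1/t weight gives the ELBO bound.
    return ((ce * masked) / t).sum() / masked.sum().clamp(min=1)
```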

🚨 x.openreview.net
We also dissected the serious methodological flaws in the parallel work “Diffusion Beats Autoregressive in Data-Constrained Settings”. Let’s raise the bar for open review!

🔗 Blog & details:
jinjieni.notion.site/Diffusion-Lang…

18 🧵s ahead. 🧵 1/18

Diffusion language models are super data learners. 🧬

We pre-trained a series of DLMs from scratch, at up to 8B parameters and 480B training tokens.

This provides compelling evidence that, by repeating over ordinary web data, DLMs outperform their AR counterparts across model sizes in data-constrained settings, showing significantly greater potential without hitting performance saturation.
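As a sketch of that repetition protocol (illustrative, not our actual pipeline): the fixed pool of unique data is simply shuffled and re-visited every epoch, and no new tokens are ever added.

```python
import random

def train_data_constrained(model, unique_batches, loss_fn, opt, epochs=480):
    """Repeat a fixed pool of unique data for many epochs (hypothetical helper;
    `loss_fn` is either objective, e.g. AR cross-entropy or masked diffusion)."""
    for epoch in range(epochs):
        random.shuffle(unique_batches)   # re-order each pass, but never add data
        for batch in unique_batches:     # the same tokens, seen again and again
            loss = loss_fn(model, batch)
            loss.backward()              # standard PyTorch-style update
            opt.step()
            opt.zero_grad()
```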

Overall, our results suggest that DLMs have more than three times the ultimate data potential of AR models.

Witness the intelligence crossovers 👇