Latest Twitter Threads by @thoma_gu on Thread Reader App

May 15 • 10 tweets • 4 min read

Can fast generative models still be likelihood-based?

Excited to share our new work @Apple MLR: Normalizing Trajectory Models,

a step toward high-quality few-step generation with exact trajectory likelihood, powered by normalizing flows.

Paper: huggingface.co/papers/2605.08… [1/9]

Diffusion and flow-matching models typically generate through many small steps, where simple denoising transitions are a reasonable approximation.

But when we compress generation into only a few coarse steps, the reverse transitions become much more complex.

[2/9]
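
To make the point about coarse steps concrete, here is a minimal toy sketch, not the paper's method. It assumes data that is only ever -1 or +1 and a standard variance-preserving forward process; the function name `mode_separation` and the noise levels are hypothetical choices for illustration. The reverse transition q(x_s | x_t) is then a mixture of two Gaussians, and the code measures how far apart the two mixture means are relative to their shared standard deviation.

```python
import math

def mode_separation(abar_s: float, abar_t: float) -> float:
    """Gap between the two mixture means of q(x_s | x_t), in units of the component std.

    Toy assumptions: x0 is -1 or +1, and the forward process is
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise, with abar_s > abar_t
    (s is the less noisy time).
    """
    alpha = abar_t / abar_s   # signal retained when noising from s to t
    beta = 1.0 - alpha        # noise variance added when noising from s to t
    # Gaussian posterior q(x_s | x_t, x0): coefficient on x0 and variance
    # (the standard DDPM posterior formulas, written for a step from t back to s).
    coef_x0 = math.sqrt(abar_s) * beta / (1.0 - abar_t)
    var = (1.0 - abar_s) * beta / (1.0 - abar_t)
    # The means for x0 = -1 and x0 = +1 differ by 2 * coef_x0.
    return 2.0 * coef_x0 / math.sqrt(var)

small_step = mode_separation(abar_s=0.51, abar_t=0.50)   # fine-grained step
coarse_step = mode_separation(abar_s=0.99, abar_t=0.50)  # one big jump toward the data
print(f"small step:  {small_step:.2f} std between modes")
print(f"coarse step: {coarse_step:.2f} std between modes")
```

An equal-weight mixture of two same-width Gaussians (the x_t = 0 case here) has two peaks only when the means are more than 2 standard deviations apart. In this toy, the small step stays well under that threshold, so a single Gaussian is a reasonable fit, while the coarse step lands far above it, so the reverse transition is clearly bimodal.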

May 11 • 10 tweets • 5 min read

Excited to share STARFlow2 from Apple MLR:
🥨 Bridging Language Models and Normalizing Flows for Unified Multimodal Generation.

One model to understand, reason, and generate continuous images with a single unified autoregressive mechanism?

Paper: huggingface.co/papers/2605.08… 1/9

A core challenge in unified models is structural mismatch:
LMs decode text causally with KV-cache, while top image generators rely on iterative full-image denoising. This makes interleaved text-image generation unnatural and often requires re-encoding visual outputs. 2/9

Dec 12, 2025 • 11 tweets • 5 min read

(1/n) There’s a long-running debate on bringing representation learning into generative modeling—their latent spaces play different roles.

🚀🚀 We present FAE, a simple-yet-effective framework that bridges them with a single attention layer!

Paper: huggingface.co/papers/2512.07…

(2/n) Why might it be exciting?
🔸 ImageNet256 SOTA FID w/o CFG: 1.48/2.08 (800/80 epochs)
🔸 Near-SOTA FID w/ CFG: 1.29/1.70 (800/80 epochs)
🔸 Same latents work for both diffusion and NF models on ImageNet and T2I tasks
🔸 Simple layer bridging spaces while preserving semantics!

Oct 12, 2024 • 8 tweets • 4 min read

🚀 Excited to introduce our recent work @ AppleMLR --
DART: Denoising AutoRegressive Transformer for Scalable Text-to-Image Generation!
A transformer-based model that unifies autoregressive and diffusion modeling within a non-Markovian diffusion framework:
🔗 arxiv.org/abs/2410.08159 (1/n)

A diffusion model (DM) is limited by its Markovian process: at each timestep it depends only on the current input. Unlike DMs, DART leverages the full generative trajectory while retaining the benefits of progressive modeling, leading to more efficient and flexible generation. (2/n)

Oct 24, 2023 • 7 tweets • 5 min read

📢 Introducing our latest research @Apple MLR for generating high-quality images & videos with a multi-resolution diffusion model -- Matryoshka Diffusion Models, or MDM 🪆, directly in pixel space (~1024px) without any VAEs or cascaded models. Code will be released soon! (1/n)

https://twitter.com/_akhaliq/status/1716831652545208407

MDM is a single generative model that handles various high-resolution targets:
Images 🖼️
Text-to-Images 📜➡️🖼️
Text-to-Videos 📜➡️🎥
Unlike existing work, MDM needs neither a pre-trained VAE (e.g., SD) nor multiple separately trained upscaling modules (e.g., Imagen). (2/n)
