Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. she/her/hers. In 🐳 and open-sourcing I trust
Jun 11 • 11 tweets • 3 min read
Flash Linear Attention (FLA) will no longer maintain support for the RWKV series (existing code will remain available). Here’s why: github.com/fla-org/flash-…
(1/N) The RWKV author has repeatedly labeled FLA’s implementation as “buggy,” yet refused to help fix the initialization or offer constructive feedback. We’ve invested serious effort improving precision and faithfully reproducing their setup, while their own repo remains unreadable, unpopular, and poorly maintained.
May 24 • 16 tweets • 7 min read
📢 (1/16) Introducing PaTH 🛣️: a RoPE-free contextualized position encoding scheme built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks: arxiv.org/abs/2505.16381
⚙️ (2/16) Triton implementation is available at github.com/fla-org/flash-…
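For context, here is a minimal PyTorch sketch of standard rotary position embedding (RoPE), the baseline PaTH is compared against. This is illustrative only: it is not the PaTH method and not the fla-org Triton kernels, and the function name `rope_rotate` is invented for this example.

```python
# Minimal reference sketch of standard RoPE (rotary position embedding).
# NOT the PaTH method; see arxiv.org/abs/2505.16381 and the fla repo for PaTH itself.
import torch


def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even for RoPE"

    # Per-pair frequencies: theta_i = base^(-2i / head_dim).
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    cos, sin = angles.cos(), angles.sin()

    # Rotate each (even, odd) feature pair by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated


# Usage: RoPE is applied to queries and keys before attention; because the rotation
# angle depends only on absolute position, the dot product q_m . k_n depends on (m - n).
q, k = torch.randn(128, 64), torch.randn(128, 64)
q_rot, k_rot = rope_rotate(q), rope_rotate(k)
```

The key property illustrated above is that RoPE applies a fixed, data-independent rotation per position; PaTH is described in the thread as a contextualized (data-dependent), RoPE-free alternative, with the actual hardware-efficient implementation in the Triton code linked above.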