Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. she/her/hers. In 🐳 and open-sourcing I trust
Jun 11 • 11 tweets • 3 min read
Flash Linear Attention (FLA) will no longer maintain support for the RWKV series (existing code will remain available). Here’s why: github.com/fla-org/flash-…
(1/N) The RWKV author has repeatedly labeled FLA’s implementation as “buggy,” yet refused to help fix the initialization or offer constructive feedback. We’ve invested serious effort improving precision and faithfully reproducing their setup, while their own repo remains unreadable, unpopular, and poorly maintained.
May 24 • 16 tweets • 7 min read
📢 (1/16) Introducing PaTH 🛣️: a RoPE-free contextualized position encoding scheme built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks: arxiv.org/abs/2505.16381
⚙️ (2/16) Triton implementation is available at github.com/fla-org/flash-…
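For context, here is a minimal PyTorch sketch of standard rotary position embedding (RoPE), the baseline PaTH is compared against. This is illustrative only: it is not the PaTH method and not the fla-org Triton kernels, and the function name `rope_rotate` is invented for this example.

```python
# Minimal reference sketch of standard RoPE (rotary position embedding).
# NOT the PaTH method; see arxiv.org/abs/2505.16381 and the fla repo for PaTH itself.
import torch


def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even for RoPE"

    # Per-pair frequencies: theta_i = base^(-2i / head_dim).
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    cos, sin = angles.cos(), angles.sin()

    # Rotate each (even, odd) feature pair by its position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.empty_like(x)
    rotated[..., 0::2] = x1 * cos - x2 * sin
    rotated[..., 1::2] = x1 * sin + x2 * cos
    return rotated


# Usage: RoPE is applied to queries and keys before attention; because the rotation
# angle depends only on absolute position, the dot product q_m . k_n depends on (m - n).
q, k = torch.randn(128, 64), torch.randn(128, 64)
q_rot, k_rot = rope_rotate(q), rope_rotate(k)
```

The key property illustrated above is that RoPE applies a fixed, data-independent rotation per position; PaTH is described in the thread as a contextualized (data-dependent), RoPE-free alternative, with the actual hardware-efficient implementation in the Triton code linked above.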