Songlin Yang
Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. she/her/hers. In 🐳 and open-sourcing I trust
Jun 11 · 11 tweets · 3 min read
Flash Linear Attention will no longer maintain support for the RWKV series (existing code will remain available). Here's why: github.com/fla-org/flash-… (1/N)

The RWKV author has repeatedly labeled FLA's implementation as "buggy," yet has refused to help fix the initialization or offer constructive feedback. We have invested serious effort in improving precision and faithfully reproducing their setup, while their own repo remains unreadable, unpopular, and poorly maintained.
May 24 · 16 tweets · 7 min read
📢 (1/16) Introducing PaTH 🛣️, a RoPE-free contextualized position encoding scheme built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across both short and long language modeling benchmarks.
arxiv.org/abs/2505.16381

⚙️ (2/16) Triton implementation is available at
github.com/fla-org/flash-…
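For a concrete picture of what a "contextualized" (input-dependent) position encoding means in contrast to RoPE's fixed rotations, here is a minimal toy sketch in PyTorch. It is not the fla-org implementation and does not use its API: the function names (householder_like_updates, relative_transform), the projections w_proj and beta_proj, and the identity-minus-rank-one parameterization are all assumptions made purely for illustration, loosely following the idea of building relative position transforms from accumulated, data-dependent updates.

```python
# Toy sketch (assumption-laden, NOT the official PaTH / fla-org code):
# contrast RoPE's fixed, position-only rotation with a contextualized
# position transform, where each token contributes an input-dependent
# identity-minus-rank-one (Householder-style) update, and the relative
# transform between two positions is the product of the updates in between.
import torch

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE, for contrast: rotation angles are a fixed function of
    # absolute position only, with no dependence on token content.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return positions[:, None].float() * inv_freq[None, :]          # (T, dim/2)

def householder_like_updates(x, w_proj, beta_proj):
    # Each token t produces a unit vector w_t and a gate beta_t from its own
    # hidden state, defining H_t = I - beta_t * w_t w_t^T (an exact Householder
    # reflection when beta_t == 2). These are the per-token building blocks.
    w = torch.nn.functional.normalize(x @ w_proj, dim=-1)          # (T, d)
    beta = 2.0 * torch.sigmoid(x @ beta_proj)                      # (T, 1), in (0, 2)
    T, d = x.shape
    eye = torch.eye(d).expand(T, d, d)
    return eye - beta[:, :, None] * (w[:, :, None] * w[:, None, :])  # (T, d, d)

def relative_transform(H, i, j):
    # The "position encoding" between key position i and query position j
    # (i <= j) is the cumulative product H_j @ ... @ H_{i+1}: it depends on
    # every token in between, unlike RoPE's fixed rotation by (j - i).
    # Naive loop over d x d matmuls; a real kernel would never materialize this.
    P = torch.eye(H.shape[-1])
    for t in range(i + 1, j + 1):
        P = H[t] @ P
    return P

if __name__ == "__main__":
    torch.manual_seed(0)
    T, d = 8, 16
    x = torch.randn(T, d)                       # stand-in hidden states
    w_proj = torch.randn(d, d) / d ** 0.5       # hypothetical learned projections
    beta_proj = torch.randn(d, 1) / d ** 0.5
    H = householder_like_updates(x, w_proj, beta_proj)
    rel = relative_transform(H, i=2, j=6)
    print(rel.shape)  # torch.Size([16, 16]); content-dependent, not just j - i
```

The explicit loop above is deliberately naive; making this kind of accumulated transform trainable at scale is exactly the job of hardware-efficient kernels like the Triton ones linked in the repo.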