Intern @Google, Ph.D. Student @Cornell_CS, interested in machine learning.
Jan 13 • 12 tweets • 6 min read
Attention has been the key component for most advances in LLMs, but it can’t scale to long context. Does this mean we need to find an alternative?
Presenting Titans: a new architecture with attention and a meta in-context memory that learns how to memorize at test time. Titans are more effective than Transformers and modern linear RNNs, and can effectively scale to context windows larger than 2M tokens, with better performance than ultra-large models (e.g., GPT-4, Llama3-80B).
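To make "a memory that learns how to memorize at test time" concrete, here is a minimal sketch in PyTorch, not the paper's implementation: a small MLP serves as the memory, and its weights are updated by a gradient step on an associative reconstruction loss for each incoming token. The class name, projections, and hyperparameters (momentum as a stand-in for the surprise signal, weight decay as forgetting) are my own illustrative assumptions.

```python
# Illustrative sketch of test-time memorization (not the authors' code).
# A small MLP is the memory; its weights are updated at test time so it
# "learns how to memorize" the current context. Hyperparameters and the
# surprise/forgetting mechanism are simplified assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralMemory(nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        # The memory itself is a small MLP; its weights are the memory state.
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
        # Projections that turn an input token into a (key, value) pair to store.
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    @torch.no_grad()
    def _apply_update(self, grads, momentum, lr=1e-2, beta=0.9, decay=1e-3):
        # Momentum over past gradients plays the role of a "surprise" signal;
        # weight decay acts as a forgetting mechanism (assumed simplification).
        for p, g, m in zip(self.mlp.parameters(), grads, momentum):
            m.mul_(beta).add_(g)
            p.mul_(1.0 - decay).sub_(lr * m)

    def memorize(self, x, momentum):
        # One test-time update: store the (key, value) association by taking
        # a gradient step on the reconstruction loss ||M(k) - v||^2.
        k, v = self.to_k(x), self.to_v(x)
        loss = F.mse_loss(self.mlp(k), v)
        grads = torch.autograd.grad(loss, list(self.mlp.parameters()))
        self._apply_update(grads, momentum)

    def recall(self, x):
        # Query the memory with the (projected) current token.
        return self.mlp(self.to_k(x))

# Usage: stream a long sequence token by token, writing to memory as we go.
dim, seq_len = 64, 1024
mem = NeuralMemory(dim)
momentum = [torch.zeros_like(p) for p in mem.mlp.parameters()]
for t in range(seq_len):
    x_t = torch.randn(1, dim)        # stand-in for a token embedding
    mem.memorize(x_t, momentum)      # test-time write
    retrieved = mem.recall(x_t)      # read back; would be combined with attention
```

In the full architecture, a retrieval from this kind of memory would be combined with (short-range) attention, which is what lets the model go beyond the context attention alone can handle.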
@mirrokni Paper: arxiv.org/pdf/2501.00663…

Next, I'll discuss our intuition for designing Titans, how we implement them efficiently, and the experimental results (i.e., how they perform in different tasks).