Dan Fu
CS PhD Candidate/Researcher at Stanford. Systems for machine learning. Sometimes YouTuber/podcaster.
Feb 8 6 tweets 2 min read
ChatGPT's 1700-token system prompt got you down?

Led by @jordanjuravsky and @brad19brown: introducing Hydragen, a simple technique for Transformer LLM inference with shared prefixes! Up to 30x improvement in throughput, with no custom CUDA!

A few things I love in this project: 1/

The idea is pretty simple. You can use the softmax scaling trick to split the prefix and suffix into separate attention calls - and batch attention queries over the shared prefixes.

This reduces your IO and turns GEMV calls into GEMM calls for higher FLOP utilization (minimal sketch below). 2/
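Here's a minimal PyTorch sketch of the two ingredients. All function names and shapes are illustrative assumptions, not Hydragen's actual API: single-query decoding attention per KV chunk, the softmax (log-sum-exp) rescaling trick to merge per-chunk results, and a shared-prefix call whose scores are one batched GEMM.

```python
import torch

def attend_prefix(q, k, v):
    # Shared-prefix attention (hypothetical helper): k, v are ONE copy
    # shared by the whole batch, so the score computation is a single
    # (batch x d) @ (d x n) GEMM instead of one GEMV per sequence.
    # q: (batch, d); k, v: (n, d)
    scores = q @ k.T / q.shape[-1] ** 0.5              # (batch, n)
    lse = torch.logsumexp(scores, dim=-1)              # per-query log-normalizer
    return torch.softmax(scores, dim=-1) @ v, lse

def attend_suffix(q, k, v):
    # Per-sequence suffix attention: each sequence keeps its own short KV.
    # q: (batch, d); k, v: (batch, m, d)
    scores = torch.einsum("bd,bmd->bm", q, k) / q.shape[-1] ** 0.5
    lse = torch.logsumexp(scores, dim=-1)
    return torch.einsum("bm,bmd->bd", torch.softmax(scores, dim=-1), v), lse

def merge(out_a, lse_a, out_b, lse_b):
    # Softmax rescaling trick: attention over the concatenation of two KV
    # chunks equals a convex combination of the per-chunk outputs, weighted
    # by their softmax normalizers exp(lse).
    m = torch.maximum(lse_a, lse_b)
    w_a, w_b = torch.exp(lse_a - m), torch.exp(lse_b - m)
    return (w_a[:, None] * out_a + w_b[:, None] * out_b) / (w_a + w_b)[:, None]

batch, n_prefix, m_suffix, d = 8, 1700, 32, 64       # toy sizes
q = torch.randn(batch, d)
prefix_kv = (torch.randn(n_prefix, d), torch.randn(n_prefix, d))            # shared
suffix_kv = (torch.randn(batch, m_suffix, d), torch.randn(batch, m_suffix, d))

out = merge(*attend_prefix(q, *prefix_kv), *attend_suffix(q, *suffix_kv))
```

Because the prefix KV is stored once, every query in the batch reads the same keys, which is where both the IO savings and the GEMV-to-GEMM change come from.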
Mar 28, 2023 6 tweets 4 min read
This sentiment is exactly right - and why we've been working to increase sequence length in our lab for the past two years!

From FlashAttention, to S4, H3, Hyena, and more - check out our blog post putting this line of work into context: hazyresearch.stanford.edu/blog/2023-03-2…

More below: 1/n

The context lengths of foundation models have grown exponentially recently - exciting developments!

We've been happy to play a small role with FlashAttention, and we're very excited about the possibilities: multiple media sources, complex demonstrations, and more! 2/n
Jan 23, 2023 18 tweets 8 min read
Attention is all you need... but how much of it do you need?

Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao

📜 arxiv.org/abs/2212.14052 1/n

One key point: SSMs are *linear* in sequence length instead of quadratic, and have no fixed context length. Long context for everyone! (See the toy sketch below.)

We're super excited, so we're releasing our code and model weights today - up to 2.7B parameters!

github.com/HazyResearch/H3 2/n
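To make the linear-time point concrete, here's a toy diagonal SSM recurrence in PyTorch. This is an illustrative sketch under simplifying assumptions, not H3's actual layer (H3 combines shift and diagonal SSMs with multiplicative gating, and trains via FFT-based long convolutions); all names and shapes here are hypothetical.

```python
import torch

def ssm_recurrence(u, A, B, C):
    # Toy diagonal state-space recurrence (hypothetical, simplified):
    #   x_k = A * x_{k-1} + B * u_k,    y_k = <C, x_k>
    # Each step costs O(d_state), so the whole sequence is linear in
    # length, and the recurrent state carries context with no fixed window.
    # u: (seq_len,); A, B, C: (d_state,)
    x = torch.zeros_like(A)
    ys = []
    for u_k in u:
        x = A * x + B * u_k          # constant work per token
        ys.append(torch.dot(C, x))   # scalar readout
    return torch.stack(ys)

y = ssm_recurrence(torch.randn(1024), *[torch.randn(16) for _ in range(3)])
```

One constant-cost state update per token means total work grows linearly with sequence length - and at inference time the state replaces an ever-growing attention KV cache.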
Jan 10, 2022 7 tweets 2 min read
The Stanford MLSys Seminar is now available in podcast form on Apple Podcasts, Spotify, Google, and more!

We release episodes every Monday and Friday (new episodes on Fridays, old episodes from the backlog on Mondays).

Check us out on your favorite platform below! (1/n)

(2/n) Apple Podcasts: podcasts.apple.com/us/podcast/sta…