CS PhD Candidate/Researcher at Stanford. Systems for machine learning. Sometimes YouTuber/podcaster.
Feb 8 • 6 tweets • 2 min read
ChatGPT's 1700-token system prompt got you down?
Led by @jordanjuravsky and @brad19brown, introducing Hydragen, a simple technique for Transformer LLM inference with shared prefixes! Up to 30x improvement in throughput with no custom CUDA!
A few things I love in this project: 1/
The idea is pretty simple. You can use the softmax scaling trick to split up the prefix and suffix into different attention calls - and batch attention queries over the shared prefixes.
This reduces your IO and changes GEMV calls into GEMM calls for higher FLOP util. 2/
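To make the softmax trick concrete, here's a minimal PyTorch sketch (my own illustration, not the Hydragen code; function names and shapes are assumptions): attention over concatenated [prefix; suffix] keys/values can be computed as two separate attention calls whose outputs are merged using their log-sum-exps, so the shared-prefix call can be batched across sequences.

```python
import torch

def attn_with_lse(q, k, v):
    # Single-head attention that also returns the log-sum-exp of the scores.
    # q: [B, Q, D], k/v: [B, L, D]
    scores = q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5   # [B, Q, L]
    lse = torch.logsumexp(scores, dim=-1, keepdim=True)     # [B, Q, 1]
    out = torch.softmax(scores, dim=-1) @ v                 # [B, Q, D]
    return out, lse

def split_attention(q, k_prefix, v_prefix, k_suffix, v_suffix):
    # Attention over the concatenated [prefix; suffix] keys/values,
    # computed as two separate calls and recombined with softmax
    # rescaling. In the shared-prefix setting, the prefix call can
    # batch queries from every sequence against one prefix KV cache.
    o1, lse1 = attn_with_lse(q, k_prefix, v_prefix)
    o2, lse2 = attn_with_lse(q, k_suffix, v_suffix)
    w = torch.softmax(torch.cat([lse1, lse2], dim=-1), dim=-1)  # [B, Q, 2]
    return w[..., :1] * o1 + w[..., 1:] * o2
```

Because the weights are softmax([lse1, lse2]), the recombined output matches attention over the full concatenated sequence exactly; the split only changes how the work is scheduled.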
Mar 28, 2023 • 6 tweets • 4 min read
This sentiment is exactly right - and why we've been working to increase sequence length in our lab for the past two years!
The context lengths of foundation models have grown exponentially recently - exciting developments!
We've been happy to play a small role with FlashAttention, and we're very excited about the possibilities: multiple media sources, complex demonstrations, and more! 2/n
Jan 23, 2023 • 18 tweets • 8 min read
Attention is all you need... but how much of it do you need?
Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao
📜 arxiv.org/abs/2212.14052 1/n
One key point: SSMs are *linear* in sequence length instead of quadratic, and have no fixed context length. Long context for everyone!
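To see where the linear scaling comes from, here's a toy diagonal state-space recurrence in PyTorch (an illustrative sketch with assumed shapes, not the actual H3 layer): each step does constant work, so a length-L sequence costs O(L), and the running state carries context without a fixed window.

```python
import torch

def ssm_scan(u, A, B, C):
    # Toy diagonal SSM recurrence: x_t = A * x_{t-1} + B * u_t, y_t = C · x_t.
    # u: [L, D] input sequence; A, B, C: [D, N] per-channel diagonal
    # parameters (hypothetical shapes for this sketch).
    L, D = u.shape
    N = A.shape[-1]
    x = torch.zeros(D, N)                     # recurrent state, fixed size
    ys = []
    for t in range(L):                        # one constant-cost step per token
        x = A * x + B * u[t].unsqueeze(-1)    # [D, N]
        ys.append((C * x).sum(-1))            # [D]
    return torch.stack(ys)                    # [L, D]
```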
We're super excited, so we're releasing our code and model weights today - up to 2.7B parameters!