Huiqiang Jiang
RSDE @MSFTResearch Shanghai
Jul 7, 2024
Thanks @_akhaliq for sponsoring. You can now try MInference online in an HF demo with ZeroGPU.
You can now process 1M-token contexts up to 10x faster on a single A100 using long-context LLMs like LLaMA3-1M and GLM4-1M, with even better accuracy. Try MInference 1.0 right now! huggingface.co/spaces/microso…
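If you'd rather run it locally than in the demo, here is a minimal sketch of patching a Hugging Face model, assuming the MInference("minference", model_name) patching API from the project repo; the model name, dtype, and generation arguments are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

# A 1M-context LLaMA3 variant; swap in any supported long-context model.
model_name = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Patch the model's attention with MInference's dynamic sparse attention.
minference_patch = MInference("minference", model_name)
model = minference_patch(model)

# Generation then works as usual; the speedup shows up in pre-filling.
inputs = tokenizer("Summarize the following document: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```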
1) The motivation behind MInference: long-context inference is highly resource-intensive, yet attention over long contexts is inherently sparse and dynamic. We use dynamic sparse attention to accelerate the pre-filling stage of 1M-token inference by up to 10x. For more details, visit the project page: aka.ms/MInference
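To make the idea concrete, here is a toy single-head sketch of the vertical-slash pattern, one of the dynamic sparse patterns MInference uses: estimate which key columns (verticals) and q-k offsets (slashes) matter from only the last few queries, then attend only there. The function name, parameter values, and the masked dense fallback are illustrative, not the shipped kernels.

```python
import torch

def vertical_slash_attention(q, k, v, last_q=64, top_vertical=1000, top_slash=64):
    n, d = q.shape                      # (seq_len, head_dim), one head
    scale = d ** -0.5
    pos = torch.arange(n, device=q.device)

    # 1) Cheap estimate: causal attention of only the last `last_q` queries.
    qpos = pos[-last_q:].unsqueeze(1)                       # (last_q, 1)
    est = q[-last_q:] @ k.T * scale                         # (last_q, n)
    est.masked_fill_(pos > qpos, float("-inf"))
    est = torch.softmax(est, dim=-1)

    # 2) Top vertical lines: key columns with the most estimated mass.
    cols = est.sum(dim=0).topk(min(top_vertical, n)).indices

    # 3) Top slash lines: q-k offsets (diagonals) with the most mass.
    #    Negative offsets (future keys) carry zero weight, so clamping is safe.
    offsets = (qpos - pos).clamp(min=0)                     # (last_q, n)
    diag_score = torch.zeros(n, device=q.device)
    diag_score.index_add_(0, offsets.flatten(), est.flatten())
    diags = diag_score.topk(min(top_slash, n)).indices

    # 4) Sparse causal mask covering the chosen verticals and slashes.
    mask = torch.zeros(n, n, dtype=torch.bool, device=q.device)
    mask[:, cols] = True
    for off in diags.tolist():                              # mark q - k == off
        rows = pos[off:]
        mask[rows, rows - off] = True
    mask.fill_diagonal_(True)                               # every row attends somewhere
    mask &= pos.unsqueeze(1) >= pos.unsqueeze(0)            # causal

    # 5) Masked attention (a real kernel computes only the unmasked blocks).
    scores = (q @ k.T) * scale
    scores.masked_fill_(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

In the actual system, each attention head is classified offline into one of three sparse patterns (A-shape, vertical-slash, block-sparse), the dynamic indices are approximated online per input, and optimized sparse kernels skip the masked blocks entirely rather than computing a dense score matrix as above.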