Kimi Linear: A novel hybrid attention architecture that outperforms full attention in both speed and quality—ready to serve as a drop-in replacement, with our KDA kernels open-sourced! Kimi Linear reduces KV cache usage by up to 75% and delivers up to 6x decoding throughput at a 1M context length.
Key highlights:
🔹 Kimi Delta Attention: A hardware-efficient linear attention mechanism that refines the gated delta rule.
🔹 Kimi Linear Architecture: The first hybrid linear architecture to surpass pure full attention quality across the board.
🔹 Empirical Validation: Scaled, fair comparisons + open-sourced KDA kernels, vLLM integration, and checkpoints.
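For readers unfamiliar with the mechanism KDA refines: the *gated delta rule* maintains a fast-weight state matrix that is decayed and corrected once per token. Below is a minimal NumPy sketch of the standard gated delta-rule recurrence as published in the linear-attention literature — it is illustrative only, not Moonshot's actual KDA kernel, and the function name is our own.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One token step of the (standard) gated delta rule.

    S     : (d_v, d_k) fast-weight state matrix
    q, k  : (d_k,) query / key vectors
    v     : (d_v,) value vector
    alpha : scalar gate (decay) in [0, 1]
    beta  : scalar write strength in [0, 1]

    Update: S_t = alpha * S_{t-1} @ (I - beta * k k^T) + beta * v k^T
    Output: o_t = S_t @ q
    """
    d_k = k.shape[0]
    # Decay the old state and erase the old association along direction k,
    # then write the new (k -> v) association.
    S = alpha * S @ (np.eye(d_k) - beta * np.outer(k, k)) + beta * np.outer(v, k)
    return S, S @ q
```

With a unit-norm key, `beta=1.0` fully overwrites the stored value for that key, while `alpha` provides the data-dependent forgetting that distinguishes the *gated* variant from the plain delta rule.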
The future of agentic-oriented attention is here! 💡
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can't wait to see what you build!
🔌 API is here: platform.moonshot.ai
- $0.15 / million input tokens (cache hit)
- $0.60 / million input tokens (cache miss)
- $2.50 / million output tokens
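The per-million-token prices above make request costs easy to estimate. Here is a small Python sketch using only the published numbers; the helper name and interface are ours for illustration, not part of any official SDK.

```python
# Published Kimi K2 API prices, USD per million tokens.
PRICE_PER_M = {
    "input_cache_hit": 0.15,
    "input_cache_miss": 0.60,
    "output": 2.50,
}

def estimate_cost(cache_hit_tokens: int, cache_miss_tokens: int,
                  output_tokens: int) -> float:
    """Estimated request cost in USD (illustrative helper)."""
    return (
        cache_hit_tokens * PRICE_PER_M["input_cache_hit"]
        + cache_miss_tokens * PRICE_PER_M["input_cache_miss"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000

# e.g. 100k cached input + 50k uncached input + 10k output:
# (100000*0.15 + 50000*0.60 + 10000*2.50) / 1e6 = 0.07 USD
```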
Join the discussion & share feedback in our Discord 👉 discord.gg/uGqNmXhNhM
To facilitate more research in the field, we plan to open-source the base pretrained model as well as the reinforcement-learned model underlying Kimi-Researcher in the coming months.