Building full-stack generative AI for population scale. Experience Sarvam at https://t.co/pxOFDoSBYj
Feb 19 • 4 tweets • 2 min read
Yesterday, we released Sarvam 30B and Sarvam 105B. Built from scratch, both models leverage a Mixture of Experts (MoE) architecture, delivering stronger performance at scale while using compute more efficiently.
Sarvam 30B activates just 1B non-embedding parameters per token, so it runs far more efficiently while maintaining strong capability.
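If the sparse-activation idea is unfamiliar, here is a minimal, illustrative top-k MoE routing sketch in PyTorch. This is not Sarvam's published architecture; the expert count, top-k value, and layer sizes are made-up placeholders. The point is only that a router sends each token to a small subset of experts, so only a fraction of the total parameters does work per token.

```python
# Illustrative top-k Mixture-of-Experts layer (placeholder sizes, not Sarvam's design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([4, 512]); only 2 of the 8 experts ran for each token
```

Because each token touches only 2 of the 8 experts here, the compute per token scales with the active parameters rather than the full parameter count, which is the efficiency argument behind the 30B model's 1B-per-token activation.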
The model was pretrained on 16 trillion tokens spanning code, web, multilingual, and mathematical data, and supports a 32K context window that enables long-running agentic interactions.
This makes it ideal for real-time applications such as conversational AI, and for high-throughput workflows where latency matters.
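For those real-time use cases, a typical integration streams tokens as they are generated. The sketch below assumes an OpenAI-compatible chat-completions endpoint; the base URL, environment variable, and model identifier are placeholders, not confirmed Sarvam API details.

```python
# Hedged sketch of a low-latency streaming chat call against an assumed
# OpenAI-compatible endpoint. All identifiers below are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://example-inference-host/v1",  # placeholder endpoint
    api_key=os.environ["MY_API_KEY"],              # placeholder credential
)

stream = client.chat.completions.create(
    model="sarvam-30b",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize this release in one line."}],
    stream=True,         # stream tokens as they are generated for low perceived latency
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming is the usual way to make a conversational front end feel responsive: the first tokens reach the user while the rest of the response is still being generated.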