Photon, our inference engine, isn't fast just because of GPU kernels. A lot of the speedup comes from engine-level work: request scheduling, prefix caching, image processing, all tuned to keep the GPU saturated. moondream.ai/p/photon
Apr 30, 2025
This paper's appendices could've been turned into several high-quality standalone papers.
improved way to do EMA? ✅