vik
teaching computers how to see @moondreamai // prev: @awscloud
May 2 · 8 tweets · 3 min read
how we implemented Moondream inference on Apple Silicon (spoiler: we don't use MLX)

⬇️ (1/N)

Photon, our inference engine, isn't fast just because of GPU kernels. A lot of the speedup comes from engine-level work: request scheduling, prefix caching, image processing, all tuned to keep the GPU saturated. moondream.ai/p/photon
Apr 30, 2025 · 4 tweets · 1 min read
This paper's appendices could've been turned into several high-quality standalone papers.

improved way to do EMA? ✅