Latest Twitter Threads by @vikhyatk on Thread Reader App

May 2 • 8 tweets • 3 min read

how we implemented Moondream inference on Apple Silicon (spoiler: we don't use MLX)

⬇️ (1/N)

https://twitter.com/mayfer/status/2050323883950313980

Photon, our inference engine, isn't fast just because of GPU kernels. A lot of the speedup comes from engine-level work: request scheduling, prefix caching, image processing, all tuned to keep the GPU saturated. moondream.ai/p/photon
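To make "prefix caching" concrete, here is a minimal, hypothetical sketch of the general idea. It is not Photon's implementation, which the thread does not show. When a new request shares a leading token sequence with an earlier one (for example, the same image followed by a different question), the engine can reuse the work already done for that prefix and compute only the new suffix. The names PrefixCache, fake_encode and prefill are illustrative assumptions. The cached "state" is a plain list standing in for a real KV cache.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prefix cache mapping token-id sequences to precomputed state.

    Hypothetical illustration only: the 'state' is a list of per-token
    values standing in for a model's KV cache.
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], List[int]] = {}

    def longest_prefix(self, tokens: List[int]) -> Tuple[int, List[int]]:
        # Try the longest prefix first and return the first cached hit.
        for n in range(len(tokens), 0, -1):
            state = self._store.get(tuple(tokens[:n]))
            if state is not None:
                return n, state
        return 0, []

    def insert(self, tokens: List[int], state: List[int]) -> None:
        self._store[tuple(tokens)] = state


def fake_encode(token: int) -> int:
    # Hypothetical stand-in for an expensive per-token forward pass.
    return token * 2


def prefill(cache: PrefixCache, tokens: List[int]) -> Tuple[List[int], int]:
    """Return (state, number of tokens that actually had to be computed)."""
    hit_len, cached = cache.longest_prefix(tokens)
    state = list(cached)  # copy so cached entries are never mutated
    for t in tokens[hit_len:]:
        state.append(fake_encode(t))
    cache.insert(tokens, state)
    return state, len(tokens) - hit_len


if __name__ == "__main__":
    cache = PrefixCache()
    print(prefill(cache, [1, 2, 3]))        # cold: computes all 3 tokens
    print(prefill(cache, [1, 2, 3, 4, 5]))  # reuses [1, 2, 3], computes 2
```

A production engine would match prefixes at block granularity and evict entries under memory pressure. The linear scan over prefix lengths here is kept only for readability.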

Apr 30, 2025 • 4 tweets • 1 min read

This paper's appendices could've been turned into several high-quality standalone papers.

improved way to do EMA? ✅
