Exciting updates to #stablediffusion with Core ML!
- 6-bit weight compression that shrinks the model to just under 1 GB
- Up to 30% improved Neural Engine performance
- New benchmarks on iPhone, iPad, and Mac
- Multilingual system text encoder support
- ControlNet support github.com/apple/ml-stabl… 🧵
coremltools-7.0 introduced advanced model compression techniques. For Stable Diffusion, we demonstrate how 6-bit post-training palettization yields faster models that consume 63% less memory than float16. The output variation introduced by compression is comparable to the variation between running the same model on the GPU versus the Neural Engine.
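As a rough illustration, here is how 6-bit post-training palettization can be applied with the coremltools 7 optimize APIs. The model path is a placeholder; this is a minimal sketch, not the repo's exact conversion flow:

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Load a previously converted float16 Core ML model
# ("Unet.mlpackage" is a placeholder path).
mlmodel = ct.models.MLModel("Unet.mlpackage")

# 6-bit palettization: cluster each weight tensor into 2^6 = 64
# centroids via k-means and store per-weight indices into that table.
op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=6)
config = cto.OptimizationConfig(global_config=op_config)

compressed = cto.palettize_weights(mlmodel, config)
compressed.save("Unet_6bit.mlpackage")
```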
This 25-minute WWDC23 session is the best resource for learning more about model compression for Apple Silicon: developer.apple.com/videos/play/ww…. We only demonstrate post-training palettization for Stable Diffusion. For better results at 2 and 4 bits, check out training-time palettization!
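For reference, training-time palettization in coremltools 7 is exposed through the DKM palettizer in `coremltools.optimize.torch`. The toy model, data, and hyperparameters below are stand-ins; this is a minimal sketch of the workflow, not a Stable Diffusion fine-tuning recipe:

```python
import torch
from coremltools.optimize.torch.palettization import (
    DKMPalettizer,
    DKMPalettizerConfig,
)

# Toy model and data standing in for the real fine-tuning setup.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10)
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

# Learn 4-bit (16-entry) lookup tables during training.
config = DKMPalettizerConfig.from_dict({"global_config": {"n_bits": 4}})
palettizer = DKMPalettizer(model, config)
model = palettizer.prepare()

for _ in range(10):
    x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    palettizer.step()  # update the learned palettes each step

model = palettizer.finalize()  # bake learned palettes into the weights
```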
Improvements to the attention implementation yield a 10-30% performance improvement on the Neural Engine, pushing iPhone 14 Pro generation time to under 10 seconds, without architecture compression or step-distillation.
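For context, the Neural Engine-friendly attention follows the "split einsum" pattern from Apple's work on deploying Transformers on the Apple Neural Engine: tensors stay in a (batch, channels, 1, sequence) layout and multi-head attention is computed as per-head einsums so intermediates stay small. The PyTorch sketch below illustrates the idea; it is a simplification, not a copy of the repo's implementation:

```python
import torch

def split_einsum_attention(q, k, v, num_heads):
    # q, k, v: (batch, channels, 1, seq_len), the layout the
    # Neural Engine prefers for conv-style workloads.
    b, c, _, s = q.shape
    dim_head = c // num_heads
    scale = dim_head ** -0.5

    # Split the channel dim into per-head chunks.
    mh_q = q.split(dim_head, dim=1)
    mh_k = k.split(dim_head, dim=1)
    mh_v = v.split(dim_head, dim=1)

    # Per-head attention weights: contract over channels, softmax
    # over the key positions (dim=1 in the bkhq layout).
    weights = [
        (torch.einsum("bchq,bchk->bkhq", qi, ki) * scale).softmax(dim=1)
        for qi, ki in zip(mh_q, mh_k)
    ]
    # Weighted sum of values per head, then re-concatenate channels.
    attn = [
        torch.einsum("bkhq,bchk->bchq", wi, vi)
        for wi, vi in zip(weights, mh_v)
    ]
    return torch.cat(attn, dim=1)  # (batch, channels, 1, seq_len)
```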
These attention improvements are complementary to architecture compression and step-distillation techniques for diffusion models. For example, they improve the baseline in SnapFusion from @Snap by ~4x, which should improve the paper's results by a corresponding factor. arxiv.org/pdf/2306.00980…
Finally, this WWDC23 session introduced native multilingual text embeddings through a Transformer-based model: developer.apple.com/videos/play/ww…. We share sample code so developers can build on the multilingual image generation demo workflow.
Today's releases of macOS Ventura 13.1 Beta 4 and iOS and iPadOS 16.2 Beta 4 include optimizations that let Stable Diffusion run with improved efficiency on the Apple Neural Engine as well as on the Apple Silicon GPU.
We share sample code for converting the models from PyTorch to Core ML, plus an example Python pipeline for text-to-image generation that runs the Core ML models with coremltools and diffusers.
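As a rough illustration of running a converted model with coremltools, the sketch below loads a UNet and performs a single denoising prediction. The file path, input names, and shapes are assumptions for illustration; inspect the converted model's spec for the real ones:

```python
import numpy as np
import coremltools as ct

# Load the converted UNet on CPU + Neural Engine
# (path, input names, and shapes below are illustrative).
unet = ct.models.MLModel(
    "Unet.mlpackage", compute_units=ct.ComputeUnit.CPU_AND_NE
)

# One denoising step with dummy inputs: a noisy latent, a timestep,
# and text-encoder hidden states.
outputs = unet.predict({
    "sample": np.random.randn(2, 4, 64, 64).astype(np.float32),
    "timestep": np.array([500, 500], dtype=np.float32),
    "encoder_hidden_states":
        np.random.randn(2, 768, 1, 77).astype(np.float32),
})
noise_pred = outputs["noise_pred"]  # predicted noise residual
```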