Sayak Paul
ML at @huggingface 🤗
Nov 6, 2023 8 tweets 4 min read
Long time no release 👀

Well, let's break the silence and allow me to present 🧨 diffusers 0.22.0 🔥

Three new pipelines, 🤗 PEFT integration, new functionalities, and QoL improvements 🔋🏆

Join in 🧵 to know more!

1/8

[Image prompt: "Visualize the moment of liberation as dawn breaks: a group stands on a hill, breaking free from literal chains around their wrists, which disintegrate into glowing particles. The rising sun bathes them in warm light, symbolizing a new era of freedom and expression, while a phoenix soars overhead, embodying rebirth and hope."]

We're bringing you core support for Latent Consistency Models (both T2I & I2I are supported) 🔥

LCMs are darn fast! 4-8 steps are enough for plausible images. With a tiny autoencoder, you can squeeze out the maximum 🏎️ gains.

Doc ⬇️


2/8 huggingface.co/docs/diffusers…
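Here's a minimal sketch of LCM inference, assuming diffusers >= 0.22.0 and the community checkpoints the docs reference (`SimianLuo/LCM_Dreamshaper_v7` for the LCM, `madebyollin/taesd` for the tiny autoencoder):

```python
# Hedged sketch of LCM text-to-image; checkpoint names are the ones the docs
# reference -- verify on the Hub before relying on them.
import torch
from diffusers import AutoencoderTiny, DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

# Optional: swap in the tiny autoencoder for extra speed (small quality trade-off).
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to("cuda")

# 4-8 inference steps are typically enough for LCMs.
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
```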
Sep 27, 2023 5 tweets 2 min read
A 🧵 on the officially supported training examples of 🧨 diffusers 🤯

* Vanilla fine-tuning
* DreamBooth
* InstructPix2Pix
* ControlNet
* T2I-Adapters
* Custom Diffusion
* Unconditional generation

Check 'em out here ⬇️


1/5 github.com/huggingface/di…

Our training examples are educational, meaning we often trade efficiency & comprehensiveness for readability.

Also, we try to make them as hardware-accessible as possible.

E.g., you can DreamBooth SDXL with LoRA on a free-tier @GoogleColab 🤗



2/5 colab.research.google.com/github/hugging…
Aug 16, 2023 4 tweets 2 min read
Now generate "Trumppenheimer" but fassst 🏎⚡️

Presenting a series of SDXL ControlNet checkpoints that are 5 to 7x smaller and faster 🧨🤗

Led by the one and only @psuraj28 🔥

Join in the 🧵 to know more!

1/

We are releasing a total of 4 small SDXL ControlNet checkpoints today: 2 for Canny and 2 for Depth 💣

The figure below gives a CUMULATIVE rundown of the savings in memory and inference latency (A10 GPU) 📊

Find the benchmarking script ⬇️


2/ gist.github.com/sayakpaul/0211…
[Figure: cumulative rundown of memory and inference latency savings on an A10 GPU]
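A hedged sketch of loading one of the small checkpoints. The repo id follows the release naming ("diffusers/controlnet-canny-sdxl-1.0-small"); verify the exact names on the Hub. `canny_image` is a precomputed edge map you supply:

```python
# Hedged sketch: plugging a small SDXL ControlNet into the SDXL pipeline.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0-small", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# `canny_image` is a precomputed Canny edge map (PIL.Image) of your conditioning input.
# image = pipe("an astronaut in an opera house", image=canny_image).images[0]
```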
Jul 27, 2023 5 tweets 3 min read
🧨 diffusers 0.19.0 is out and comes with the latest SDXL 1.0 🔥

1️⃣ New training scripts for SDXL
2️⃣ New pipelines for SDXL (ControlNet, InstructPix2Pix, Inpainting, etc.)
3️⃣ AutoPipeline
and MORE!

Release notes 📝


1/5 github.com/huggingface/di…

SDXL 1.0 comes with permissive licensing.

Additional pipelines for SDXL 🚀

* Inpainting
* Image-to-Image
* ControlNet
* InstructPix2Pix

We also provide support for using an ensemble of expert denoisers 🪄

Docs ⬇️


2/5 huggingface.co/docs/diffusers…
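For the ensemble of expert denoisers, the base model handles the early, high-noise portion of the trajectory and hands its latents to the refiner. A minimal sketch (the 0.8 split point is illustrative):

```python
# Hedged sketch of SDXL's base + refiner "ensemble of expert denoisers" flow.
import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a majestic lion jumping from a big stone at night"
# The base denoises the first 80% of the steps and outputs latents, not pixels.
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# The refiner picks up from the same point in the trajectory.
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
```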
Jun 8, 2023 7 tweets 4 min read
🧨 diffusers 0.17.0 is out and comes with new pipelines, improved LoRA support, `torch.compile()` speedups, and more ⏰

🪄 UniDiffuser
🦄 DiffEdit
⚡️ IF DreamBooth
💡 Support for A1111 LoRA
and more ...

Release notes 📝
github.com/huggingface/di…

1/🧶

First, we have another cool pipeline, namely UniDiffuser, capable of performing **SIX different tasks** 🤯

It's the first multimodal pipeline in 🧨 diffusers.

Thanks to `dg845` for contributing this!

Docs ⬇️
huggingface.co/docs/diffusers…

2/🧶
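A hedged sketch of two of those six tasks, assuming the `thu-ml/unidiffuser-v1` checkpoint and the mode-setter methods from the docs:

```python
# Hedged sketch: UniDiffuser switches between tasks via explicit mode setters.
import torch
from diffusers import UniDiffuserPipeline

pipe = UniDiffuserPipeline.from_pretrained(
    "thu-ml/unidiffuser-v1", torch_dtype=torch.float16
).to("cuda")

# Text-to-image, one of the six supported modes.
pipe.set_text_to_image_mode()
image = pipe("an astronaut in a garden", num_inference_steps=20).images[0]

# Image-to-text on the sample we just generated.
pipe.set_image_to_text_mode()
caption = pipe(image=image, num_inference_steps=20).text[0]
```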
Apr 20, 2023 5 tweets 3 min read
Multi-concept subject training is now supported in 🧨 diffusers through "Custom Diffusion".

Thanks to Nupur (author of Custom Diffusion) for working hard on the integration!

Cat and wooden pot -- two concepts blending in the image below 🐱🪵

Docs ⬇️ huggingface.co/docs/diffusers…

🧵

[Image caption: "the <new1> cat sculpture in..."]

Custom Diffusion only fine-tunes the cross-attention layers of the UNet and also supports blending in textual inversion for seamless learning on consumer hardware.

As a result, with just 250 steps, we can get pretty good results depending on the new subjects involved.

https://www.cs.cmu.edu/~cus...
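Inference with a trained Custom Diffusion checkpoint looks roughly like this; a sketch where `path-to-saved-model` and the weight file names mirror the docs, so adapt them to your own training output:

```python
# Hedged sketch of Custom Diffusion inference; paths/weight names follow the docs.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Load the fine-tuned cross-attention weights and the learned <new1> token.
pipe.unet.load_attn_procs(
    "path-to-saved-model", weight_name="pytorch_custom_diffusion_weights.bin"
)
pipe.load_textual_inversion("path-to-saved-model", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat sitting in a wooden pot",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
```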
Mar 3, 2023 4 tweets 2 min read
Today we're happy to announce support for the **fastest** ControlNet, now in Diffusers 🧨 ⚡️

The integration was led by our amazing community member Takuma Mori 🤝

Check out our blog post here to catch all the details:
huggingface.co/blog/controlnet

Cc: @lvminzhang @magrawala

We support all 8 types of conditioning officially supported by ControlNet. The `StableDiffusionControlNetPipeline` supports loading arbitrary `ControlNetModel`s to allow maximum flexibility.

An example of using a Canny edge map as conditioning to generate 🤗 in different flavors:
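A minimal sketch of the Canny flavor (the input image URL is illustrative; any RGB image works):

```python
# Hedged sketch: Canny-conditioned generation with StableDiffusionControlNetPipeline.
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Turn any input image into a 3-channel Canny edge map.
source = load_image("https://example.com/input.png")  # illustrative URL
edges = cv2.Canny(np.array(source), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a colorful painting, oil on canvas", image=canny_image).images[0]
```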
Feb 22, 2023 6 tweets 3 min read
I am playing with the `StableDiffusionInstructPix2PixPipeline` from Diffusers, and I wanted to share some observations 🧨

Let's say we want to edit the following image with the following edit instruction: "turn the clouds rainy"

(Doc: huggingface.co/docs/diffusers…)

With the pipeline's default arguments, the result doesn't seem that coherent w.r.t. the edit instruction (verified across many different seeds and numbers of inference steps):
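The knobs worth playing with here are `guidance_scale` and `image_guidance_scale`; a hedged sketch of the setup (the image path is illustrative):

```python
# Hedged sketch of InstructPix2Pix editing; tune image_guidance_scale to trade
# faithfulness to the input image against strength of the edit.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("input.png")  # illustrative path
edited = pipe(
    "turn the clouds rainy",
    image=image,
    num_inference_steps=30,
    guidance_scale=7.5,
    image_guidance_scale=1.5,
).images[0]
```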
Feb 20, 2023 6 tweets 3 min read
Diffusers 0.13.0 is 99.9% about pipelines focused on "controlled generation" 🎆 🎇

My favorite pipeline from the latest release is `StableDiffusionPix2PixZeroPipeline`, which lets you do CycleGAN-like translation w/o expensive training.

Docs: huggingface.co/docs/diffusers…

🧵

The pipeline has two options:

1. Going from a source concept to a target concept ("cat -> dog" as shown above) with an input prompt.
2. Same as option 1, but with support for conditioning the pipeline on a real input image (see below).
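A hedged sketch of option 1, where the edit direction comes from averaged text embeddings of source/target sentences (the prompts are illustrative; the docs also show loading precomputed embeddings):

```python
# Hedged sketch of Pix2Pix Zero's prompt-driven "cat -> dog" edit.
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline

pipe = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Averaged text embeddings of these sentences define the edit direction.
source_prompts = ["a cat sitting on the street", "a cat playing in the field"]
target_prompts = ["a dog sitting on the street", "a dog playing in the field"]
source_embeds = pipe.get_embeds(source_prompts, batch_size=2)
target_embeds = pipe.get_embeds(target_prompts, batch_size=2)

image = pipe(
    "a high resolution painting of a cat in the style of van gogh",
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,
).images[0]
```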
Feb 3, 2023 4 tweets 3 min read
Supporting Vision-Language Models from @huggingface 🔥

They're everywhere! They empower you to do stuff that seemed crazy before -- open-set vocab detection, phrase grounding, etc.

Read our latest blog to learn more about these models:
huggingface.co/blog/vision_la…

1/🧵

We cover a wide array of things (thanks to @alaradirik for doing an excellent job on that):

* How are they trained?
* What are typical fusion strategies?
* What datasets are typically used for training & evaluation?
* Finally, how YOU can use them from 🤗 Transformers.

2/🧵
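As a taste of that last point, here's a hedged sketch of open-vocabulary detection with OWL-ViT via the Transformers pipeline API (the image path and labels are illustrative):

```python
# Hedged sketch: zero-shot (open-vocabulary) object detection with OWL-ViT.
from transformers import pipeline

detector = pipeline(
    "zero-shot-object-detection", model="google/owlvit-base-patch32"
)
results = detector(
    "street.jpg",  # illustrative local image
    candidate_labels=["a traffic light", "a bicycle", "a person"],
)
for r in results:
    print(r["label"], round(r["score"], 3), r["box"])
```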
Feb 2, 2023 8 tweets 4 min read
Ever thought of fine-tuning Stable Diffusion using @TensorFlow and then using it in Diffusers? 🧨

Presenting a tool for KerasCV => Diffusers converter 🔥
huggingface.co/docs/diffusers…

Join in for a little wild ride 🤗

[Image caption: a llama holding a poster saying "TensorFlow loves diffu...]

Once you have the fine-tuned weights in Keras, you use the tool (which is just a Space) to export a `StableDiffusionPipeline` and push it to the 🤗 Hub.

You can perform DreamBooth, Textual Inversion, Prompt-to-Prompt -- the sky is the limit 🌈
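Once the Space has pushed the exported pipeline, it loads like any other checkpoint (the repo id below is hypothetical):

```python
# Hedged sketch: the exported pipeline is a regular StableDiffusionPipeline repo.
from diffusers import StableDiffusionPipeline

# Hypothetical repo id; replace with whatever the converter Space pushed for you.
pipe = StableDiffusionPipeline.from_pretrained("your-username/kerascv-sd-dreambooth")
image = pipe("a photo of sks dog in a bucket").images[0]
```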
Jan 30, 2023 4 tweets 3 min read
Hugging Face ❤️ Computer Vision 👁🤗

We've been taking some giant leaps in our ecosystem dedicated to Computer Vision, but it hasn't always been clear what we have to offer.

Our latest blog post provides you with a concrete overview of "State of CV at 🤗" ⤵️

huggingface.co/blog/cv_state

There's a LOT in store.

From seminal models like ResNets to modern object detection models like OWL-ViT, we've got it all from your very own 🤗 Transformers (tutorial goodies included 🎁).

But our efforts are not limited to just 🤗 Transformers nor are we limited to models.
Dec 19, 2022 4 tweets 3 min read
We now have full-fledged fine-tuning tutorial notebooks for semantic segmentation in both @TensorFlow and @PyTorch! Both use the SoTA SegFormer model 🤗

Simply navigate to 🤗 Notebooks and search for "semantic segmentation" 👉 github.com/huggingface/no…

But there's more 🧵

We have a separate task page on segmentation to help you navigate across all the compatible datasets and models we have on the 🤗 Hub:

huggingface.co/tasks/image-se…
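The core of those notebooks boils down to initializing SegFormer with your own label set; a hedged sketch (the `id2label` mapping is illustrative):

```python
# Hedged sketch: setting up SegFormer for fine-tuning on a custom label set.
from transformers import SegformerForSemanticSegmentation

id2label = {0: "background", 1: "road", 2: "building"}  # illustrative labels
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",  # smallest pretrained backbone
    num_labels=len(id2label),
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)
# From here, train with the Trainer API or a custom loop, as the notebooks do.
```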
Dec 2, 2022 4 tweets 2 min read
SegFormer (SoTA semantic segmentation) has been available in @TensorFlow from 🤗 Transformers for some time now.

Today, we're releasing a polished version of a notebook for fine-tuning a SegFormer variant on a custom dataset.

Check it out 👉 colab.research.google.com/github/hugging…

The results above are with the smallest SegFormer variant and a small-scale, high-res dataset. They improve with longer training and a slightly bigger variant.

The original notebook was co-authored by @algo_diver and myself.
Nov 11, 2022 5 tweets 4 min read
This has been in the works for MONTHS now! Finally, it's in good shape and ready to be shipped 🚀

@algo_diver & I bring you an automated pipeline for semantic segmentation. It comes packed with lots of goodies, so please read on.

@TensorFlow @huggingface @googlecloud

We implemented everything in a TFX pipeline and delegated its execution to Vertex AI on GCP.

The pipeline covers data ingestion (with support for Dataflow), schema generation, training, deployment, etc., all automated.
Oct 31, 2022 4 tweets 3 min read
Delighted to release the @TensorFlow port of MAXIM, a single backbone capable of denoising, dehazing, deblurring, and more! Comes packed w/ @huggingface Spaces, Colab, and XLA support!

Know all of it here:
github.com/sayakpaul/maxi…

Cc: @_vztu, @Han_Zhang_

1/

This has been in the making for some days now, and I am really glad it’s now available in TensorFlow. All converted TensorFlow MAXIM models are on the @huggingface Hub:

huggingface.co/models?pipelin…

2/
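A hedged sketch of loading one of the converted checkpoints (the repo id is one of the deblurring variants; browse the Hub link above for the full list):

```python
# Hedged sketch: converted MAXIM checkpoints load as Keras models from the Hub.
from huggingface_hub import from_pretrained_keras

model = from_pretrained_keras("google/maxim-s3-deblurring-gopro")
# `batch` would be a preprocessed image batch matching the model's expected input.
# restored = model.predict(batch)
```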
Aug 22, 2022 6 tweets 4 min read
The final post in the series on deploying @TensorFlow vision models from @huggingface is up!

You'll learn to deploy a ViT B/16 model using Vertex AI with plenty of firepower, including autoscaling, traffic splitting, etc.

huggingface.co/blog/deploy-ve…

Done w/ @algo_diver

1/
We're quite happy with how it turned out. When we started this series, there was a dearth of resources showing how to deploy TF models from Transformers to GCP and the TF ecosystem following good practices. We wanted to close that gap 🤗

2/
Jul 22, 2022 4 tweets 2 min read
Performant and efficient segmentation models (SegFormer) in TF just landed in @huggingface.

It yields impressive performance while remaining efficient. In particular, its performance on higher-resolution images (512x512, 640x640, etc.) is commendable.

🧵

* Model documentation: huggingface.co/docs/transform…
* Interactive space for playing with the model: huggingface.co/spaces/chansun…
* _Fine-tune_ on your own segmentation dataset: github.com/deep-diver/seg…
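A hedged inference sketch with an ADE20k-fine-tuned variant (the local image path is illustrative; newer Transformers versions expose the same preprocessing as `SegformerImageProcessor`):

```python
# Hedged sketch: semantic segmentation inference with TF SegFormer.
import tensorflow as tf
from PIL import Image
from transformers import SegformerFeatureExtractor, TFSegformerForSemanticSegmentation

ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
extractor = SegformerFeatureExtractor.from_pretrained(ckpt)
model = TFSegformerForSemanticSegmentation.from_pretrained(ckpt)

image = Image.open("scene.jpg")  # illustrative path
inputs = extractor(images=image, return_tensors="tf")
logits = model(**inputs).logits  # shape: (batch, num_labels, height/4, width/4)
seg_map = tf.argmax(logits, axis=1)[0]  # per-pixel class ids
```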
Jun 17, 2022 6 tweets 2 min read
There's no point in comparing the outputs of the DALL-E playground from @OpenAI and the samples provided by Imagen. We don't have common ground for that yet.

But ever since I got access to DALL-E, I have simply been taking captions from Imagen and supplying them to the DALL-E prompt. Results? ⬇️

"A marble statue of a Koala DJ in front of a marble statue of a turntable. The Koala has wearing large marble headphones."
May 25, 2022 10 tweets 5 min read
Everyone has their own preferences for courses and books for learning ML/DL.

In this thread, I'll share mine. The lists _won't be_ ordered and will be categorized into various themes.

*I'm starting w/ materials that are aimed at providing a general understanding of ML/DL.*

1/

I like content that doesn't rely too much on specific frameworks but instead uses them to realize the concepts. So, my suggestions will be biased toward that.

2.1/
Apr 18, 2022 5 tweets 4 min read
What do Vision Transformers learn? How do they encode anything useful for image recognition? In our latest work, we reimplement a number of methods from this area & investigate various ViT model families (DeiT, DINO, original, etc.).

Done w/ @ariG23498

1/

We also reimplemented different models in #Keras. These were first populated w/ pre-trained parameters & were then evaluated to ensure correctness.

Code, models, a tutorial, interactive demos (w/ @huggingface Spaces), visuals:

github.com/sayakpaul/prob…

2/