Sayak Paul
ML at @huggingface 🤗
Dec 16
Add structural control to Flux!

We're excited to release an experimental version of the Flux Control fine-tuning scripts.

Flux Control from @BlackForestLabs is, by far, the strongest alternative to ControlNets while being computationally far more efficient.

The idea of Flux Control is simple yet elegant:

1. Instead of maintaining a separate auxiliary module like ControlNet or T2I Adapter, increase the number of input channels for the image latents in the pretrained Flux DiT.

2. Compute the latents of the structural input (a depth map, for example) with the VAE and concatenate them with the image latents you start the denoising process from.

3. During training, only the original image latents are noised; the structural latents are then concatenated to them before being fed to the denoiser.

4. We start from the pretrained T2I Flux DiT, widened with the additional input channels, and train further on a ControlNet-style dataset! (Toy sketch below.)
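Here's a toy sketch of steps 2-3 in plain PyTorch. Shapes and the noising step are simplified (the real script additionally packs latents into patches before the DiT sees them), so treat this as an illustration, not the training code:

```python
import torch

# Toy illustration only -- not the actual training script.
B, C, H, W = 1, 16, 64, 64                  # Flux latents have 16 channels

image_latents = torch.randn(B, C, H, W)     # VAE latents of the target image
control_latents = torch.randn(B, C, H, W)   # VAE latents of the depth map

# Noise ONLY the image latents (flow-matching interpolation, simplified).
t = torch.rand(B).view(B, 1, 1, 1)
noise = torch.randn_like(image_latents)
noisy_latents = (1.0 - t) * image_latents + t * noise

# Concatenate along channels: the DiT's input layer is widened (16 -> 32
# channels here) so it consumes the noisy latents AND the control signal.
model_input = torch.cat([noisy_latents, control_latents], dim=1)
print(model_input.shape)  # torch.Size([1, 32, 64, 64])
```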
Nov 6, 2023
Long time no release 👀

Well, let's break the silence and allow me to present 🧨 diffusers 0.22.0 🔥

Three new pipelines, 🤗 PEFT integration, new functionalities, and QoL improvements 🔋🏆

Join in 🧵 to know more!

1/8

(Image prompt: "Visualize the moment of liberation as dawn breaks: a group stands on a hill, breaking free from literal chains around their wrists, which disintegrate into glowing particles. The rising sun bathes them in warm light, symbolizing a new era of freedom and expression, while a phoenix soars overhead, embodying rebirth and hope.")

We're bringing you core support for Latent Consistency Models (both T2I & I2I are supported) 🔥

LCMs are darn fast! 4-8 steps are enough for plausible images. With a tiny autoencoder, you can squeeze out the maximum 🏎️ gains.

Docs ⬇️
huggingface.co/docs/diffusers…

2/8
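A minimal usage sketch, assuming the model IDs I remember from the Hub (double-check the docs above for specifics):

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

# Optional: swap in a tiny autoencoder (TAESD) for faster decoding.
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesd", torch_dtype=torch.float16
).to("cuda")

# LCMs need only 4-8 steps for plausible images.
image = pipe("a portrait of a fox in a spacesuit", num_inference_steps=4).images[0]
image.save("fox.png")
```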
Sep 27, 2023
A 🧵 on the officially supported training examples of 🧨 diffusers 🤯

* Vanilla fine-tuning
* DreamBooth
* InstructPix2Pix
* ControlNet
* T2I-Adapters
* Custom Diffusion
* Unconditional generation

Check 'em out here ⬇️
github.com/huggingface/di…

1/5

Our training examples are educational, meaning we often trade off efficiency & comprehensiveness for readability.

Also, we try to make them as hardware-accessible as possible.

E.g., you can DreamBooth SDXL with LoRA on a free-tier @GoogleColab 🤗



colab.research.google.com/github/hugging…

2/5
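Once trained, loading the LoRA into the SDXL pipeline is a one-liner. A sketch with a placeholder repo name:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# "your-username/your-dreambooth-lora" is a placeholder for the LoRA
# weights produced by the training script.
pipe.load_lora_weights("your-username/your-dreambooth-lora")

image = pipe("a photo of sks dog in a bucket", num_inference_steps=25).images[0]
```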
Aug 16, 2023
Now generate "Trumppenheimer" but fassst 🏎⚡️

Presenting a series of SDXL ControlNet checkpoints that are 5 to 7x smaller and faster 🧨🤗

Led by the one and only @psuraj28 🔥

Join in the 🧵 to know more!

1/

We are releasing a total of 4 small SDXL ControlNet checkpoints today: 2 for Canny and 2 for Depth 💣

The figure below gives a CUMULATIVE rundown of the savings in memory and inference latency (A10 GPU) 📊

Find the benchmarking script ⬇️
gist.github.com/sayakpaul/0211…

2/
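Using a small checkpoint is a drop-in swap. A sketch (repo name per the Hub naming; verify before use, and pass a real Canny edge map instead of the placeholder):

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0-small", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder conditioning image; in practice, pass real Canny edges.
canny_image = Image.fromarray(np.zeros((1024, 1024, 3), dtype=np.uint8))
image = pipe("aerial view of a futuristic city", image=canny_image).images[0]
```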
Jul 27, 2023
🧨 diffusers 0.19.0 is out and comes with the latest SDXL 1.0 🔥

1️⃣ New training scripts for SDXL
2️⃣ New pipelines for SDXL (ControlNet, InstructPix2Pix, Inpainting, etc.)
3️⃣ AutoPipeline
and MORE!

Release notes 📝
github.com/huggingface/di…

1/5
SDXL 1.0 comes with permissive licensing.

Additional pipelines for SDXL 🚀

* Inpainting
* Image-to-Image
* ControlNet
* InstructPix2Pix

We also provide support for using an ensemble of expert denoisers 🪄

Docs ⬇️
huggingface.co/docs/diffusers…

2/5
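The ensemble-of-expert-denoisers workflow, sketched (the 0.8 split and step counts are illustrative):

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a majestic lion jumping from a big stone at night"

# The base model handles the first 80% of the noise schedule...
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# ...and the refiner, an expert on low-noise timesteps, finishes the job.
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
```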
Jun 8, 2023
🧨 diffusers 0.17.0 is out and comes with new pipelines, improved LoRA support, `torch.compile()` speedups, and more ⏰

🪄 UniDiffuser
🦄 DiffEdit
⚡️ IF DreamBooth
💡 Support for A1111 LoRA
and more ...

Release notes 📝
github.com/huggingface/di…

1/🧶

First, we have another cool pipeline, namely UniDiffuser, capable of performing **SIX different tasks** 🤯

It's the first multimodal pipeline in 🧨 diffusers.

Thanks to `dg845` for contributing this!

Docs ⬇️
huggingface.co/docs/diffusers…

2/🧶
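A rough sketch of how one pipeline covers several modes -- this is from memory, so treat the details as assumptions and defer to the docs above:

```python
import torch
from diffusers import UniDiffuserPipeline

pipe = UniDiffuserPipeline.from_pretrained(
    "thu-ml/unidiffuser-v1", torch_dtype=torch.float16
).to("cuda")

# Text-to-image: the mode is inferred from the inputs you pass.
image = pipe(prompt="an astronaut riding a horse").images[0]

# Image-to-text: pass an image, get a caption back.
caption = pipe(image=image).text[0]

# Unconditional joint generation: pass nothing at all.
sample = pipe()
image, caption = sample.images[0], sample.text[0]
```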
Apr 20, 2023
Multi-concept subject training is now supported in 🧨 diffusers through "Custom Diffusion".

Thanks to Nupur (author of Custom Diffusion) for working hard on the integration!

Cat and wooden pot -- two concepts blending in the image below 🐱🪵

Docs ⬇️ huggingface.co/docs/diffusers…

🧵

(Image: "the <new1> cat sculpture in...")

Custom Diffusion only fine-tunes the cross-attention layers of the UNet and also supports blending in textual inversion for seamless learning on consumer hardware.

As a result, with just 250 steps, we can get pretty good results, depending on the new subjects.

https://www.cs.cmu.edu/~cus...
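Inference after training looks roughly like this. The paths and weight names follow the training example's defaults as I recall them; verify in the docs:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Load the fine-tuned cross-attention weights and the learned <new1> token.
pipe.unet.load_attn_procs(
    "path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin"
)
pipe.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")

image = pipe("the <new1> cat sitting in a wooden pot", guidance_scale=6.0).images[0]
```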
Mar 3, 2023
Today we're happy to announce support for the **fastest** ControlNet in 🧨 Diffusers ⚡️

The integration was led by our amazing community member Takuma Mori 🤝

Check out our blog post here to catch all the details:
huggingface.co/blog/controlnet

Cc: @lvminzhang @magrawala

We support all 8 types of conditioning officially supported by ControlNet. The `StableDiffusionControlNetPipeline` supports loading arbitrary `ControlNetModel`s to allow maximum flexibility.

An example of using a Canny edge map as conditioning to generate 🤗 in different flavors:
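A condensed sketch of the Canny workflow (the input image URL is from our documentation assets; swap in your own):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Extract a Canny edge map from an input image.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
edges = cv2.Canny(np.array(image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a painting of a woman, masterpiece", image=canny_image).images[0]
```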
Feb 22, 2023
I am playing with the `StableDiffusionInstructPix2PixPipeline` from Diffusers, and some observations I wanted to share 🧨

Let's say we want to edit the following image with the following edit instruction: "turn the clouds rainy"

(Doc: huggingface.co/docs/diffusers…)

With the default arguments of the pipeline, the result doesn't seem that coherent w.r.t. the edit instruction (verified across many different seeds and numbers of inference steps):
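For reference, this is the kind of call I'm experimenting with. The image path is a placeholder; `guidance_scale` and `image_guidance_scale` are the knobs worth tuning:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("path/to/your/image.png")  # placeholder path

# `image_guidance_scale` controls fidelity to the input image;
# `guidance_scale` controls adherence to the edit instruction.
edited = pipe(
    "turn the clouds rainy",
    image=image,
    guidance_scale=7.5,
    image_guidance_scale=1.5,
).images[0]
```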
Feb 20, 2023
Diffusers 0.13.0 is 99.9% about pipelines focused on "controlled generation" 🎆 🎇

My favorite pipeline from the latest release is the `StableDiffusionPix2PixZeroPipeline`, letting you do CycleGAN-style image translation w/o expensive training.

Docs: huggingface.co/docs/diffusers…

🧵

The pipeline has two options:

1. Going from a source concept to a target concept ("cat -> dog" as shown above) with an input prompt.
2. Same as option 1, but with support for conditioning the pipeline on a real input image (see below).
Feb 3, 2023
Supporting Vision-Language Models from @huggingface 🔥

They're everywhere! They empower you to do stuff that seemed crazy before -- open-set vocab detection, phrase grounding, etc.

Read our latest blog to learn more about these models:
huggingface.co/blog/vision_la…

1/🧵

We cover a wide array of things (thanks to @alaradirik for doing an excellent job on that):

* How are they trained?
* What are typical fusion strategies?
* What datasets are typically used for training & evaluation?
* Finally, how YOU can use them from 🤗 Transformers.

2
Feb 2, 2023
Ever thought of fine-tuning Stable Diffusion using @TensorFlow and then using it in Diffusers? 🧨

Presenting a KerasCV => Diffusers converter tool 🔥
huggingface.co/docs/diffusers…

Join in for a little wild ride 🤗

(Image: a llama holding a poster saying "TensorFlow loves diffu…")

Once you have the fine-tuned weights in Keras, you use the tool (which is just a Space) to export a `StableDiffusionPipeline` and push it to the 🤗 Hub.

You can perform DreamBooth, Textual Inversion, Prompt-to-Prompt -- the sky is the limit 🌈
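After the export, the converted checkpoint behaves like any other Diffusers pipeline. A sketch with a placeholder repo name:

```python
from diffusers import StableDiffusionPipeline

# "your-username/your-kerascv-sd" is a placeholder for the repo
# the converter Space pushes to the Hub.
pipe = StableDiffusionPipeline.from_pretrained("your-username/your-kerascv-sd")
image = pipe("a llama holding a poster").images[0]
```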
Jan 30, 2023
Hugging Face ❤️ Computer Vision 👁🤗

We've been taking some giant leaps in our ecosystem dedicated to Computer Vision, but it can be hard to see everything we have to offer.

Our latest blog post provides you with a concrete overview of "State of CV at 🤗" ⤵️

huggingface.co/blog/cv_state

There's a LOT in store.

From seminal models like ResNets to modern object detection models like OWL-ViT, we've got it all from your very own 🤗 Transformers (tutorial goodies included 🎁).

But our efforts are not limited to just 🤗 Transformers nor are we limited to models.
Dec 19, 2022
We now have full-fledged fine-tuning tutorial notebooks for semantic segmentation in both @TensorFlow and @PyTorch! Both use the SoTA SegFormer model 🤗

Simply navigate to 🤗 Notebooks and search for "semantic segmentation" 👉 github.com/huggingface/no…

But there's more 🧵

We have a separate task page on segmentation to help you navigate all the compatible datasets and models we have on the 🤗 Hub:

huggingface.co/tasks/image-se…
Dec 2, 2022
SegFormer (a SoTA semantic segmentation model) has been available in @TensorFlow via 🤗 Transformers for some time now.

Today, we're releasing a polished version of a notebook for fine-tuning a SegFormer variant on a custom dataset.

Check it out 👉 colab.research.google.com/github/hugging…

The above results are with the smallest SegFormer variant and a small-scale, high-resolution dataset. Results improve with longer training and a slightly bigger variant.

The original notebook was co-authored by @algo_diver and myself.
Nov 11, 2022
This has been in the works for MONTHS now! Finally, it's in good shape and ready to be shipped 🚀

@algo_diver & I bring you an automated pipeline for semantic segmentation. It comes packed with lots of goodies, so please read on.

@TensorFlow @huggingface @googlecloud

We implemented everything in a TFX pipeline and delegated its execution to Vertex AI on GCP.

The pipeline handles data ingestion (with support for Dataflow), schema generation, training, deployment, etc., all automatically.
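A skeletal sketch of the pipeline's shape. Component arguments here are illustrative (bucket paths, step counts, module file); the repo has the real definitions:

```python
from tfx import v1 as tfx

example_gen = tfx.components.ImportExampleGen(input_base="gs://bucket/tfrecords")
statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])
trainer = tfx.components.Trainer(
    module_file="train.py",  # holds the model-building / training code
    examples=example_gen.outputs["examples"],
    schema=schema_gen.outputs["schema"],
    train_args=tfx.proto.TrainArgs(num_steps=1000),
    eval_args=tfx.proto.EvalArgs(num_steps=100),
)
pusher = tfx.components.Pusher(
    model=trainer.outputs["model"],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory="gs://bucket/serving"
        )
    ),
)

pipeline = tfx.dsl.Pipeline(
    pipeline_name="semantic-segmentation",
    pipeline_root="gs://bucket/pipeline-root",
    components=[example_gen, statistics_gen, schema_gen, trainer, pusher],
)
```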
Oct 31, 2022
Delighted to release the @TensorFlow port of MAXIM, a single backbone capable of denoising, dehazing, deblurring, and more! Comes packed w/ @huggingface Spaces, Colab, and XLA support!

Know all of it here:
github.com/sayakpaul/maxi…

Cc: @_vztu, @Han_Zhang_

1/

This has been in the making for some days now, and I am really glad it’s now available in TensorFlow. All converted TensorFlow MAXIM models are on the @huggingface Hub:

huggingface.co/models?pipelin…

2/
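Loading a converted checkpoint takes two lines. The repo name below is one example; browse the Hub link above for the full list:

```python
from huggingface_hub import from_pretrained_keras

# One of the converted checkpoints; see the Hub collection for others.
model = from_pretrained_keras("sayakpaul/maxim-s3-deblurring-gopro")
# `model` is a regular Keras model: preprocess your image, then model.predict(...).
```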
Aug 22, 2022
The final post in the series on deploying @TensorFlow vision models from @huggingface is up!

You'll learn to deploy a ViT B/16 model using Vertex AI with lots of firepower, including autoscaling, traffic splitting, etc.

huggingface.co/blog/deploy-ve…

Done w/ @algo_diver

1/
We're quite happy with how it turned out. When we started this series, there was a dearth of resources showing how to deploy TF models from Transformers to GCP and the TF ecosystem following good practices. We wanted to close that gap 🤗

2/
Jul 22, 2022
Performant and efficient segmentation models (SegFormer) in TF just landed in @huggingface.

It yields impressive performance while remaining efficient. In particular, its performance on higher-resolution images (512x512, 640x640, etc.) is commendable.

🧵

* Model documentation: huggingface.co/docs/transform…
* Interactive Space for playing with the model: huggingface.co/spaces/chansun…
* _Fine-tune_ on your own segmentation dataset: github.com/deep-diver/seg…
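A minimal fine-tuning sketch with the TF classes. The label count and datasets are placeholders:

```python
from transformers import TFSegformerForSemanticSegmentation

# Start from a pretrained MiT backbone; the segmentation head is trained fresh.
model = TFSegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0", num_labels=35  # num_labels: placeholder for your dataset
)

# TF models in Transformers can compute their own loss from `labels`,
# so no explicit loss is needed here.
model.compile(optimizer="adam")
# model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=5)
```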
Jun 17, 2022
There's no point in comparing the outputs of the DALL-E playground from @OpenAI and the samples provided by Imagen. We don't have the common ground for that yet.

But ever since I got access to DALL-E, I have simply been taking captions from Imagen and supplying them as DALL-E prompts. Results? ⬇️

(Image prompt: "A marble statue of a Koala DJ in front of a marble statue of a turntable. The Koala has wearing large marble headphones.")
May 25, 2022
Everyone has their own preferences for courses and books for learning ML/DL.

In this thread, I'll share mine. The lists _won't be_ ordered and will be categorized into various themes.

*I'm starting w/ materials that are aimed at providing a general understanding of ML/DL.*

1/

I like content that doesn't rely too much on specific frameworks but instead uses them to realize the concepts. So, my suggestions will be biased toward that.

2.1/