🧵
Custom Diffusion fine-tunes only the cross-attention layers of the UNet and also supports combining it with textual inversion, so it trains comfortably on consumer hardware.
As a result, with just 250 steps, we can get pretty good results, depending on the new subjects being learned.
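For inference, loading the learned cross-attention weights plus the new token embedding is just a couple of extra calls. A minimal sketch, assuming weights saved by the Diffusers Custom Diffusion training script (the paths and the `<new1>` token are illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# Base model the Custom Diffusion weights were trained against.
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Load the fine-tuned cross-attention weights and the learned token embedding.
pipe.unet.load_attn_procs(
    "path-to-saved-model", weight_name="pytorch_custom_diffusion_weights.bin"
)
pipe.load_textual_inversion("path-to-saved-model", weight_name="<new1>.bin")

image = pipe(
    "<new1> cat swimming in a pool",
    num_inference_steps=100,
    guidance_scale=6.0,
).images[0]
```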
Mar 3, 2023 • 4 tweets • 2 min read
Today we're happy to announce support for ControlNet in Diffusers 🧨, the **fastest** implementation out there ⚡️
The integration was led by our amazing community member Takuma Mori 🤝
Cc: @lvminzhang @magrawala
We support all 8 types of conditioning officially supported by ControlNet. The `StableDiffusionControlNetPipeline` supports loading arbitrary `ControlNetModel`s to allow maximum flexibility.
An example of using a Canny edge map as conditioning to generate 🤗 in different flavors:
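Roughly what that looks like in code (the model IDs are the public Canny ControlNet and SD v1.5 checkpoints; the input image path is illustrative):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Turn the input image into a 3-channel Canny edge map to use as conditioning.
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe("a colorful hugging face emoji, 4k", image=canny_image).images[0]
```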
Feb 22, 2023 • 6 tweets • 3 min read
I am playing with the `StableDiffusionInstructPix2PixPipeline` from Diffusers, and some observations I wanted to share 🧨
Let's say we want to edit the following image with the following edit instruction: "turn the clouds rainy"
(Doc: huggingface.co/docs/diffusers…)
With the default arguments of the pipeline, the result doesn't seem that coherent w.r.t. the edit instruction (verified across many different seeds and numbers of inference steps):
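For reference, a minimal way to reproduce this setup (the checkpoint is the public InstructPix2Pix one; the image path is illustrative). `image_guidance_scale` controls how strongly the output sticks to the input image and is one of the knobs worth sweeping here:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("mountain.png").convert("RGB")
edited = pipe(
    "turn the clouds rainy",
    image=image,
    num_inference_steps=20,
    guidance_scale=7.5,        # text guidance
    image_guidance_scale=1.5,  # fidelity to the input image
).images[0]
```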
Feb 20, 2023 • 6 tweets • 3 min read
Diffusers 0.13.0 is 99.9% about pipelines focused on "controlled generation" 🎆 🎇
My favorite pipeline from the latest release is the `StableDiffusionPix2PixZeroPipeline`, which lets you do CycleGAN-like translations w/o expensive training.
It supports two modes:
1. Going from a source concept to a target concept ("cat -> dog" as shown above) with an input prompt.
2. Same as option 1, but with support for conditioning the pipeline on a real input image (see below).
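A rough sketch of mode 1, under some assumptions: the source/target directions are mean CLIP text embeddings over a handful of captions per concept, passed via `source_embeds`/`target_embeds` (check the docs for the exact recipe and recommended caption sets):

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline

pipe = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

@torch.no_grad()
def embed(captions):
    # Mean CLIP text embedding over a few captions describing one concept.
    tokens = pipe.tokenizer(
        captions, padding="max_length", truncation=True,
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
    ).input_ids.to("cuda")
    return pipe.text_encoder(tokens)[0].mean(dim=0, keepdim=True)

source_embeds = embed(["a photo of a cat", "a cat sitting on the grass"])
target_embeds = embed(["a photo of a dog", "a dog sitting on the grass"])

image = pipe(
    "a high resolution painting of a cat",
    source_embeds=source_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
).images[0]
```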
Feb 3, 2023 • 4 tweets • 3 min read
Supporting Vision-Language Models from @huggingface 🔥
They're everywhere! They empower you to do stuff that seemed crazy before -- open-vocabulary detection, phrase grounding, etc.
We cover a wide array of things (thanks to @alaradirik for doing an excellent job on that):
* How are they trained?
* What are typical fusion strategies?
* What datasets are typically used for training & evaluation?
* Finally, how YOU can use them from 🤗 Transformers.
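To make the last point concrete, here's a sketch of open-vocabulary detection with OWL-ViT, one of the models covered (the image path and query phrases are illustrative):

```python
import torch
from PIL import Image
from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("living_room.png").convert("RGB")
texts = [["a photo of a cat", "a photo of a remote control"]]

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into thresholded detections in image coordinates.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(texts[0][label], round(score.item(), 3), [round(c, 1) for c in box.tolist()])
```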
Feb 2, 2023 • 8 tweets • 4 min read
Ever thought of fine-tuning Stable Diffusion using @TensorFlow and then using it in Diffusers? 🧨
Join in for a little wild ride 🤗
Once you have the fine-tuned weights in Keras, you can use the tool (which is just a Space) to export a `StableDiffusionPipeline` and push it to the 🤗 Hub.
You can perform DreamBooth, Textual Inversion, Prompt-to-Prompt -- the sky is the limit 🌈
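Once the exported pipeline is on the Hub, using it is standard Diffusers fare (the repo id below is a placeholder for whatever the Space pushed for you):

```python
from diffusers import StableDiffusionPipeline

# Hypothetical repo id; substitute the one the conversion Space created.
pipe = StableDiffusionPipeline.from_pretrained(
    "your-username/keras-sd-dreambooth"
).to("cuda")
image = pipe("a photo of sks dog in a bucket").images[0]
```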
Jan 30, 2023 • 4 tweets • 3 min read
Hugging Face ❤️ Computer Vision 👁🤗
We've been taking some giant leaps in our ecosystem dedicated to Computer Vision, but it can be hard to get a sense of everything we have to offer.
Our latest blog post provides you with a concrete overview of "State of CV at 🤗" ⤵️
From seminal models like ResNets to modern object detection models like OWL-ViT, we've got it all from your very own 🤗 Transformers (tutorial goodies included 🎁).
But our efforts are not limited to just 🤗 Transformers nor are we limited to models.
Dec 19, 2022 • 4 tweets • 3 min read
We now have full-fledged fine-tuning tutorial notebooks for semantic segmentation in both @TensorFlow and @PyTorch! Both use the SoTA SegFormer model 🤗
But there's more 🧵
We have a separate task page on segmentation to help you navigate across all the compatible datasets and models we have on the 🤗 Hub:
SegFormer (SoTA sem segmentation) has been available in @TensorFlow from 🤗 Transformers for some time now.
Today, we're releasing a polished version of a notebook for fine-tuning a SegFormer variant on a custom dataset.
Check it out 👉 colab.research.google.com/github/hugging…
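The core of the TF notebook boils down to a few lines. A condensed sketch, assuming you've already built `train_ds`/`val_ds` as `tf.data` pipelines yielding pixel values and label maps, with `id2label`/`label2id` defined for your dataset:

```python
from transformers import TFSegformerForSemanticSegmentation

model = TFSegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",  # smallest SegFormer backbone
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
)

# No explicit loss: Transformers' TF models compute it internally
# when labels are part of the input batch.
model.compile(optimizer="adam")
model.fit(train_ds, validation_data=val_ds, epochs=50)
```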
The above results are with the smallest SegFormer variant and a small-scale, high-res dataset. Results improve with longer training and a slightly bigger variant.
The original notebook was co-authored by @algo_diver and myself.
Nov 11, 2022 • 5 tweets • 4 min read
This has been in the works for MONTHS now! Finally, it's in good shape and ready to be shipped 🚀
@algo_diver & I bring you an automated pipeline for semantic segmentation. It comes packed with lots of goodies, so please read on.
The pipeline covers data ingestion (with support for Dataflow), schema generation, training, deployment, etc., all automated.
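In TFX terms, the wiring looks roughly like this (component names follow the TFX v1 API; the actual pipeline in our repo has more pieces, e.g. Dataflow-backed components and Vertex AI deployment):

```python
from tfx import v1 as tfx

def create_pipeline(pipeline_name, pipeline_root, data_root, module_file, serving_dir):
    # Ingest TFRecords, compute statistics, and infer a schema automatically.
    example_gen = tfx.components.ImportExampleGen(input_base=data_root)
    statistics_gen = tfx.components.StatisticsGen(examples=example_gen.outputs["examples"])
    schema_gen = tfx.components.SchemaGen(statistics=statistics_gen.outputs["statistics"])

    # Train the segmentation model defined in `module_file`.
    trainer = tfx.components.Trainer(
        module_file=module_file,
        examples=example_gen.outputs["examples"],
        schema=schema_gen.outputs["schema"],
        train_args=tfx.proto.TrainArgs(num_steps=1000),
        eval_args=tfx.proto.EvalArgs(num_steps=100),
    )

    # Push the trained SavedModel to a serving directory.
    pusher = tfx.components.Pusher(
        model=trainer.outputs["model"],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(base_directory=serving_dir)
        ),
    )

    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, trainer, pusher],
    )
```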
Oct 31, 2022 • 4 tweets • 3 min read
Delighted to release the @TensorFlow port of MAXIM, a single backbone capable of denoising, dehazing, deblurring, and more! Comes packed w/ @huggingface Spaces, Colab, and XLA support!
1/
This has been in the making for some days now, and I am really glad it’s now available in TensorFlow. All converted TensorFlow MAXIM models are on @huggingface hub:
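Loading them is a one-liner with `huggingface_hub` (the repo id below is illustrative; check the Hub for the full list of converted variants):

```python
import numpy as np
from PIL import Image
from huggingface_hub import from_pretrained_keras

# Illustrative repo id for one of the converted deblurring checkpoints.
model = from_pretrained_keras("google/maxim-s3-deblurring-gopro")

image = Image.open("blurry.png").convert("RGB").resize((256, 256))
inputs = np.asarray(image, dtype=np.float32)[None, ...] / 255.0
# Restored image(s); the exact output structure depends on the variant.
restored = model.predict(inputs)
```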
Done w/ @algo_diver
1/
We're quite happy with how it turned out. When we started this series, there was a dearth of resources showing how to deploy TF models from Transformers to GCP and the TF ecosystem following good practices. We wanted to close that gap 🤗
2/
Jul 22, 2022 • 4 tweets • 2 min read
Performant and efficient segmentation models (SegFormer) in TF just landed in @huggingface.
It yields impressive performance while remaining efficient. In particular, its performance on higher-resolution images (512x512, 640x640, etc.) is commendable.
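A quick sketch of running it for semantic segmentation in TF (the checkpoint is the public ADE20K-fine-tuned b0 variant; the image path is illustrative):

```python
import tensorflow as tf
from PIL import Image
from transformers import SegformerFeatureExtractor, TFSegformerForSemanticSegmentation

ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(ckpt)
model = TFSegformerForSemanticSegmentation.from_pretrained(ckpt)

image = Image.open("scene.png").convert("RGB")
inputs = feature_extractor(images=image, return_tensors="tf")
logits = model(**inputs).logits  # (batch, num_labels, height / 4, width / 4)

# Per-pixel class prediction at 1/4 resolution; upsample as needed.
seg_map = tf.argmax(logits, axis=1)[0]
```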
There's no point in comparing the outputs of the DALL-E playground from @OpenAI with the samples provided by Imagen. We don't have a common basis for that yet.
But ever since I got access to DALL-E I am simply taking captions from Imagen and supplying them to the DALL-E prompt. Results? ⬇️
"A marble statue of a Koala DJ in front of a marble statue of a turntable. The Koala has wearing large marble headphones."
May 25, 2022 • 10 tweets • 5 min read
Everyone has their own preferences for courses and books for learning ML/DL.
In this thread, I'll share mine. The lists _won't be_ ordered and will be grouped into various themes.
*I'm starting w/ materials that are aimed at providing a general understanding of ML/DL.*
1/
I like content that doesn't rely too much on specific frameworks but instead uses them to realize the concepts. So, my suggestions will be biased toward that.
2.1/
Apr 18, 2022 • 5 tweets • 4 min read
What do Vision Transformers learn? How do they encode anything useful for image recognition? In our latest work, we reimplement a number of works in this area & investigate various ViT model families (DeiT, DINO, the original ViT, etc.).
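One of the simplest probes from this line of work is visualizing self-attention. With 🤗 Transformers, getting the raw attention maps out of a ViT is just a flag (the checkpoint and image path are illustrative):

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTModel.from_pretrained("google/vit-base-patch16-224")

image = Image.open("dog.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# Tuple of (batch, num_heads, num_tokens, num_tokens) tensors, one per layer.
last_layer_attn = outputs.attentions[-1]
# Attention from the CLS token to the image patches, averaged over heads.
cls_to_patches = last_layer_attn[0, :, 0, 1:].mean(dim=0)
```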