Suraj Patil
Oct 18, 11 tweets, 5 min read
#Dreambooth is a method to teach new concepts to #stablediffusion. We have a super simple script to train dreambooth in 🧨diffusers, but our users reported that the results weren't as good as those from other CompVis forks. So we dug deep and found some cool tricks.
A 🧵
Training the text encoder along with the unet gives the best results in terms of image-text alignment and prompt composition.

Left image - frozen text encoder
Right Image - finetuned text encoder

The results are drastically improved 🤯 [Image] [Image]
We updated our script to allow fine-tuning the text encoder: github.com/huggingface/di…
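As a rough sketch of what turning this on could look like, the helper below assembles an `accelerate launch` command for the diffusers `train_dreambooth.py` example script. `--train_text_encoder` is the script's switch for training the text encoder alongside the unet; the model id, paths, prompt, and hyperparameter defaults are placeholders chosen for illustration, not values from this thread.

```python
import shlex


def dreambooth_command(instance_dir, output_dir, instance_prompt,
                       train_text_encoder=True,
                       model="CompVis/stable-diffusion-v1-4",  # placeholder model id
                       learning_rate=2e-6,                     # placeholder, tune per dataset
                       max_train_steps=400):                   # placeholder, tune per dataset
    """Build the argv list for the diffusers train_dreambooth.py example."""
    cmd = [
        "accelerate", "launch", "train_dreambooth.py",
        f"--pretrained_model_name_or_path={model}",
        f"--instance_data_dir={instance_dir}",
        f"--instance_prompt={instance_prompt}",
        f"--output_dir={output_dir}",
        f"--learning_rate={learning_rate}",
        f"--max_train_steps={max_train_steps}",
    ]
    if train_text_encoder:
        # Fine-tune the text encoder together with the unet.
        cmd.append("--train_text_encoder")
    return cmd


# Hypothetical paths and prompt, for illustration only.
cmd = dreambooth_command("./my_dog", "./dreambooth-out", "a photo of sks dog")
print(shlex.join(cmd))
```

Passing `train_text_encoder=False` gives the frozen-text-encoder baseline from the left image, which makes it easy to run both variants on the same data and compare.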
Find the right combination of LR and training steps for your training data.

Low LR and too few training steps -> Underfitting
High LR and too many training steps -> Overfitting and degraded image quality.

Left image: High LR and too many training steps
Right Image: Low LR with suitable steps [Image] [Image]
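One simple way to search for that combination is a small grid over LR and step count, training once per cell and comparing samples. The sketch below only enumerates the grid; the specific values are hypothetical starting points and are not taken from the experiments in this thread.

```python
from itertools import product

# Hypothetical candidate values; adjust to your data and budget.
learning_rates = [1e-6, 2e-6, 5e-6]
step_counts = [200, 400, 800, 1200]


def sweep_configs(lrs, steps):
    """Return one config dict per (learning rate, step count) pair."""
    return [
        {"learning_rate": lr, "max_train_steps": n}
        for lr, n in product(lrs, steps)
    ]


configs = sweep_configs(learning_rates, step_counts)
for cfg in configs:
    # Each config would drive one training run whose samples are then
    # inspected for underfitting (concept not learned) or overfitting
    # (noisy, degraded images).
    print(cfg)
```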
Prior preservation is important for faces. To train on faces we found that we need to do more training steps, so prior preservation helps avoid overfitting here.
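The thread doesn't spell out the loss, so the following is a minimal sketch of how prior preservation is usually formulated in Dreambooth: the usual noise-prediction error on your subject images, plus a weighted copy of the same error on generic class images produced by the original model. Plain Python lists stand in for tensors, and `prior_loss_weight` is an illustrative parameter name.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    prior_pred, prior_target,
                    prior_loss_weight=1.0):
    """Subject loss plus weighted prior-preservation loss."""
    instance_loss = mse(instance_pred, instance_target)  # your subject images
    prior_loss = mse(prior_pred, prior_target)           # generic class images
    return instance_loss + prior_loss_weight * prior_loss


# Toy numbers: instance MSE = 0.125, prior MSE = 0.5, weight = 0.5.
loss = dreambooth_loss([0.5, 1.0], [0.0, 1.0],
                       [1.0, 1.0], [0.0, 1.0],
                       prior_loss_weight=0.5)
print(loss)
```

The prior term penalizes the model for drifting on the general class, which is why it can act as a regularizer when many training steps are needed.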
If you see degraded/noisy images, it likely means the model is overfitting. Try the tricks above to avoid it.
Also, different samplers seem to have different effects; DDIM seems more robust!
So try a different sampler and see if it improves results.
Left: klms
Right: DDIM [Image] [Image]
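A hedged sketch of comparing the two samplers in diffusers: build either a DDIM or a K-LMS scheduler and pass it to the pipeline at load time. The beta settings are the ones commonly used with Stable Diffusion v1 checkpoints and are an assumption here, as are the function names, the seed handling, and the `device` default.

```python
# Noise-schedule settings commonly used with Stable Diffusion v1 (assumed).
SD_BETAS = dict(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")

SCHEDULER_KWARGS = {
    "ddim": dict(SD_BETAS, clip_sample=False, set_alpha_to_one=False),
    "klms": dict(SD_BETAS),
}


def make_scheduler(name):
    """Return a diffusers scheduler instance for 'ddim' or 'klms'."""
    if name not in SCHEDULER_KWARGS:
        raise ValueError(f"unknown sampler: {name!r}")
    # Imported lazily so the module loads even without diffusers installed.
    from diffusers import DDIMScheduler, LMSDiscreteScheduler
    cls = {"ddim": DDIMScheduler, "klms": LMSDiscreteScheduler}[name]
    return cls(**SCHEDULER_KWARGS[name])


def generate(model_path, prompt, sampler="ddim", seed=0, device="cuda"):
    """Generate one image from a trained model with the chosen sampler."""
    import torch
    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_pretrained(
        model_path, scheduler=make_scheduler(sampler)
    ).to(device)
    # Fixed seed so both samplers start from the same noise.
    generator = torch.Generator(device).manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]
```

Calling `generate` twice with the same seed and prompt, once with `sampler="klms"` and once with `sampler="ddim"`, gives a side-by-side comparison like the one above.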
As we saw in the first tweet, fine-tuning the text encoder gives the best results, but that means we can't train it on a 16GB GPU.
Combine textual inversion + dreambooth.
We ran one experiment where we first did textual inversion and then trained dreambooth using that model. [Image]
The results are not as good as fine-tuning the whole text model, as it seems to be overfitting here. But this can surely be improved.
This should allow us to get great results and still keep everything in <16GB
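The two-stage idea could be sketched as two commands run back to back using the diffusers example scripts. This assumes the textual inversion script saves a full pipeline to its output directory that the dreambooth script can load, and that the dreambooth stage keeps the text encoder frozen to save memory. The placeholder token, initializer token, model id, and paths are all hypothetical.

```python
def two_stage_commands(data_dir, ti_output_dir, db_output_dir,
                       placeholder_token="<my-concept>",  # hypothetical token
                       initializer_token="toy"):          # hypothetical token
    """Return (textual inversion command, dreambooth command) as argv lists."""
    textual_inversion = [
        "accelerate", "launch", "textual_inversion.py",
        "--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4",
        f"--train_data_dir={data_dir}",
        "--learnable_property=object",
        f"--placeholder_token={placeholder_token}",
        f"--initializer_token={initializer_token}",
        f"--output_dir={ti_output_dir}",
    ]
    dreambooth = [
        "accelerate", "launch", "train_dreambooth.py",
        # Stage 2 starts from the model produced by stage 1 (assumption).
        f"--pretrained_model_name_or_path={ti_output_dir}",
        f"--instance_data_dir={data_dir}",
        f"--instance_prompt=a photo of {placeholder_token}",
        f"--output_dir={db_output_dir}",
        # No --train_text_encoder here: only the unet is trained in stage 2.
    ]
    return textual_inversion, dreambooth


ti_cmd, db_cmd = two_stage_commands("./my_dog", "./ti-out", "./db-out")
print(" ".join(ti_cmd))
print(" ".join(db_cmd))
```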
Thanks a lot to @NineOfNein @natanielruizg @pcuenq for helping conduct these experiments and for helpful suggestions 🤗
This analysis is not perfect, and there could be many other ways to improve dreambooth. Please let us know if you find any mistakes or improvements :)
You can find all the experiments in this report wandb.ai/psuraj/dreambo…

