Divam Gupta
Creator of one-click ML tool - https://t.co/NE4LuTGBv5 and https://t.co/oNpCKBAjH9 • AI for VR @Meta • Previously: research @Microsoft, robotics @CarnegieMellon

Nov 1, 2022, 12 tweets

DreamBooth is becoming popular for creating custom Stable Diffusion models using your images.

Here is a beginner-friendly thread on how it works: 🧵

First, what does DreamBooth do?

- It takes a few images of a particular subject.

- Then it teaches the model to generate more images of that subject in different styles.

For example, you give it 20 normal images of yourself and get a funky painting of yourself in return.

(ft. @ylecun)

Models like Stable Diffusion already have strong priors for generating various things and combining them with several styles.

All you have to do is somehow add one additional subject to the model.

This is done by fine-tuning a pre-trained model on your images.

Let’s use the token ‘X’ as a unique identifier to represent our subject.

Along with the training images, you also need the class name of the subject.

So, if you are training the model on images of yourself, the class name will be “Person”, since you are a person.

Rather than using the prompt “An image of X”, it is better to use the prompt “An image of X person”.

This helps the model to use the semantics of a generic person while generating images of yourself.
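As a minimal sketch of the prompt choice above, here is how the two variants differ. The helper name `build_prompt` is made up for illustration and is not part of any DreamBooth implementation.

```python
from typing import Optional


def build_prompt(identifier: str, class_name: Optional[str] = None) -> str:
    """Build a training prompt for the subject (hypothetical helper).

    With a class name, the identifier is followed by the class word so the
    model can reuse what it already knows about that class.
    """
    if class_name is None:
        return f"An image of {identifier}"
    return f"An image of {identifier} {class_name}"


# Prompt without the class name, and the preferred one with it.
print(build_prompt("X"))
print(build_prompt("X", "person"))
```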

We only have a few images of ‘X’, so we don’t want the model to forget what it knows about other things while training on ‘X’.

This is fixed by also training on many images from the parent class of ‘X’.

So, in our example, the model will also use several images containing any person.

These extra images are used for the prior-preserving loss, which preserves the semantic knowledge of the class.

It encourages the model to generate diverse things belonging to the subject’s class.

In our example, the model will be trained on two objectives:

1) Given the prompt “A X person” generate images of X. → using your images

2) Given the prompt “A person” generate images of a person. → using general images of a person.
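The two objectives above can be sketched as a single combined loss. This is a toy version under stated assumptions: plain lists of floats stand in for the model's predictions and targets, and the names `mse`, `dreambooth_loss` and `prior_weight` are illustrative, not taken from any real codebase.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_weight=1.0):
    """Combine the subject objective with the prior-preserving objective.

    instance_*: values for your own images (prompt with 'X').
    class_*:    values for the generic class images (prompt without 'X').
    prior_weight is an assumed knob for how much the class term counts.
    """
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_weight * prior_loss


# Toy numbers only: 0.125 from the subject term plus 0.5 * 2.0 from the prior term.
total = dreambooth_loss([0.5, 1.0], [0.0, 1.0],
                        [1.0, 2.0], [1.0, 0.0], prior_weight=0.5)
print(total)  # 1.125
```

Setting `prior_weight` to 0 in this sketch drops the second objective, which is the case where the model is free to forget what a generic person looks like.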

Now the question is, what should ‘X’ actually be?

DreamBooth uses a sequence of rare tokens in place of the subject ‘X’.

These rare tokens are very unlikely to appear in ordinary prompts, so they carry little existing meaning that could interfere with the new subject.
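One way to picture the rare-token idea is to pick the least frequent short tokens from a vocabulary. The sketch below is only an illustration: the token counts are made up, and `pick_rare_tokens` and its `max_chars` parameter are hypothetical names. A real setup would look at the actual tokenizer's vocabulary instead.

```python
def pick_rare_tokens(token_counts, max_chars=3, n=1):
    """Return the n least frequent tokens that are at most max_chars long."""
    short = [(count, token) for token, count in token_counts.items()
             if len(token) <= max_chars]
    short.sort()  # lowest count first
    return [token for _, token in short[:n]]


# Made-up counts for illustration only.
toy_counts = {"the": 90000, "dog": 12000, "sks": 3, "person": 40000, "zwx": 7}

rare = pick_rare_tokens(toy_counts, n=2)
print(rare)  # ['sks', 'zwx']
```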

To produce high-resolution images, DreamBooth also fine-tunes a standard super-resolution model on the input images.

This is used to increase the resolution of the generated images.

Originally, DreamBooth was implemented with Imagen, but open-source versions are implemented with Stable Diffusion:

github.com/XavierXiao/Dre…

Here are some more examples of DreamBooth in action:
