init images are a simple trick that lets us start our generations from an existing image instead of from pure noise.
init images give us a lot of control over our generations.
we can choose where to place each element and what style, colors, or pose it should have.
while this technique might look similar to #dalle2 variations, it is a completely different thing.
DALL-E 2 can produce variations that respect the style and content of our images.
with init images, we add generated content on top of our image, and we can even decide how much of the original content we want to preserve.
the technique behind init images is quite simple...
if we know a bit about how diffusion models work.
let’s get started 🤓
diffusion models are currently state-of-the-art for image generation.
they are at the core of AI models like DALL-E 2 (@OpenAI), Imagen (@GoogleAI), and Stable Diffusion (@StabilityAI).
generating images with diffusion is a sequence of denoising steps that iteratively remove noise from a noisy image until it reaches something that is noise-free and looks realistic.
take a deep breath here 🧘
if you've played with DreamStudio, you’ll recognize the “steps” parameter.
this parameter controls the number of denoising steps that we want stable diffusion to perform.
the larger it is, the more thoroughly it can clean up the noise (although after 50 steps most of the noise is already gone).
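a minimal sketch of what the steps parameter picks, assuming the model was trained on 1000 noise levels and the sampler visits evenly spaced ones, noisiest first (a common choice, but samplers differ). `denoising_timesteps` is a hypothetical helper, not a DreamStudio or Stable Diffusion API:

```python
from typing import List

def denoising_timesteps(num_steps: int, num_train_timesteps: int = 1000) -> List[int]:
    # HYPOTHETICAL helper: evenly spaced points along the training noise schedule.
    # assumption: 1000 training timesteps; real samplers may space them differently.
    ratio = num_train_timesteps // num_steps
    # reversed so the list runs from the noisiest timestep down to 0
    return [i * ratio for i in range(num_steps)][::-1]

print(denoising_timesteps(10))  # [900, 800, 700, 600, 500, 400, 300, 200, 100, 0]
```

more steps means smaller jumps between noise levels, so each denoising step has less noise to remove at once.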
the following image depicts this process.
the denoising sequence goes from right (complete noise) to left (realistic cute cat), and the output of each denoising step is the input of the one that follows.
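a toy version of that chain in plain Python. the “denoiser” here is hypothetical: a real model uses a neural network to predict the noise, while this oracle already knows the clean image and removes a fraction of the remaining noise at each step. it only illustrates that each step’s output becomes the next step’s input:

```python
import random
from typing import List

def toy_denoise_step(x: List[float], clean: List[float], k: int) -> List[float]:
    # HYPOTHETICAL stand-in for the model: it cheats by knowing the clean image
    # and removes 1/k of the noise that is still left (k = steps remaining).
    return [xi + (ci - xi) / k for xi, ci in zip(x, clean)]

def toy_sampler(clean: List[float], num_steps: int, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in clean]  # start from complete noise (right side)
    trajectory = [x]
    for k in range(num_steps, 0, -1):
        x = toy_denoise_step(x, clean, k)      # the output of this step...
        trajectory.append(x)                   # ...is the input of the next one
    return trajectory                          # last entry is the noise-free result (left side)
```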
the technique behind init images consists of:
1) adding noise to our real image,
2) inserting it in the middle of the denoising process, and
3) making the diffusion model “think” that it needs to keep denoising from there.
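the three steps can be sketched as below. `add_noise` uses the standard DDPM forward-noising formula, sqrt(ᾱ)·x0 + sqrt(1 − ᾱ)·noise; `generate_from_init_image`, the `denoise_step` callback, and the `alpha_bars` table are hypothetical placeholders for whatever model and scheduler you use, not a specific library’s API:

```python
import math
import random
from typing import Callable, List

Image = List[float]  # toy "image": a flat list of pixel values

def add_noise(x0: Image, noise: Image, alpha_bar: float) -> Image:
    # standard DDPM forward noising: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise.
    # alpha_bar close to 1 keeps most of the image; close to 0 it is almost pure noise.
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * p + b * n for p, n in zip(x0, noise)]

def generate_from_init_image(
    init_image: Image,
    timesteps: List[int],                         # remaining timesteps, noisiest first
    alpha_bars: List[float],                      # alpha_bar for each training timestep
    denoise_step: Callable[[Image, int], Image],  # HYPOTHETICAL model + scheduler step
    seed: int = 0,
) -> Image:
    if not timesteps:
        return list(init_image)  # nothing left to denoise: the init image comes back as-is
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in init_image]
    # 1) add noise to the real image, matching the noise level of the first remaining timestep
    x = add_noise(init_image, noise, alpha_bars[timesteps[0]])
    # 2) + 3) drop it into the middle of the sequence and let the model keep denoising
    for t in timesteps:
        x = denoise_step(x, t)
    return x
```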
if we add our init image too early in the diffusion sequence (to the right of the previous diagram), we will see a lot of change in the final result.
if we add it too late in the process (to the left of the diagram), the init image will come out almost unchanged.
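one way tools expose this choice is a single number, e.g. the `strength` argument of Hugging Face diffusers’ `StableDiffusionImg2ImgPipeline`. the sketch below assumes that convention (1.0 = start from the far right, 0.0 = skip denoising entirely); `init_image_start` is a hypothetical helper that returns the step index where the noised init image enters the sequence:

```python
def init_image_start(num_steps: int, strength: float) -> int:
    # HYPOTHETICAL helper. strength in [0, 1] is the fraction of the schedule that runs:
    # 1.0 -> enter at step 0 (far right, init image mostly overwritten),
    # 0.0 -> enter after the last step (far left, init image returned unchanged).
    steps_to_run = min(int(num_steps * strength), num_steps)
    return max(num_steps - steps_to_run, 0)

print(init_image_start(50, 0.5))  # 25
```

with 50 steps and strength 0.5, the noised init image enters at step 25 and only the last 25 denoising steps run.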
knowing where to place our init images is part of the prompt engineering process with this technique.
and folks are already getting GOOD at it!!
@blessedbck is using this technique with mastery to generate stunning vintage photographs 🤯
@EErratica used their (already great) sketches as init images 🤩