2/ First off, finding the right combination of prompt, seed and denoising strength for an #img2img in-painting is a roll of the dice
Luckily, it's easy to script large batches and cherry-pick the best results
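Roughly what that batch script looks like, as a minimal sketch (the checkpoint id, file paths, prompts and seed range are placeholders, and it assumes a recent 🧨Diffusers where the inpaint pipeline takes a strength argument):

```python
import itertools
import os

import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Placeholder checkpoint and inputs; swap in your own
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
init_image = Image.open("phone.png").convert("RGB")
mask = Image.open("mask.png").convert("L")  # white = repaint, black = keep

prompts = ["a futuristic smartphone, product shot",
           "a holographic phone, studio lighting"]
seeds = range(16)

os.makedirs("batch", exist_ok=True)
# Brute-force the prompt x seed grid, then cherry-pick the keepers by eye
for (p_idx, prompt), seed in itertools.product(enumerate(prompts), seeds):
    gen = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt=prompt, image=init_image, mask_image=mask,
                 strength=0.8, generator=gen).images[0]
    image.save(f"batch/p{p_idx}_s{seed}.png")
```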
3/ The first and last pairs were just regular #img2img, ramped through a range of denoising strengths from 0 to 0.8
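That ramp is just a loop over the stock img2img pipeline with a fixed seed; a sketch (at strength 0 the pipeline has nothing to do, so the init photo becomes frame 0):

```python
import os

import numpy as np
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
init_image = Image.open("phone.png").convert("RGB")  # placeholder photo
prompt, seed, num_frames = "a futuristic smartphone, product shot", 42, 30

os.makedirs("frames", exist_ok=True)
# Ramp denoising strength from 0 to 0.8: frame 0 is the untouched photo,
# later frames drift further from it while the seed keeps them coherent
for i, strength in enumerate(np.linspace(0.0, 0.8, num_frames)):
    if strength == 0:
        frame = init_image  # strength 0 means no denoising at all
    else:
        gen = torch.Generator("cuda").manual_seed(seed)  # same seed each frame
        frame = pipe(prompt=prompt, image=init_image,
                     strength=float(strength), generator=gen).images[0]
    frame.save(f"frames/{i:04d}.png")
```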
4/ Transitions were done using a customized @huggingface 🧨Diffusers pipeline.
This lets me “slerp” between both the noise latents AND the text embeddings, i.e. between the two seeds and the two prompts respectively
(while keeping denoising strength at ~0.8)
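For the curious: the slerp itself is the standard spherical interpolation. A sketch of the idea, not my exact pipeline code (the commented-out call is hypothetical, since feeding in the interpolated noise latents is exactly what the customized pipeline adds):

```python
import torch

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two same-shaped tensors."""
    v0_f, v1_f = v0.flatten().float(), v1.flatten().float()
    dot = torch.dot(v0_f / v0_f.norm(), v1_f / v1_f.norm())
    if dot.abs() > dot_threshold:
        return torch.lerp(v0, v1, t)  # nearly parallel: plain lerp is fine
    theta = torch.acos(dot)
    return (torch.sin((1 - t) * theta) * v0
            + torch.sin(t * theta) * v1) / torch.sin(theta)

# Two fixed seeds -> two starting noise latents (SD v1 is 4x64x64 at 512px)
shape = (1, 4, 64, 64)
lat_a = torch.randn(shape, generator=torch.Generator().manual_seed(101))
lat_b = torch.randn(shape, generator=torch.Generator().manual_seed(202))

def embed(pipe, prompt):
    """Encode a prompt into CLIP text embeddings with the pipeline's encoder."""
    ids = pipe.tokenizer(prompt, padding="max_length",
                         max_length=pipe.tokenizer.model_max_length,
                         truncation=True, return_tensors="pt").input_ids
    return pipe.text_encoder(ids.to(pipe.device))[0]

# Per frame, interpolate BOTH and hand them to the custom pipeline
# (hypothetical call; the latents= injection is the customized part):
# for t in torch.linspace(0, 1, num_frames):
#     latents = slerp(float(t), lat_a, lat_b)
#     embeds = slerp(float(t), embed(pipe, prompt_a), embed(pipe, prompt_b))
#     frame = pipe(prompt_embeds=embeds, latents=latents, ...).images[0]
```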
5/ Some tricks with blending and adjusting the inpainting mask were needed to smoothly switch between the init images of the two real phones
(example generations on the right)
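In spirit it's a cross-fade of the two init photos plus a feathered mask; a sketch with PIL (the helper name and blur radius are mine, not from the actual pipeline):

```python
from PIL import Image, ImageFilter

def frame_inputs(init_a, init_b, mask, t, feather_px=12):
    """Build the init image and inpainting mask for blend position t in [0, 1].

    init_a / init_b: photos of the two real phones (same size and mode)
    mask: white where the model may repaint, black where pixels must be kept
    """
    # Cross-fade the photos so the "kept" pixels also transition smoothly
    init = Image.blend(init_a, init_b, t)
    # Feather the mask edge so repainted pixels blend into the real photo
    soft_mask = mask.filter(ImageFilter.GaussianBlur(feather_px))
    return init, soft_mask

# Usage sketch: mid-transition inputs for the inpainting call
# init, soft_mask = frame_inputs(phone_a, phone_b, base_mask, t=0.5)
# frame = pipe(prompt_embeds=embeds, image=init, mask_image=soft_mask).images[0]
```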
6/ Not all walks through the latent space were smooth paths, but it's easy to script a search for seed/prompt pairs that work well (and let your GPU replace your central heating)
Having the ability to play with these models on this level is incredible.
More creative AI experiments to come!