For the techies:
Turns out sending gradients straight through this rgb-quantization is not great for stability, so I'm also minimizing mean(quant_distances) to keep raw img close to quantized one!
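For anyone who wants to see the regularizer idea concretely, here is a minimal sketch in plain Python. It is not my actual pipeline code: the uniform per-channel quantizer, the squared-distance definition of `quant_distances`, the number of levels and the learning rate are all assumptions made for illustration. In a real setup the total loss would presumably be the image loss (with the quantizer treated as identity in the backward pass) plus a weighted `mean(quant_distances)` term, computed by an autodiff framework.

```python
def quantize(x, levels=4):
    """Snap a value in [0, 1] to the nearest of `levels` evenly spaced values.
    (Assumed quantizer; the thread does not say which one is used.)"""
    step = 1.0 / (levels - 1)
    return round(x / step) * step


def quant_distances(pixels, levels=4):
    """Squared distance between each raw value and its quantized version."""
    return [(p - quantize(p, levels)) ** 2 for p in pixels]


def mean(xs):
    return sum(xs) / len(xs)


def regularizer_grad(pixels, levels=4):
    """Gradient of mean(quant_distances) w.r.t. the raw values,
    treating the quantized targets as constants."""
    n = len(pixels)
    return [2.0 * (p - quantize(p, levels)) / n for p in pixels]


# Toy "image": four raw channel values in [0, 1].
raw = [0.10, 0.40, 0.55, 0.90]
before = mean(quant_distances(raw))

lr = 0.5  # hypothetical learning rate
for _ in range(20):
    grad = regularizer_grad(raw)
    raw = [p - lr * g for p, g in zip(raw, grad)]

after = mean(quant_distances(raw))
print(before, after)
```

Minimizing this term alone pulls every raw value toward its quantized neighbor, so the gap that the straight-through gradient has to ignore keeps shrinking.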
• • •
I continued exploring #stablediffusion's latent space over the weekend and oh my; there's still a LOT of treasure to be discovered inside this magnificent neural universe!
Here's a quick thread with some of my personal favorites and how I found them.
The fact that all this visual splendor is compressed in just 4 GB of neural network weights totally blows my mind. Call it compression, call it emergence, it's just 🤯🤯
Getting bored by a StyleGAN model after looking at samples for 20 minutes seems like a very distant past now..
Reminiscent of cut-up poetry, one cool trick I implemented is to:
1. Start with a list of great, proven prompts
2. Chunk the prompts into word groups of ~2-5 words
3. Randomly recombine multiple word groups into new 'pseudo-prompts'
Turns out, some of those work really well!
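The three steps above can be sketched roughly like this. It is a sketch under assumptions, not the code I actually ran: the example prompts are made up, and the function names, the comma separator and the default of three chunks per pseudo-prompt are hypothetical choices.

```python
import random


def chunk_prompt(prompt, rng, min_words=2, max_words=5):
    """Split a prompt into consecutive word groups of roughly 2-5 words.
    The last group may be shorter if the prompt runs out of words."""
    words = prompt.split()
    chunks = []
    i = 0
    while i < len(words):
        size = rng.randint(min_words, max_words)
        chunks.append(" ".join(words[i:i + size]))
        i += size
    return chunks


def make_pseudo_prompt(prompts, rng, n_chunks=3):
    """Pool the word groups of all prompts and recombine a random few."""
    pool = [chunk for p in prompts for chunk in chunk_prompt(p, rng)]
    picked = rng.sample(pool, k=min(n_chunks, len(pool)))
    return ", ".join(picked)


# Made-up stand-ins for a list of "great, proven prompts".
PROMPTS = [
    "a dystopian city at night in heavy rain",
    "ancient forest covered in glowing moss and fog",
    "portrait of a robot painted in oil on canvas",
]

rng = random.Random(0)  # fixed seed so the run is repeatable
pseudo = make_pseudo_prompt(PROMPTS, rng)
print(pseudo)
```

Generating a few hundred of these and rendering each one is then just a loop around `make_pseudo_prompt`.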
Ok, so first of all, #stablediffusion did not come with code to make videos, so I came up with a way to interpolate between encoded prompt vectors (no worries if you don't know what that means) and thereby create video sequences from prompt sequences (1/n)
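For those who do want to know what that means: each prompt is encoded into a vector of numbers, and in-between frames come from blending the vectors of two neighboring prompts. Below is a minimal sketch using plain linear interpolation on toy 2-number vectors. Linear blending, the function names and `steps_per_pair` are assumptions for illustration; the thread does not specify the exact interpolation scheme, and real text embeddings are far larger.

```python
def lerp(a, b, t):
    """Linear blend of two equal-length vectors; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_sequence(embeddings, steps_per_pair):
    """Turn a sequence of prompt embeddings into a longer sequence of
    per-frame embeddings by blending each consecutive pair."""
    frames = []
    for a, b in zip(embeddings, embeddings[1:]):
        for i in range(steps_per_pair):
            frames.append(lerp(a, b, i / steps_per_pair))
    frames.append(list(embeddings[-1]))  # end exactly on the last prompt
    return frames


# Toy stand-ins for three encoded prompts.
encoded_prompts = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
frames = interpolate_sequence(encoded_prompts, steps_per_pair=4)
print(len(frames))
```

Each entry in `frames` would then be fed to the image model in place of a single prompt's embedding, one entry per video frame.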
Next, I had to come up with a visual narrative that would work well with the style of the Diffusion interpolations. You can't just tell any story here: like with any medium, you have to work within the constraints of the technology. (2/n)
Once I settled on the "evolution" narrative, I wrote about a thousand different prompts, containing many variations on the narrative sequence I wanted. I then rendered all the corresponding stills with multiple seeds over roughly two nights of GPU time. (3/n)
"Voyage through Time"
is my first art piece using #stablediffusion and I am blown away by the possibilities...
We're crossing a threshold where generative AI is no longer just about novel aesthetics, but evolving into an amazing tool to build powerful, human-centered narratives
This video was created using 36 consecutive phrases that define the visual narrative.
To find the best possible sequence, I tried over a thousand different prompts and seeds and applied many "prompt engineering" tricks in my code, to figure out what works and what doesn't
The way this model "interpolates" between the meaning of two sentences (in semantic rather than visual latent space) is a huge gamechanger for storytelling, and this is only just the beginning of a MASSIVE revolution in digital content creation powered by generative AI..
I discovered a bug in my own Diffusion + CLIP pipeline and suddenly the samples are unreal.. 🤯
Here's
"Just a liquid reality..." #AIart #notdalle2 #Diffusion #clip
This is a "3D-diffusion" video created using a combination of four different AI models 🤯
Welcome to the metaverse!
There's such incredible potential here that I want to explain how I made this, so here's a thread! (1/n)
The two main models that draw the pixels are a diffusion model and @OpenAI's CLIP model, which guides it with a language prompt.
This idea was introduced by @advadnoun and later refined by many other creatives. My talk at @Kikk_Festival further explains this:
The diffusion model (I integrated code from @RiversHaveWings and @Somnai_dreams for this) generates images by iteratively denoising noisy-pixel images. Every time you run this from different noise, you get a different image, guided by the language prompt:
Finally playing around with CLIP + diffusion models.
12 GPU hours in, I gotta say I'm pretty impressed with the difference in esthetics compared to VQGAN
Big thanks to @RiversHaveWings & @Somnai_dreams for providing great starting code!
"a dystopian city"
"The real problem of humanity is that we have Paleolithic emotions, medieval institutions and godlike technology"