What fascinates me about generating images with VQGAN+CLIP is that it CAN generate depth and drama, but only if you know how to ask for them.
"A herd of sheep grazing on a lush green hillside" alone,
vs. with "amazing awesome and epic" added.
ai-weirdness.ghost.io/the-art-of-ask…
Because CLIP is trained on internet images and text, it associates the "good" images with certain phrases.
"A herd of sheep grazing on a lush green hillside" before vs after adding "in the style of disney trending on artstation | unreal engine"
I experimented with different ways of asking CLIP+VQGAN for an attractive version of "a herd of sheep grazing on a lush green hillside"
"Award winning national geographic photography" produced impressive scenery but the sheep look like people crawling under green blankets.
Adding "by Bob Ross" to "a herd of sheep grazing on a lush green hillside" did get CLIP+VQGAN to improve the composition dramatically, but gave all the sheep Bob Ross hair.
Adding "by Tim Burton" to "a herd of sheep grazing on a lush green hillside" got CLIP+VQGAN to generate this very cool looking image. Not sure what happened to the sheep though.
I hate that one of the most effective ways to prompt CLIP+VQGAN to generate a realistic and attractive landscape is to ask for this:
"A herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper"
Using the spammy "A herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper" prompt as a starting point leads CLIP+VQGAN to some irritatingly gorgeous places.
Here, I added "cubist cezanne".
I had VQGAN+CLIP generate "A herd of sheep grazing on a lush green hillside | dramatic atmospheric ultra high definition free desktop wallpaper by lisa frank" and got this absolutely apocalyptic landscape.
I think those slippery purple things may be what's become of the sheep.
This experiment illustrates an interesting aspect of generating stuff with big internet-trained models: such a model has seen a lot of crummy examples of what you're looking for, and to it those are just as valid as the good ones.
It CAN generate the good stuff. But how do you ask for it?
For more technical details on CLIP+VQGAN and other methods of steering CLIP, plus some gorgeous example images, I recommend this post by @sea_snell
You can generate CLIP+VQGAN images yourself for free! I used a version by @RiversHaveWings inspired by @advadnoun's Big Sleep notebook.
Tutorial linked here:
Here's an online @runwayml demo of a much earlier AI called AttnGAN. It tries.
No, you're absolutely right, the sheep are uniformly cursed.
Part of why I chose a herd of grazing sheep is that image recognition algorithms have historically had trouble distinguishing the sheep from the landscape: aiweirdness.com/post/171451900…
"a crummy image of a herd of sheep grazing on a lush green hillside"
"a herd of sheep grazing on a lush green hillside realistic please and not some horrible AI atrocity"
Much better than I expected, to be honest.