@SchmidhuberAI On open-source AI, “I signed this open letter by @laion_ai because I strongly favor the open-source movement. And I think it's also something that is going to challenge whatever big tech dominance there might be at the moment.”
A #StableDiffusion model trained on images of Japanese Kanji characters came up with “Fake Kanji” for novel concepts like Skyscraper, Pikachu, Elon Musk, Deep Learning, YouTube, Gundam, Singularity, etc.
This is similar to the “Fake Kanji” recurrent neural network experiments I did many years ago, when computers were 1000x less powerful :) Kind of fun to see updated results with modern diffusion models.
Excited to announce the release of Stable Diffusion 2.0!
Many new features in v2:
• Base 512x512 and 768x768 models trained from scratch with new OpenCLIP text encoder
• 4x upscaling text-guided diffusion model
• New “Depth2Image” functionality
The new SD2 base model is trained from scratch using the OpenCLIP-ViT/H text encoder (github.com/mlfoundations/…), with quality improvements over V1. It is then fine-tuned with v-prediction (arxiv.org/abs/2202.00512) to produce 768x768 images:
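If you want to try the 768x768 model from Python, here is a minimal sketch using the Hugging Face diffusers library — the model id, pipeline class, and scheduler choice are my assumptions for illustration, not something specified in the release notes:

```python
# Sketch only: assumed diffusers API and assumed repo id for the SD2 768x768 model.
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"  # assumption: repo id of the 768x768 v-prediction model

# Load the pipeline in half precision and swap in an Euler scheduler.
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Generate a single 768x768 image from a text prompt.
image = pipe(
    "a professional photograph of an astronaut riding a horse",
    height=768,
    width=768,
).images[0]
image.save("sd2_sample.png")
```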
A new 4x upscaling text-guided diffusion model enables resolutions of 2048x2048 (or even higher!) when combined with the new text-to-image models in this release.
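A rough sketch of chaining the upscaler onto a base-model output, again via diffusers — the pipeline class, repo id, and the 512x512 input file are assumptions on my part:

```python
# Sketch only: assumed diffusers upscaling pipeline and assumed repo id.
import torch
from diffusers import StableDiffusionUpscalePipeline
from PIL import Image

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",  # assumption: repo id of the 4x upscaler
    torch_dtype=torch.float16,
).to("cuda")
upscaler.enable_attention_slicing()  # a 512x512 input at 4x is memory-hungry

# Assumed input: a 512x512 image from the base model; 4x upscaling gives 2048x2048.
low_res = Image.open("sd2_sample.png").resize((512, 512))
upscaled = upscaler(
    prompt="a professional photograph of an astronaut riding a horse",
    image=low_res,
).images[0]
upscaled.save("sd2_sample_4x.png")
```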
Tried some interesting prompts to test OpenAI’s new reduced-bias #dalle2 #dalle model that will generate images of people that more accurately reflect the diversity of the world’s population.
“Professional DSLR color photograph of British soldiers during the American Revolution”
The most interesting and viral images you see produced by text-to-image models are not merely the results of the deep learning models themselves, but rather the result of a complex feedback loop between a human neural net🧠 and an artificial neural net🤖.
🧵Thread👇
You can see this clearly, because the prompts behind images that go viral for one model often don’t “work” for another model.
The best images are chosen through evolutionary selection at the community level, and each image is the result of iterative human/model feedback:
From the #dallemini phenomenon, it’s also clear that the most viral content isn’t about particular art styles, or whether the model can produce high-quality images (a reflection of the training data), but rather whether the model can portray the cultural items that people talk about.