Retrieval-Augmented #Diffusion Models (RDM): Smaller diffusion models can produce high-quality generations by accessing an external memory that guides the generation. Inspired by DeepMind's RETRO.
Day 10 #30daysofDiffusion #MachineLearning
If the model can always rely on this external memory, it only has to learn the important parts of the image generation process, such as the composition of scenes, rather than, for example, memorizing what different dogs look like.
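The retrieval step is the heart of the idea: embed a query, pull its nearest neighbors from a large image database, and let the diffusion model condition on those neighbors. A minimal sketch of that step, assuming a prebuilt FAISS index of CLIP image embeddings (the index file and helper names are hypothetical, not from the paper's code):

import faiss
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
index = faiss.read_index("retrieval_db.faiss")  # assumed: CLIP embeddings of the external memory

def retrieve_neighbors(prompt, k=4):
    # Embed the query in CLIP space and fetch its k nearest database entries.
    inputs = processor(text=[prompt], return_tensors="pt")
    with torch.no_grad():
        q = clip.get_text_features(**inputs)
    q = torch.nn.functional.normalize(q, dim=-1).numpy()
    _, ids = index.search(q, k)
    return ids[0]  # the diffusion model conditions on these neighbors' embeddings

Because the retrieved neighbors carry the appearance details, the trained model itself can stay small.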
Jan 10, 2023 • 13 tweets • 5 min read
StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance using a constituency tree or a scene graph.
Day 9 #30daysofDiffusion #MachineLearning
T2I models like SD produce aesthetically pleasing generations for a given prompt; however, most of us never get them right on the first try. Sometimes the model ignores part of the prompt, and objects we want in the picture are missing.
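Concretely, the prompt is parsed into noun phrases, each phrase is encoded separately, and those span encodings steer the generation so each object keeps its own attributes. A toy sketch of the encoding side, with the phrase list hard-coded where a constituency parser would normally supply it; the mean at the end is a crude stand-in for the paper's actual mechanism, which swaps the span encodings into the cross-attention values:

import torch
from transformers import CLIPTextModel, CLIPTokenizer

tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def encode(text):
    # Standard SD text conditioning: 77-token padded CLIP hidden states.
    ids = tok(text, padding="max_length", max_length=77, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return enc(**ids).last_hidden_state  # shape (1, 77, 768)

prompt = "a red car and a white sheep"
noun_phrases = ["a red car", "a white sheep"]  # would come from the constituency tree

full = encode(prompt)
spans = [encode(p) for p in noun_phrases]
cond = torch.stack([full, *spans]).mean(dim=0)  # crude combination, for illustration only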
Jan 9, 2023 • 11 tweets • 4 min read
InstructPix2Pix: Edit an image with text guidance in a single forward pass. Why run inversion or other optimization at inference time? Just create a dataset using those inversion techniques once and train a new model on it.
Day 8 #30daysofDiffusion #Diffusion #MachineLearning
Editing should be fast if you want to edit an image in real time. Methods like Textual Inversion or Prompt-to-Prompt optimize during inference, which makes them slow.
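Since the result is just a conditional diffusion model, applying it is a single pipeline call. A rough usage sketch with Hugging Face diffusers' StableDiffusionInstructPix2PixPipeline and the public timbrooks/instruct-pix2pix checkpoint (file names are placeholders):

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.png").convert("RGB")
edited = pipe(
    "make it look like a watercolor painting",  # the edit instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher values stay closer to the input image
).images[0]
edited.save("edited.png")

No per-image optimization happens here, which is exactly why it is fast enough for interactive editing.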
Jan 2, 2023 • 9 tweets • 4 min read
DreamBooth: Assign a rare sequence of tokens as the subject's identifier and fine-tune the diffusion model on a small set of images of the subject. A 🧵
Day 1 #30daysofDiffusion #Diffusion #MachineLearning
The authors use the Imagen model in this paper, which uses the T5-XXL language model to encode the text guidance, first generates a small 64x64 image, and then uses a super-resolution model to upscale it to 1024x1024.
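The fine-tuning objective itself is just the usual denoising loss, computed on the subject photos with the rare identifier in the prompt. A minimal sketch of one training step in the common Stable Diffusion flavor of DreamBooth (the paper itself fine-tunes Imagen); vae, unet, text_encoder, noise_scheduler, tokenize, subject_images, and optimizer are assumed to be set up beforehand, and the prior-preservation loss is omitted:

import torch
import torch.nn.functional as F

prompt = "a photo of sks dog"  # "sks" is the rare identifier token

for image in subject_images:  # a handful of photos of the subject
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    noise = torch.randn_like(latents)
    t = torch.randint(0, 1000, (1,), device=latents.device)
    noisy = noise_scheduler.add_noise(latents, noise, t)
    text_emb = text_encoder(tokenize(prompt)).last_hidden_state
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)  # predict the added noise, as in standard training
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()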