Gowthami Somepalli
Grad student @UMDCS. Past: @AIatMeta, @AmazonScience, @IITMadras. Currently working on #Diffusion and #Multimodal understanding. GPU poor. She/her.
Jun 5, 2023 31 tweets 8 min read
📃🚨 Does your diffusion model copy from the training data? How to find such behavior? Why does it happen? Can we somehow mitigate it?

A summary of recent work on understanding training data replication in recent T2I #diffusion models. A long 🧶

#machinelearning #aigeneration

Paper links
paper 1 - arxiv.org/abs/2212.03860
paper 2 - arxiv.org/abs/2305.20086
Jan 11, 2023 18 tweets 6 min read
Retrieval Augmented #Diffusion (RDM) models: Smaller diffusion models can produce high-quality generations by accessing an external memory that guides the generation. Inspired by DeepMind's RETRO.

A 🧶

Paper: arxiv.org/abs/2204.11824

Day 10 #30daysofDiffusion #MachineLearning

If the model can always rely on this external memory, it only has to learn the important details of the image generation process, such as the composition of scenes, rather than, for example, remembering what different dogs look like.
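
A minimal sketch of the "external memory" lookup described above, assuming a nearest-neighbor search by cosine similarity. The toy 3-dimensional vectors, the `memory` dictionary, and the function names are hypothetical stand-ins for the learned image embeddings and database a real system would use; this is not the paper's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_neighbors(query, memory, k=2):
    """Return the names of the k memory entries most similar to the query."""
    ranked = sorted(memory.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical memory: name -> toy embedding (assumption, for illustration only).
memory = {
    "dog_on_grass": [0.9, 0.1, 0.0],
    "cat_on_sofa": [0.1, 0.9, 0.0],
    "dog_in_snow": [0.8, 0.0, 0.2],
}

query = [1.0, 0.0, 0.0]  # toy embedding of what we want to generate
neighbors = retrieve_neighbors(query, memory, k=2)
print(neighbors)
```

The retrieved entries would then be passed to the generator as extra conditioning, so the appearance details live in the memory rather than in the model weights.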
Jan 10, 2023 13 tweets 5 min read
StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance using a constituency tree or a scene graph.

A 🧵

Paper: arxiv.org/abs/2212.05032

Day 9 #30daysofDiffusion #MachineLearning

T2I models like SD produce aesthetically pleasing generations for a given prompt. However, most of us never get them right on the first try. Sometimes the model ignores part of the prompt, and some objects we want in the picture are missing.
Jan 9, 2023 11 tweets 4 min read
InstructPix2Pix: Edit an image with text guidance in a single forward pass. Why use inversion or other tricks at inference time? Just create a dataset using inversion techniques and train a new model.

A 🧶

Paper: arxiv.org/abs/2211.09800

Day 8 #30daysofDiffusion #Diffusion #MachineLearning

Editing should be fast when you want to edit an image in real time. Models like textual inversion or prompt-to-prompt optimize during inference, which makes them slow.
Jan 2, 2023 9 tweets 4 min read
DreamBooth: Assign a rare sequence of tokens as the subject's identifier and fine-tune the diffusion model on a small set of images of the "subject". A 🧵

Paper: arxiv.org/abs/2208.12242

Day 1 #30daysofDiffusion #Diffusion #MachineLearning

The authors use the Imagen model in this paper. It uses the T5-XXL language model to encode the text guidance, generates a small 64x64 image first, and then uses super-resolution models to blow it up to 1024x1024.
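
The identifier idea from the start of this thread can be sketched as a small prompt builder. This is only an illustration, under assumptions: the token `sks`, the class noun `dog`, the prompt template, and the function name are hypothetical choices, not taken from the paper.

```python
def subject_prompts(identifier, class_noun, contexts):
    """Build prompts that pair a rare identifier token with a class noun.

    An empty context yields the plain fine-tuning prompt; non-empty
    contexts yield prompts that place the subject in new scenes.
    """
    prompts = []
    for context in contexts:
        # Drop empty parts so there are no stray spaces in the prompt.
        parts = ["a", identifier, class_noun, context]
        prompts.append(" ".join(part for part in parts if part))
    return prompts

# Hypothetical identifier and class noun (assumptions, for illustration only).
prompts = subject_prompts("sks", "dog", ["", "on the beach"])
print(prompts)
```

The first prompt is the kind that would be paired with the few subject images during fine-tuning; the second shows how the same identifier could be reused at generation time.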