Giannis Daras
Ph.D. candidate, Computer Science @UTAustin, working with @AlexGDimakis. Research Scientist Intern @nvidia. Ex: @google, @explosion_ai, @ntua
Dec 1, 2022 7 tweets 3 min read
Multiresolution Textual Inversion.

Given a few images, we learn pseudo-words that represent a concept at different resolutions.

"A painting of a dog in the style of <jane(number)>" gives different levels of artistic freedom to match the <jane> style based on the number index.

The key idea of our method is to condition the embedding of the learned concept on the diffusion time.

Instead of learning one embedding to represent the concept, we learn a set of embeddings: each element of the set represents the object at different resolutions.
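
To make the idea of a time-conditioned set of embeddings concrete, here is a minimal sketch. It is not the thread's actual implementation: the uniform bucketing of timesteps, the function names `embedding_index` and `select_embedding`, the 1000-step schedule, the 10 embeddings, and the 768-dimensional vectors are all assumptions chosen for illustration.

```python
import numpy as np


def embedding_index(t, num_timesteps, num_embeddings):
    """Map a diffusion timestep t in [0, num_timesteps) to one of
    num_embeddings buckets (hypothetical uniform bucketing)."""
    if not 0 <= t < num_timesteps:
        raise ValueError("t must lie in [0, num_timesteps)")
    return (t * num_embeddings) // num_timesteps


def select_embedding(embeddings, t, num_timesteps):
    """Pick the concept embedding that is active at diffusion time t."""
    idx = embedding_index(t, num_timesteps, len(embeddings))
    return embeddings[idx]


# Stand-in for a learned set of embeddings: one row per resolution level.
# In a real system these would be trained; here they are random placeholders.
rng = np.random.default_rng(0)
concept_embeddings = rng.normal(size=(10, 768))

# High-noise timesteps and low-noise timesteps read different rows.
noisy_step = select_embedding(concept_embeddings, t=999, num_timesteps=1000)
clean_step = select_embedding(concept_embeddings, t=0, num_timesteps=1000)
```

Under these assumptions, the denoiser sees a different pseudo-word vector depending on how noisy the current sample is, which is one simple way a single concept could be represented at several resolutions.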
Sep 13, 2022 12 tweets 4 min read
Announcing Soft Diffusion: A framework to correctly schedule, learn and sample from general diffusion processes.

State-of-the-art results on CelebA, outperforms DDPMs and vanilla score-based models.

A 🧵 to learn about Soft Score Matching, Momentum Sampling and the role of noise.

Typically, diffusion models generate images by reversing a known corruption process that gradually adds noise.

We show how to learn to reverse diffusions that involve a linear deterministic degradation and a stochastic part (additive noise).
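
A corruption with a linear deterministic part and a stochastic part can be written as x_t = C_t x_0 + sigma_t * eta, with eta standard Gaussian noise. The sketch below illustrates only this forward corruption, not the learning or sampling procedure. The circular box blur, the 1-D signal, and the names `box_blur_matrix` and `corrupt` are assumptions made for the example.

```python
import numpy as np


def box_blur_matrix(n, width):
    """Build an n x n circulant matrix that averages each entry with its
    neighbors in a window of `width` samples (wrapping around the ends).
    This is a hypothetical choice of linear degradation C_t."""
    C = np.zeros((n, n))
    half = width // 2
    for i in range(n):
        for k in range(-half, half + 1):
            C[i, (i + k) % n] += 1.0
    # Normalize rows so that blurring preserves the mean intensity.
    return C / C.sum(axis=1, keepdims=True)


def corrupt(x0, C_t, sigma_t, rng):
    """Apply the linear degradation, then add Gaussian noise of scale sigma_t."""
    eta = rng.standard_normal(x0.shape)
    return C_t @ x0 + sigma_t * eta


rng = np.random.default_rng(0)
x0 = rng.uniform(size=16)           # toy 1-D "image"
C_t = box_blur_matrix(16, width=3)  # deterministic part: blur
x_t = corrupt(x0, C_t, sigma_t=0.1, rng=rng)
```

Setting `C_t` to the identity recovers the usual noise-only corruption, and setting `sigma_t` to zero leaves a purely deterministic degradation.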
Jun 3, 2022 11 tweets 7 min read
An update on the hidden vocabulary of DALLE-2.

While a lot of the feedback we received was constructive, some of the comments need to be addressed.

A thread, with some new gibberish text and some discussion 🧵 (1/N)

@benjamin_hilton said that we got lucky with the whales example.

We found another similar example.

"Two men talking about soccer, with subtitles" gives the word "tiboer". This seems to give sports in ~4/10 images. (2/N)
May 31, 2022 10 tweets 4 min read
DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.

The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.

A thread (1/n) 🧵

A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: "Two farmers talking about vegetables, with subtitles" gives an image that appears to have gibberish text on it.

However, the text is not as random as it initially appears... (2/n)