Giannis Daras
MIT CSAIL Postdoc 👨‍🎓 Ph.D. Computer Science @UTAustin 👨‍💻 Ex: @nvidia, @google, @explosion_ai, @ntua
Nov 7 • 10 tweets • 4 min read
How much is a noisy image worth? 👀

We show that as long as a small set of high-quality images is available, noisy samples become extremely valuable, almost as valuable as clean ones.

Buckle up for a thread about dataset design and the value of data 💰

Assume that you have M dollars for buying data. You can buy a lot of cheap, low-quality data or a few expensive high-quality samples.

What's the best strategy for allocating your budget? 🤔
Dec 1, 2022 • 7 tweets • 3 min read
Multiresolution Textual Inversion.

Given a few images, we learn pseudo-words that represent a concept at different resolutions.

"A painting of a dog in the style of <jane(number)>" gives different levels of artistic freedom to match the <jane> style based on the number index.

The key idea of our method is to condition the embedding of the learned concept on the diffusion time.

Instead of learning one embedding to represent the concept, we learn a set of embeddings: each element of the set represents the object at different resolutions.
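
A minimal sketch of what "a set of embeddings indexed by diffusion time" could look like. This is an illustration, not the authors' implementation: the function names, the uniform bucketing of time into levels, and the convention that t runs from 0 (clean) to 1 (pure noise) are all assumptions.

```python
import random

def make_embedding_set(num_levels, dim, seed=0):
    """Hypothetical: one small random vector per resolution level."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_levels)]

def level_for_time(t, num_levels):
    """Map a diffusion time t in [0, 1] to a level index (uniform buckets, assumed)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return min(int(t * num_levels), num_levels - 1)

def concept_embedding(embedding_set, t):
    """Pick the embedding of the learned concept for the current diffusion time."""
    return embedding_set[level_for_time(t, len(embedding_set))]

# Usage: ten levels, so each embedding covers a tenth of the diffusion trajectory.
embeddings = make_embedding_set(num_levels=10, dim=4)
early = concept_embedding(embeddings, 0.05)  # bucket 0
late = concept_embedding(embeddings, 0.95)   # bucket 9
```

In a real pipeline the vectors would be trainable parameters of the text encoder's embedding table rather than fixed random lists. The point here is only the lookup by time.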
Sep 13, 2022 • 12 tweets • 4 min read
Announcing Soft Diffusion: A framework to correctly schedule, learn and sample from general diffusion processes.

State-of-the-art results on CelebA, outperforms DDPMs and vanilla score-based models.

A 🧵 to learn about Soft Score Matching, Momentum Sampling and the role of noise

Typically, diffusion models generate images by reversing a known corruption process that gradually adds noise.

We show how to learn to reverse diffusions that involve a linear deterministic degradation and a stochastic part (additive noise).
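
To make "a linear deterministic degradation plus additive noise" concrete, here is a toy sketch on a 1-D signal. The corruption has the form x_t = C_t x_0 + sigma_t * eta, where C_t is a linear operator and eta is standard Gaussian noise. The choice of a box blur for C_t and the helper names `box_blur` and `corrupt` are assumptions made for illustration only.

```python
import random

def box_blur(signal, radius):
    """Linear degradation: replace each entry by the mean of its neighbourhood.

    The window is truncated at the edges of the signal.
    """
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out

def corrupt(x0, radius, sigma, rng):
    """One corruption step: deterministic blur, then additive Gaussian noise."""
    blurred = box_blur(x0, radius)
    return [v + sigma * rng.gauss(0.0, 1.0) for v in blurred]

# Usage: stronger blur and more noise as the (hypothetical) schedule progresses.
rng = random.Random(0)
x0 = [0.0, 0.0, 3.0, 0.0, 0.0]
x_early = corrupt(x0, radius=0, sigma=0.1, rng=rng)  # no blur, light noise
x_late = corrupt(x0, radius=1, sigma=1.0, rng=rng)   # blurred, heavy noise
```

Setting `radius=0` recovers the usual noise-only corruption, and setting `sigma=0` gives a purely deterministic degradation.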
Jun 3, 2022 • 11 tweets • 7 min read
An update on the hidden vocabulary of DALLE-2.

While a lot of the feedback we received was constructive, some of the comments need to be addressed.

A thread, with some new gibberish text and some discussion 🧵 (1/N)

@benjamin_hilton said that we got lucky with the whales example.

We found another similar example.

"Two men talking about soccer, with subtitles" gives the word "tiboer". This seems to give sports in ~4/10 images. (2/N)
May 31, 2022 • 10 tweets • 4 min read
DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.

The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.

A thread (1/n) 🧵

A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: "Two farmers talking about vegetables, with subtitles" gives an image that appears to have gibberish text on it.

However, the text is not as random as it initially appears... (2/n)