Giannis Daras

@giannis_daras

May 31, 2022 • 10 tweets • 4 min read

DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.

The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.

A thread (1/n)🧵

A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: "Two farmers talking about vegetables, with subtitles" gives an image that appears to have gibberish text on it.

However, the text is not as random as it initially appears... (2/n)

We feed the text "Vicootes" from the previous image to DALLE-2. Surprisingly, we get (dishes with) vegetables! We then feed the words "Apoploe vesrreaitars" and we get birds. It seems that the farmers are talking about birds messing with their vegetables! (3/n)

Another example: "Two whales talking about food, with subtitles". We get an image with the text "Wa ch zod rea" written on it. Apparently, the whales are actually talking about their food in the DALLE-2 language. (4/n)

Some words from the DALLE-2 language can be learned and used to create absurd prompts. For example, "painting of Apoploe vesrreaitais" gives a painting of a bird. "Apoploe vesrreaitais" means to the model "something that flies" and can be used across diverse styles. (5/n)

The discovery of the DALLE-2 language creates many interesting security and interpretability challenges.

Currently, NLP systems filter text prompts that violate the policy rules. Gibberish prompts may be used to bypass these filters. (6/n)
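To make the filter concern concrete, here is a minimal sketch assuming a plain keyword blocklist. The `BLOCKLIST` contents and the `passes_filter` helper are hypothetical stand-ins for illustration, not how DALLE-2 or any real moderation system works. The gibberish phrases are the ones quoted earlier in the thread.

```python
# Toy illustration only: a hypothetical keyword blocklist, not a real moderation system.
BLOCKLIST = {"bugs", "pests"}  # stand-in "disallowed" words, chosen only for this demo


def passes_filter(prompt: str) -> bool:
    """Return True if no blocklisted word appears in the prompt."""
    words = prompt.lower().split()
    return not any(word.strip(".,!?\"'") in BLOCKLIST for word in words)


plain_prompt = "birds eating bugs"
gibberish_prompt = "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons"

plain_ok = passes_filter(plain_prompt)          # blocked: "bugs" is on the list
gibberish_ok = passes_filter(gibberish_prompt)  # passes: no listed word appears
```

A filter that only matches surface words has nothing to match in the gibberish prompt, even if the image model treats that prompt as meaningful.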

We wrote a small paper with @AlexGDimakis summarizing our findings.
Please find the paper here: giannisdaras.github.io/publications/D…
Arxiv version coming soon.
(7/n, n=7).

Based on valid comments, we updated our paper with a discussion on Limitations and changed the title to Discovering the Hidden Vocabulary of DALLE-2. Thanks to @mraginsky @rctatman @benjamin_hilton and others for useful comments.

Paper is now on arXiv: arxiv.org/abs/2206.00169

Responses to some of the criticism can be found here:

https://twitter.com/giannis_daras/status/1532605363232444416

More from @giannis_daras

Giannis Daras

@giannis_daras

Nov 7, 2024

How much is a noisy image worth? 👀

We show that as long as a small set of high-quality images is available, noisy samples become extremely valuable, almost as valuable as clean ones.

Buckle up for a thread about dataset design and the value of data 💰

Assume that you have M dollars for buying data. You can buy a lot of cheap, low-quality data or a few expensive high-quality samples.

What's the best strategy for allocating your budget? 🤔

Ambient Diffusion and related frameworks allow you to train with noisy data.

But, as we show in our work, training solely on noisy data significantly hurts performance.

This might suggest that noisy samples are worthless. But is this really the case? 👀

Giannis Daras

@giannis_daras

Dec 1, 2022

Multiresolution Textual Inversion.

Given a few images, we learn pseudo-words that represent a concept at different resolutions.

"A painting of a dog in the style of <jane(number)>" gives different levels of artistic freedom to match the <jane> style based on the number index.

The key idea of our method is to condition the embedding of the learned concept on the diffusion time.

Instead of learning one embedding to represent the concept, we learn a set of embeddings: each element of the set represents the object at different resolutions.
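A minimal sketch of the "set of embeddings indexed by diffusion time" idea, assuming the timesteps are split into equal-width buckets. The class name, the bucketing rule, and the toy two-dimensional vectors are all hypothetical; the thread does not say how the time index is mapped to an embedding.

```python
from typing import List


class TimeConditionedConcept:
    """Hypothetical container: one embedding per diffusion-time bucket."""

    def __init__(self, embeddings: List[List[float]], num_steps: int) -> None:
        if not embeddings or num_steps <= 0:
            raise ValueError("need at least one embedding and a positive step count")
        self.embeddings = embeddings
        self.num_steps = num_steps

    def bucket(self, t: int) -> int:
        """Map a timestep in [0, num_steps) to an embedding index (equal-width buckets)."""
        if not 0 <= t < self.num_steps:
            raise ValueError("timestep out of range")
        return t * len(self.embeddings) // self.num_steps

    def embedding_at(self, t: int) -> List[float]:
        """Return the embedding used for the concept at timestep t."""
        return self.embeddings[self.bucket(t)]


concept = TimeConditionedConcept(
    embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    num_steps=1000,
)
first_bucket = concept.bucket(0)
last_bucket = concept.bucket(999)
```

The point of the sketch is only that the same pseudo-word resolves to a different vector depending on where the sampler is in the diffusion process.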

During inference, we can use the embeddings in many creative ways to access the learned object at different resolutions.

For example, given a painting made of buttons, we can isolate the buttons and create new objects with that texture.

Giannis Daras

@giannis_daras

Sep 13, 2022

Announcing Soft Diffusion: A framework to correctly schedule, learn and sample from general diffusion processes.

State-of-the-art results on CelebA, outperforms DDPMs and vanilla score-based models.

A 🧵 to learn about Soft Score Matching, Momentum Sampling and the role of noise

Typically, diffusion models generate images by reversing a known corruption process that gradually adds noise.

We show how to learn to reverse diffusions that involve a linear deterministic degradation and a stochastic part (additive noise).

Ingredient 1: Soft Score Matching.

Soft Score Matching incorporates the filtering process in the network. It trains the model to predict an image that after corruption matches the diffused observation.
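One plausible reading of this objective, sketched on a 1-D toy signal. The assumptions here are mine, not the thread's: the linear degradation is a 3-tap blur, and the prediction is re-corrupted and compared against the blurred clean signal. The function names (`blur`, `diffuse`, `soft_loss`) are hypothetical.

```python
import random
from typing import Callable, List

Signal = List[float]


def blur(x: Signal) -> Signal:
    """Linear degradation: 3-tap moving average with edge replication."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def diffuse(x0: Signal, sigma: float, rng: random.Random) -> Signal:
    """Forward process: deterministic blur plus additive Gaussian noise."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in blur(x0)]


def soft_loss(predict: Callable[[Signal], Signal], x0: Signal, xt: Signal) -> float:
    """Squared error measured after re-applying the corruption to the prediction."""
    corrupted_pred = blur(predict(xt))
    corrupted_clean = blur(x0)  # assumed comparison target
    return sum((a - b) ** 2 for a, b in zip(corrupted_pred, corrupted_clean))


rng = random.Random(0)
x0 = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
xt = diffuse(x0, sigma=0.1, rng=rng)

oracle_loss = soft_loss(lambda _: x0, x0, xt)   # predictor that returns the clean signal
identity_loss = soft_loss(lambda x: x, x0, xt)  # predictor that returns its noisy input
```

Because the error is measured after the blur, the model is only penalized for differences that survive the corruption, which is what "incorporates the filtering process in the network" suggests.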

Giannis Daras

@giannis_daras

Jun 3, 2022

An update on the hidden vocabulary of DALLE-2.

While a lot of the feedback we received was constructive, some of the comments need to be addressed.

A thread, with some new gibberish text and some discussion 🧵 (1/N)

@benjamin_hilton said that we got lucky with the whales example.

We found another similar example.

"Two men talking about soccer, with subtitles" gives the word "tiboer". This seems to give sports in ~4/10 images. (2/N)

A few people, including @realmeatyhuman, asked whether our method works beyond natural images (of birds, etc).

Yes, we found some examples that seem statistically significant.

E.g. "doitcdces" seems related (~4/10 images) to students (or learning). (3/N)
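Whether "~4/10 images" is notable depends on how often an unrelated prompt would produce the same concept by chance. A rough way to check is a one-sided binomial tail, sketched below. The 5% base rate is a hypothetical placeholder, not a figure from the thread or the paper.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical base rate: how often an unrelated prompt yields the concept by chance.
ASSUMED_BASE_RATE = 0.05

# Probability of seeing at least 4 matching images out of 10 under that base rate.
p_value = binomial_tail(k=4, n=10, p=ASSUMED_BASE_RATE)
```

Under that assumed rate the tail probability is about 0.001, but the conclusion is only as good as the base-rate estimate, which would need to be measured with control prompts.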
