Giannis Daras
MIT CSAIL Postdoc 👨‍🎓 Ph.D. Computer Science @UTAustin On the academic job market for tenure-track positions 👨‍💻 Ex: @nvidia, @google, @explosion_ai, @ntua

May 31, 2022, 10 tweets

DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.

The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.

A thread (1/n)🧵

A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: "Two farmers talking about vegetables, with subtitles" gives an image that appears to have gibberish text on it.

However, the text is not as random as it initially appears... (2/n)

We feed the text "Vicootes" from the previous image to DALLE-2. Surprisingly, we get (dishes with) vegetables! We then feed the words: "Apoploe vesrreaitais" and we get birds. It seems that the farmers are talking about birds messing with their vegetables! (3/n)

Another example: "Two whales talking about food, with subtitles". We get an image with the text "Wa ch zod rea" written on it. Apparently, the whales are actually talking about their food in the DALLE-2 language. (4/n)

Some words from the DALLE-2 language can be learned and used to create absurd prompts. For example, "painting of Apoploe vesrreaitais" gives a painting of a bird. "Apoploe vesrreaitais" means to the model "something that flies" and can be used across diverse styles. (5/n)

The discovery of the DALLE-2 language creates many interesting security and interpretability challenges.

Currently, NLP systems filter text prompts that violate the policy rules. Gibberish prompts may be used to bypass these filters. (6/n)
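
To make the filter concern concrete, here is a toy sketch, assuming a simple keyword blocklist. This is not the filter DALLE-2 actually uses, and the blocklist and function name are made up for illustration. A plain-English prompt trips the blocklist, while the gibberish prompt from earlier in the thread passes, because none of its tokens match.

```python
# Toy keyword blocklist; NOT the filter DALLE-2 actually uses.
# The "banned" words are harmless placeholders chosen for illustration.
BLOCKLIST = {"birds", "bugs"}


def passes_filter(prompt: str) -> bool:
    """Return True if no word in the prompt appears on the blocklist."""
    words = (w.strip('.,!?"').lower() for w in prompt.split())
    return not any(w in BLOCKLIST for w in words)


plain = "birds eating bugs"
hidden = "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons"

print(passes_filter(plain))   # blocked: contains listed words
print(passes_filter(hidden))  # passes: no token matches the blocklist
```

A filter that only inspects the surface text never sees what the image model will associate with those tokens, which is the gap described above.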

We wrote a small paper with @AlexGDimakis summarizing our findings.
Please find the paper here: giannisdaras.github.io/publications/D…
Arxiv version coming soon.
(7/n, n=7).

Based on valid comments, we updated our paper with a discussion on Limitations and changed the title to Discovering the Hidden Vocabulary of DALLE-2. Thanks to @mraginsky @rctatman @benjamin_hilton and others for useful comments.

Paper is now on arXiv: arxiv.org/abs/2206.00169

Responses to some of the criticism can be found here:
