While a lot of the feedback we received was constructive, some of the comments need to be addressed.
A thread, with some new gibberish text and some discussion 🧵 (1/N)
@benjamin_hilton said that we got lucky with the whales example.
We found another similar example.
"Two men talking about soccer, with subtitles" gives the word "tiboer". This seems to give sports in ~4/10 images. (2/N)
A few people, including @realmeatyhuman, asked whether our method works beyond natural images (of birds, etc.).
Yes, we found some examples that seem statistically significant.
E.g. "doitcdces" seems related (~4/10 images) to students (or learning). (3/N)
Similarly, "comafuruder" seems correlated (~4/10) to sickness/hospitals/patients. (4/N)
@BarneyFlames and @mattgroh pointed out that "Apoploe", our gibberish word for birds, has a BPE encoding similar to that of "Apodidae" (a real bird family).
Interestingly, "Apodidae" produces ~1/10 birds (but many flying insects), while our gibberish "Apoploe" gives 10/10.
(5/N)
However, "Apodidae Ploceidae" (two names of real bird families) indeed gives 10/10 birds.
Therefore, one possible explanation is that our gibberish tokens are mashups of parts of real words. This seems reasonable.
It is interesting that DALLE-2 generates those mashups.
(6/N)
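To make the mashup hypothesis concrete, here is a toy illustration of greedy subword segmentation. The subword list below is invented for illustration; it is NOT the actual CLIP BPE vocabulary, only a sketch of how a gibberish word and a real word can share subword pieces.

```python
# Toy illustration (hypothetical subword list, not CLIP's real BPE vocabulary):
# greedy longest-match segmentation, showing how a gibberish word can share
# subword pieces with a real word like "Apodidae".

SUBWORDS = {"apo", "plo", "e", "did", "ae", "p", "l", "o", "a", "d", "i"}

def segment(word, vocab):
    """Greedy longest-match segmentation into subword tokens."""
    word = word.lower()
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])          # unknown character: fall back
            i += 1
    return tokens

print(segment("Apoploe", SUBWORDS))   # ['apo', 'plo', 'e']
print(segment("Apodidae", SUBWORDS))  # ['apo', 'did', 'ae']
```

Under this toy vocabulary, both words start with the same subword "apo", which is the kind of overlap that could make the model treat the gibberish word like a bird-family name.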
Our gibberish tokens might have many meanings.
@benjamin_hilton ran "Contarra ccetnxniams luryca tanniounons" and pointed out that not all the results are bugs. Indeed, our gibberish text produces a statistically significant fraction, but rarely a 100% match to the target concept. (7/N)
Our gibberish tokens have varying degrees of robustness in combinations with contexts.
E.g. if "xx" produces birds, "xx flying" is an easy prompt,
"xx on a table" is a neutral prompt, and "xx in space" is a hard prompt.
(8/N)
Our hidden vocabulary seems robust in easy and sometimes neutral prompts but not in hard ones.
These tokens may produce low-confidence predictions in the generator, so small perturbations can push it in random directions.
"vicootes" means vegetables in some contexts and not in others. (9/N)
We want to emphasize that this is an adversarial attack and hence does not need to work all the time.
If a system behaves in an unpredictable way, even if that happens 1/10 times, that is still a massive security and interpretability issue, worth understanding. (10/N, N=10).
Given a few images, we learn pseudo-words that represent a concept at different resolutions.
"A painting of a dog in the style of <jane(number)>" gives different levels of artistic freedom to match the <jane> style based on the number index.
The key idea of our method is to condition the embedding of the learned concept on the diffusion time.
Instead of learning one embedding to represent the concept, we learn a set of embeddings: each element of the set represents the object at different resolutions.
During inference, we can use the embeddings in many creative ways to access the learned object at different resolutions.
For example, given a painting made of buttons, we can isolate the buttons and create new objects with that texture.
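The time-conditioning idea can be sketched in a few lines. This is an illustrative sketch with assumed details (bank size, bucketing scheme, toy dimensions), not the paper's exact implementation: instead of one embedding vector per concept, keep a small bank of vectors, one per diffusion-time interval, and select by timestep.

```python
import numpy as np

# Illustrative sketch (assumed details): a bank of K learned embeddings,
# one per diffusion-time interval. High-noise timesteps use coarse
# embeddings; timesteps near t=0 use fine ones.

T = 1000    # total diffusion steps
K = 4       # number of resolution levels
DIM = 8     # embedding dimension (toy size)

rng = np.random.default_rng(0)
embedding_bank = rng.normal(size=(K, DIM))   # stands in for learned embeddings

def concept_embedding(t):
    """Pick the concept embedding for diffusion time t (0 = clean, T-1 = noisiest)."""
    level = min(K - 1, t * K // T)
    return embedding_bank[level]
```

At inference, sampling only from coarse levels (large t) would then give the generator more artistic freedom, while using the full set pins down fine detail such as texture.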
Announcing Soft Diffusion: A framework to correctly schedule, learn and sample from general diffusion processes.
State-of-the-art results on CelebA, outperforms DDPMs and vanilla score-based models.
A 🧵 to learn about Soft Score Matching, Momentum Sampling, and the role of noise
Typically, diffusion models generate images by reversing a known corruption process that gradually adds noise.
We show how to learn to reverse diffusions that involve a linear deterministic degradation and a stochastic part (additive noise).
Ingredient 1: Soft Score Matching.
Soft Score Matching incorporates the filtering process in the network. It trains the model to predict an image that after corruption matches the diffused observation.
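A minimal numpy sketch of this idea, under toy assumptions (a 1-D signal, a moving-average blur as the linear degradation C_t; not the paper's exact operators): the network's prediction is corrupted with the same C_t as the target, so matching the corrupted prediction to the diffused observation reduces to minimizing the norm of C_t applied to the prediction error.

```python
import numpy as np

# Hedged sketch: the corruption is a linear blur C_t plus additive noise.
# Soft Score Matching trains the model so that its prediction, after the
# same corruption, matches the diffused observation. Toy shapes and blur.

rng = np.random.default_rng(1)

def blur(x, t):
    """Toy linear degradation C_t: moving average, heavier for larger t."""
    k = 1 + 2 * t                    # kernel width grows with time
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def diffuse(x0, t, sigma=0.1):
    """Forward corruption: blur then add noise."""
    noise = sigma * rng.normal(size=x0.shape)
    return blur(x0, t) + noise, noise

def soft_score_matching_loss(x0, x0_hat, t):
    # Corrupting prediction and target with the same C_t (and the same noise
    # realization) cancels the noise term, leaving || C_t (x0_hat - x0) ||^2.
    return np.mean((blur(x0_hat, t) - blur(x0, t)) ** 2)

x0 = rng.normal(size=64)
x_t, _ = diffuse(x0, t=3)
print(soft_score_matching_loss(x0, x0, t=3))   # 0.0 for a perfect prediction
```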
DALLE-2 has a secret language.
"Apoploe vesrreaitais" means birds.
"Contarra ccetnxniams luryca tanniounons" means bugs or pests.
The prompt: "Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons" gives images of birds eating bugs.
A thread (1/n)🧵
A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: "Two farmers talking about vegetables, with subtitles" gives an image that appears to have gibberish text on it.
However, the text is not as random as it initially appears... (2/n)
We feed the text "Vicootes" from the previous image to DALLE-2. Surprisingly, we get (dishes with) vegetables! We then feed the words: "Apoploe vesrreaitais" and we get birds. It seems that the farmers are talking about birds, messing with their vegetables! (3/n)