Becht et al.: UMAP preserves global structure better than t-SNE.

@GCLinderman & me: only because you used random init for t-SNE but spectral init for UMAP.

@NikolayOskolkov: that's wrong; init does not matter; the loss function does.

This thread is a response to Nikolay. (1/n)
@NikolayOskolkov is the only person I saw arguing with that. Several people provided further simulations showing that UMAP with random init can mess up the global structure. I saw @leland_mcinnes agreeing that init can be important. It makes sense. (2/n)
But @NikolayOskolkov argued against. Here is his popular UMAP write-up: towardsdatascience.com/how-exactly-um…, and here: towardsdatascience.com/why-umap-is-su… he explicitly disagreed with our Comment. I think his UMAP posts are great and I like them a lot, but on this point I believe he is mistaken. (3/n)
He has an empirical and a theoretical argument. The empirical argument is that he tried t-SNE and UMAP with random and non-random inits on a bunch of datasets and did not observe much of a difference. Well, sure, it does not *always* matter. But sometimes it does. (4/n)
E.g. take a 1D line and try embedding it with UMAP using random initialization. If this is not a messed up "global structure" then I don't know what is. Of course any kind of meaningful init like Laplacian eigenmaps (or PCA) will do much better here. (5/n)
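Here is a minimal sketch of this toy experiment (my own reconstruction, assuming umap-learn's API; the 10-D embedding of the line is an arbitrary illustrative choice):

```python
import numpy as np
import umap  # umap-learn

# A 1D manifold: 1000 points along a line in 10-D space.
x = np.linspace(0, 1, 1000)
line = np.outer(x, np.ones(10))

# "spectral" (Laplacian eigenmaps) is UMAP's default init; compare with random.
emb_spectral = umap.UMAP(init="spectral", random_state=42).fit_transform(line)
emb_random = umap.UMAP(init="random", random_state=42).fit_transform(line)

# With spectral init the embedding typically stays one continuous curve;
# with random init the line tends to get torn into disconnected fragments.
```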
Also, the whole point of our Comment is that on all three datasets from the original publication, UMAP with random init performed much worse than with LE init, according to the very metric suggested by the authors! Our little paper is not stand-alone; it is a *comment*. (6/n)
Let us now consider the theoretical argument. Nikolay's claim is that UMAP's cross-entropy loss preserves global structure better because the repulsion term keeps distant points distant in the embedding. This argument misses two crucial points. (7/n)
First, the UMAP implementation sets the coefficient in front of the repulsion term to a constant 1. This term becomes data-independent: all pairs of points repel each other non-specifically. Being data-independent, it cannot "preserve" anything. (8/n)
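Schematically, with high-dimensional similarities v_ij and embedding similarities w_ij, UMAP's cross-entropy loss splits into an attractive and a repulsive part (my notation; a sketch of the standard decomposition, not the exact implementation):

```latex
\mathcal{L}_{\text{UMAP}}
  = \underbrace{-\sum_{i \ne j} v_{ij}\,\log w_{ij}}_{\text{attraction}}
    \;\underbrace{-\sum_{i \ne j} \bigl(1 - v_{ij}\bigr)\,\log\bigl(1 - w_{ij}\bigr)}_{\text{repulsion}}
```

In the implementation, negative sampling effectively replaces the repulsive weight (1 - v_ij) by the constant 1, which is the data-independence referred to above.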
Second, t-SNE *also* has a repulsion term (coming from the normalization of the q's), but it is ignored in Nikolay's treatment. In fact, decreasing t-SNE's repulsion by a factor of ~4 yields embeddings that are uncannily similar to UMAP! The repulsive force in both cases is non-specific. (9/n)
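A sketch of this comparison, assuming openTSNE's API (its `exaggeration` parameter scales attraction relative to repulsion, so exaggeration ≈ 4 corresponds to ~4x weaker relative repulsion; the small digits dataset stands in for MNIST here):

```python
import numpy as np
from openTSNE import TSNE
from sklearn.datasets import load_digits

X = load_digits().data.astype(np.float32)

# Standard t-SNE.
emb_tsne = TSNE(random_state=42).fit(X)

# t-SNE with attraction boosted ~4x relative to repulsion; the result
# tends to look strikingly UMAP-like (tighter clusters, more white space).
emb_umap_like = TSNE(exaggeration=4, random_state=42).fit(X)
```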
But does this UMAP embedding of MNIST preserve global structure better than t-SNE (without exaggeration)? Perhaps it does, but I haven't seen any quantification of that. According to the metric suggested by Becht et al., the global structure is slightly better in t-SNE! (10/n)
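For concreteness, one common way to quantify global structure preservation (an illustrative choice on my part, not necessarily the exact metric used by Becht et al.) is the rank correlation between pairwise distances in the original space and in the embedding:

```python
import numpy as np
from scipy.stats import spearmanr

def global_structure_score(X, embedding, n_pairs=100_000, seed=0):
    """Spearman correlation between high-D and 2-D pairwise distances
    on a random subset of point pairs (higher = better preserved)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    d_hi = np.linalg.norm(X[i] - X[j], axis=1)
    d_lo = np.linalg.norm(embedding[i] - embedding[j], axis=1)
    return spearmanr(d_hi, d_lo).correlation
```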
So to summarize: UMAP and t-SNE have different losses. Both include non-specific repulsion. UMAP's loss seems to yield ~4x stronger attraction => more white space. Sometimes this seems to be beneficial, sometimes not. Does it preserve "global structure" better? Unclear. (11/n)
In any case, smart initialization can improve the results for both t-SNE and UMAP. The only point of our Comment was that the results reported by @EtienneBecht et al. were entirely due to different initialization choices; as far as I can see, this remains undisputed. END. (12/n)
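For reference, a sketch of passing the same informative initialization to both methods (assuming scikit-learn's and umap-learn's APIs, both of which accept a coordinate array as `init`):

```python
import umap
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def embed_with_pca_init(X):
    # The same 2-D PCA coordinates initialize both methods.
    pca_coords = PCA(n_components=2).fit_transform(X)
    tsne_emb = TSNE(init=pca_coords, learning_rate="auto").fit_transform(X)
    umap_emb = umap.UMAP(init=pca_coords).fit_transform(X)
    return tsne_emb, umap_emb
```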