I am late to the party (I was on holiday), but I have now read @lpachter's "Specious Art" paper as well as ~300 quote tweets/threads, played with the code, and can add my two cents.

Spoiler: I disagree with their conclusions. Some claims re t-SNE/UMAP are misleading. Thread. 🐘
The paper has several parts and I have too many comments for a Twitter thread, so here I will only focus on the core of the authors' argument against t-SNE/UMAP, namely Figures 2 and 3. We can discuss the rest some other time. [2/n]
In this part, Chari et al. claim that:

* t-SNE/UMAP preserve global and local structure very poorly;
* A purposefully silly embedding that looks like an elephant performs as well or even better;
* Even an *untrained* neural network performs about as well.

[3/n]
These are very strong claims. However, inspection of Figure 2 suggests an even stronger one: 2D PCA appears to perform much better still, with respect to both their global and their local metrics (on all datasets). See the orange dots.

Strangely, this is not emphasized in the text. [4/n]
IMHO, these claims are bizarre.

Here are four 2D embeddings of MNIST: UMAP (aka t-SNE with exaggeration); PCA; untrained network; the elephant 🐘.

Chari et al. claim that the latter three are as good as or better than UMAP -- even for local structure preservation. Really? [5/n]
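The untrained-network baseline, by the way, is trivial to emulate. The paper's exact architecture isn't spelled out in this thread, so here is a minimal numpy sketch of the idea on random stand-in data (the hidden width and the ReLU nonlinearity are my assumptions), with 2D PCA alongside for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

def untrained_mlp_embedding(X, hidden=100, rng=rng):
    """Project data to 2D through a randomly initialized one-hidden-layer
    MLP -- no training whatsoever. (Sketch; Chari et al.'s exact
    architecture may differ.)"""
    W1 = rng.normal(size=(X.shape[1], hidden)) / np.sqrt(X.shape[1])
    W2 = rng.normal(size=(hidden, 2)) / np.sqrt(hidden)
    return np.maximum(X @ W1, 0) @ W2   # ReLU hidden layer, linear 2D output

def pca_embedding(X):
    """Plain 2D PCA via SVD on centered data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                # project onto top-2 right singular vectors

X = rng.normal(size=(500, 50))          # stand-in for MNIST pixel vectors
E_net = untrained_mlp_embedding(X)
E_pca = pca_embedding(X)
print(E_net.shape, E_pca.shape)         # both (500, 2)
```

A random projection like this obviously has no reason to keep neighbors together, which is exactly why the claim above is so surprising.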
If you have a metric showing that PCA performs better than t-SNE then this metric does not quantify "preservation of local structure".

PCA/Picasso/etc may very well preserve _something_ better than t-SNE, but that "something" is definitely not local structure.

What gives? [6/n]
Turns out, the "intra-type" correlations shown in Fig 2 are correlations across t-types (transcriptomic types) between high-dim and low-dim *average* intra-type distances. It's a measure of type variability.

(It took me a while to realize this because the Results do not mention the averaging; only the Methods do.) [7/n]
I.e. the message here is that PCA/Picasso/random-network can preserve t-type variances better than t-SNE/UMAP.

Fair enough, but this is not at all what is normally meant by "preservation of local structure"! [8/n]
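To make this concrete, here is my reading of that metric as a numpy sketch: for each type, average the within-type pairwise distances in the high-dim data and in the embedding, then correlate the two lists of per-type averages across types. (The exact distance measure and correlation flavor in the paper's Methods may differ; this is an illustration, not their code.)

```python
import numpy as np

def mean_intra_type_distance(X, labels, t):
    """Average pairwise Euclidean distance among points of type t."""
    P = X[labels == t]
    D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    n = len(P)
    return D.sum() / (n * (n - 1))      # mean over off-diagonal pairs

def intra_type_correlation(X_high, X_low, labels):
    """Correlate, across types, the *average* intra-type distance in
    high-dim vs low-dim -- one number per type, not per point."""
    types = np.unique(labels)
    d_high = [mean_intra_type_distance(X_high, labels, t) for t in types]
    d_low = [mean_intra_type_distance(X_low, labels, t) for t in types]
    return np.corrcoef(d_high, d_low)[0, 1]

# Toy demo: three types with increasingly large spread; any embedding
# that roughly preserves per-type spread scores highly.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
X = rng.normal(size=(300, 20)) * (1 + labels)[:, None]
Y = X[:, :2]                            # crude linear "embedding"
r = intra_type_correlation(X, Y, labels)
print(r)
```

Note how any embedding that gets the per-type *spreads* roughly right scores well here, even if it scrambles all the neighborhoods.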
Chari et al. do not use any metrics that would quantify preservation of local structure in the common sense of the word (e.g. kNN recall, kNN classification accuracy, cluster/type Rand score, etc.).

If they did, they would of course find that t-SNE performs much better. [9/n]
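For reference, kNN recall -- a standard local-structure metric of the kind I mean -- takes a few lines. A brute-force numpy sketch (real evaluations would use a proper neighbor index for speed):

```python
import numpy as np

def knn_recall(X_high, X_low, k=10):
    """Fraction of each point's k nearest neighbors in the original space
    that are still among its k nearest neighbors in the embedding,
    averaged over all points."""
    def knn_indices(X):
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        np.fill_diagonal(D, np.inf)     # exclude each point itself
        return np.argsort(D, axis=1)[:, :k]
    nn_high, nn_low = knn_indices(X_high), knn_indices(X_low)
    overlap = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return np.mean(overlap) / k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
print(knn_recall(X, X))                 # identical spaces -> 1.0
```

On this metric, a random projection or an elephant outline has no chance against t-SNE/UMAP.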
By the way, I was wondering where the elephant shape came from. The paper calls it "von Neumann elephant" referring to the famous quote but does not explain the exact shape. Googling showed it's taken from this fun paper: aapt.scitation.org/doi/10.1119/1.…

Oddly, it's not cited... [10/n]
Acknowledgments: thanks to @jnboehm, @JanLause, and @pavlinpolicar for detailed discussions. MNIST embeddings shown above are taken from a Colab notebook by @akshaykagrawal. [11/n]
The bigger picture of course is that @lpachter believes unsupervised 2D embeddings are useless. "What are they good for?" he asks in twitter discussions.

That is a great question! But I would need a whole other thread to answer it... Maybe later. [12/n]

