Chari et al. (@lpachter) have updated their preprint and doubled down on their claim that an 🐘-looking embedding, a random (!) embedding, and 2D PCA all preserve data structure "similar or better" than t-SNE.
They literally say: "Picasso can quantitatively represent [local and global properties] similarly to, or better, than the respective t-SNE/UMAP embeddings".
In my thread below I argued it's a non-sequitur from Fig 2, due to insufficient metrics. [2/n]
I argued that they should also consider metrics like kNN recall or kNN classification accuracy, where t-SNE would fare much better than these other methods.
I thought it should be obvious from this figure (using MNIST). But now @lpachter says it's a "mirage".
Is it? [3/n]
The kNN recall (k=15) of default t-SNE (openTSNE) on MNIST is 36% (interestingly, for UMAP it's much worse; see our preprint with @jnboehm & @CellTypist, arxiv.org/abs/2007.08902).
Lior says 36% is bad. I actually think it's quite good.
But the point is, for PCA it's 1%! [4/n]
That's not just a bit worse. It's thirty-six times worse! [5/n]
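For readers who want to check numbers like these themselves, here is a minimal sketch of kNN recall, assuming the usual definition: the average fraction of each point's k nearest neighbours in the original space that are still among its k nearest neighbours in the embedding. The function name `knn_recall` and the random toy data are illustrative assumptions, not the code behind the figures quoted above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def knn_recall(X_high, X_low, k=15):
    """Mean fraction of high-dimensional kNN preserved in the embedding."""

    def neighbours(X):
        # Calling kneighbors() without a query returns the neighbours of the
        # fitted points themselves, excluding each point from its own list.
        nn = NearestNeighbors(n_neighbors=k).fit(X)
        return nn.kneighbors(return_distance=False)

    idx_high = neighbours(X_high)
    idx_low = neighbours(X_low)
    overlap = [
        len(np.intersect1d(a, b)) for a, b in zip(idx_high, idx_low)
    ]
    return float(np.mean(overlap)) / k


# Toy check on random data (an assumption for illustration, not MNIST).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Z_random = rng.normal(size=(200, 2))  # an embedding unrelated to X

print(knn_recall(X, X))         # identical spaces: every neighbour is kept
print(knn_recall(X, Z_random))  # unrelated embedding: close to chance level
```

For an unrelated embedding the recall should sit near the chance level of roughly k / (n - 1), which is the scale against which a 1% vs 36% difference should be read.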
Lior says t-SNE of MNIST looks pleasing but it's an "illusion" due to overplotting. No, it's not an illusion! kNN (k=15) test-set classification accuracy is 97.4%.
kNN accuracy in 50D PCA is 97.2% and in raw 784D it's 96.3%.
So t-SNE **wins** over raw pixels and 50D PCA. [6/n]
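The comparison above can be sketched as follows, assuming the plain recipe: fit a kNN classifier (k=15) on the training points in a given representation and score it on held-out points in the same representation. The helper name `knn_accuracy` is hypothetical, and the small scikit-learn digits dataset with 2D PCA stands in for MNIST only so that the sketch runs quickly. How the test points were placed into the t-SNE embedding is not stated in this thread, so that step is left out.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy(Z_train, y_train, Z_test, y_test, k=15):
    """Test-set accuracy of a k-nearest-neighbour classifier."""
    clf = KNeighborsClassifier(n_neighbors=k).fit(Z_train, y_train)
    return clf.score(Z_test, y_test)


# Stand-in data: 8x8 digits from scikit-learn (an assumption, not MNIST).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Accuracy in raw pixel space.
acc_raw = knn_accuracy(X_train, y_train, X_test, y_test)

# Accuracy in a 2D PCA projection fitted on the training set only.
pca = PCA(n_components=2).fit(X_train)
acc_pca2 = knn_accuracy(
    pca.transform(X_train), y_train, pca.transform(X_test), y_test
)

print(f"raw pixels: {acc_raw:.3f}, 2D PCA: {acc_pca2:.3f}")
```

Any 2D embedding that can map held-out points can be plugged in where the PCA projection is used here.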
But look, there are mis-positioned points, says Lior.
Yes there are. And if one looks at them, one can see that these are screwed-up digits. Here are some examples that t-SNE places into the island of "1s". In pixel space, these are indeed close to 1s.
So not t-SNE's fault. [7/n]
Incidentally, this shows the power of exploratory analysis and t-SNE as a visual aid. Namely, we have just discovered something interesting about the data: there are 4s and 7s in there that look like 1s! Neat, isn't it?
It's just one example of where t-SNE is actually useful. [8/n]
By the way, that MNIST figure in Lior's thread (Suppl Fig 11) is done using the @scikit_learn implementation. It's slow and has bad default parameters (will be fixed in sklearn 1.0). That's why the embedding looks worse than it should.
In conclusion, my practical suggestion to @lpachter is simple: add kNN recall and kNN accuracy to Figure 2, there is enough empty space there for two more metrics.
Then everybody will be able to see and judge how t-SNE is worse than PCA/Picasso and how it's better. [10/n]
Otherwise it will look like you have cleverly cherry-picked two metrics to prove your point. [11/n]
PS. Nice to see that the Mayer et al. paper on fitting the elephant is now cited! You are welcome. [12/12]
I am late to the party (was on holidays), but have now read @lpachter's "Specious Art" paper as well as ~300 quote tweets/threads, played with the code, and can add my two cents.
Spoiler: I disagree with their conclusions. Some claims re t-SNE/UMAP are misleading. Thread.
The paper has several parts and I have too many comments for a twitter thread, so here I will only focus on the core of the authors' argument against t-SNE/UMAP, namely Figures 2 and 3. We can discuss the rest some other time. [2/n]
In this part, Chari et al. claim that:
* t-SNE/UMAP preserve global and local structure very poorly;
* A purposefully silly embedding that looks like an elephant performs as well or even better;
* Even an *untrained* neural network performs about as well.
PHATE finds the same 4/7/9 and 8/5/3 mega-clusters that are also emphasized by UMAP, but fails to separate some of the digits within mega-clusters, e.g. green & red (3 and 5) overlap a lot.
IMHO that's a clearly worse performance than t-SNE or UMAP. [2/7]
Of course PHATE was designed for continuous data and that's where it's supposed to shine. But the original paper and tweets like this one and the one above make it look as if it hands-down outperforms t-SNE/UMAP for clustered data.
In a new paper with @JanLause & @CellTypist we argue that the best approach for normalization of UMI counts is *analytic Pearson residuals*, using NB model with an offset term for seq depth. + We analyze related 2019 papers by @satijalab and @rafalab. /1
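A minimal sketch of what analytic Pearson residuals look like in code, assuming a cells x genes count matrix and an NB model whose mean is the product of the cell's depth and the gene's overall fraction. The fixed overdispersion `theta=100` and the clipping to ±sqrt(n_cells) are assumptions not stated in this thread, and the function name is illustrative.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0):
    """Pearson residuals under an NB model with a sequencing-depth offset.

    counts: array of UMI counts, shape (n_cells, n_genes). Every cell and
    every gene is assumed to have a nonzero total (otherwise mu is zero
    and the residual is undefined).
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)       # per-cell total
    gene_total = counts.sum(axis=0, keepdims=True)  # per-gene total
    mu = depth * gene_total / counts.sum()          # expected counts

    # NB variance: mu + mu^2 / theta.
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)

    # Clip extreme values (assumed convention: +/- sqrt of number of cells).
    bound = np.sqrt(counts.shape[0])
    return np.clip(residuals, -bound, bound)


# Toy example with Poisson counts (illustrative data only).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=5.0, size=(50, 20))
toy_residuals = analytic_pearson_residuals(toy_counts)
print(toy_residuals.shape)
```

No per-gene regression is fitted here: the expected counts come in closed form from the row and column sums, which is what makes the residuals "analytic".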
1) Why does smoothed β_0 grow linearly? 2) Why is smoothed β_1 ≈ 2.3?? 3) Why does smoothed θ grow too??? /2
The original paper does not answer any of that.
Jan figured out that: (1) is trivially true when assuming UMI ~ NB(p_gene * n_cell); (2) simply follows from HS2019 parametrization & the magic constant is 2.3=ln(10); (3) is due to bias in estimation of overdispersion param θ! /3
The input here is a 1,000,000 x 78,628 matrix X with X_ij = 1 if integer i is divisible by the j'th prime number, and 0 otherwise. So columns correspond to 2, 3, 5, 7, 11, etc. The matrix is large but very sparse: only 0.0036% of entries are 1s. We'll use cosine similarity. [3/n]
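A scaled-down sketch of how such a matrix could be built as a sparse array, assuming rows stand for the integers 1..n (row index i-1 for integer i) and columns for the primes up to n. The function names and the small n are illustrative assumptions; this is not the code used for the matrix described above.

```python
import numpy as np
from scipy import sparse


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n as a sorted array."""
    sieve = np.ones(n + 1, dtype=bool)
    sieve[:2] = False
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = False
    return np.flatnonzero(sieve)


def divisibility_matrix(n):
    """Sparse n x (#primes <= n) matrix, X[i-1, j] = 1 if prime j divides i.

    Assumes n >= 2 so that there is at least one prime column.
    """
    primes = primes_up_to(n)
    rows, cols = [], []
    for j, p in enumerate(primes):
        multiples = np.arange(p, n + 1, p)  # p, 2p, 3p, ... up to n
        rows.append(multiples - 1)          # integer i lives in row i-1
        cols.append(np.full(multiples.size, j))
    rows = np.concatenate(rows)
    cols = np.concatenate(cols)
    data = np.ones(rows.size, dtype=np.int8)
    X = sparse.csr_matrix((data, (rows, cols)), shape=(n, primes.size))
    return X, primes


X_small, primes_small = divisibility_matrix(100)
density = X_small.nnz / (X_small.shape[0] * X_small.shape[1])
print(X_small.shape, f"{100 * density:.2f}% nonzero")
```

Only the nonzero entries are stored, so the same construction stays manageable for much larger n. Note that the row for the integer 1 is all zeros, so it has no defined cosine similarity to anything.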
We get the spectrum by changing the "exaggeration" in t-SNE, i.e. multiplying all attractive forces by a constant factor ρ. Prior work by @GCLinderman et al. showed that ρ → ∞ corresponds to Laplacian eigenmaps. We argue that the entire spectrum is interesting. [2/n]
Here is a toy dataset with 20 Gaussians arranged on a line, like a necklace. With LE one sees the string. With t-SNE one sees the individual beads. [3/n]
Spent some time investigating history of "double descent". As a function of model complexity, I haven't seen it described before 2017. As a function of sample size, it can be traced to 1995; earlier research seems less relevant. Also: I think we need a better term. Thread. (1/n)
I don't like the term "double descent" because it has nothing to do with gradient descent. And nothing is really descending. It's all about bias-variance tradeoffs, so maybe instead of the U-shaped tradeoff one should talk about \/\-shaped? Π-shaped? UL-shaped? ʯ-shaped? (3/n)