Lior Pachter
Jan 22, 2022 · 26 tweets · 11 min read
Is a single-cell RNA-seq atlas really an atlas? A short thread about #scRNAseq, maps, and atlantes (yes, the plural of atlas is atlantes! h/t @NeuroLuebbert). 🧵1/
Atlantes must be accurate to be useful, and the question that has vexed cartographers for centuries, namely how best to represent the spherical earth in 2D, is nontrivial. There have been many proposals, each with pros & cons (because the sphere and the plane have different Gaussian curvatures, no flat map can preserve all distances). 2/
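The curvature point can be made concrete with a small sketch. It assumes a perfectly spherical Earth of radius 6,371 km and uses the simple plate carrée (equirectangular) projection, chosen here only for illustration. The code compares true great-circle distances with straight-line distances on the flat map.

On the equator, two points 10° of longitude apart come out the same in both. At 60° N the map roughly doubles the distance, because a parallel at latitude φ is stretched by a factor of 1/cos φ. Other projections trade this error for different ones, but by Gauss's Theorema Egregium none can remove it entirely.

```python
import math

R_KM = 6371.0  # assumed mean Earth radius; treats the Earth as a perfect sphere


def great_circle_km(lat1, lon1, lat2, lon2):
    """True distance along the sphere (haversine formula); inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R_KM * math.asin(math.sqrt(a))


def plate_carree_km(lat1, lon1, lat2, lon2):
    """Straight-line distance on the flat map x = R*lon, y = R*lat (radians)."""
    x1, y1 = R_KM * math.radians(lon1), R_KM * math.radians(lat1)
    x2, y2 = R_KM * math.radians(lon2), R_KM * math.radians(lat2)
    return math.hypot(x2 - x1, y2 - y1)


# Pairs of points 10 degrees of longitude apart, at two latitudes
for lat in (0.0, 60.0):
    on_sphere = great_circle_km(lat, 0.0, lat, 10.0)
    on_map = plate_carree_km(lat, 0.0, lat, 10.0)
    print(f"lat {lat:4.1f}: sphere {on_sphere:7.1f} km, map {on_map:7.1f} km, "
          f"stretch x{on_map / on_sphere:.2f}")
```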
In #scRNAseq, atlases of cells have become synonymous with UMAP figures of gene expression matrices (it used to be t-SNE, but UMAP seems more popular now). Making maps from gene expression matrices is more challenging than mapping our 3D world; #scRNAseq data live in ~10⁴ dimensions. 3/
Mathematician George Pólya gave the following advice: "If you can't solve a problem, then there is an easier problem you can't solve: find it." This advice has been ignored in #scRNAseq, which wouldn't matter, except that the method used for the general case fails on the simplest one. 4/
Below is an example from a simple case. It's a UMAP of a group of cells that live not in some huge-dimensional space but in just 3 dimensions: there are only 3 genes. The data was clustered with the popular "Leiden" method. The figure *seems* OK, with the visual more or less confirming the clustering. 5/
But what was the actual example, the "ground truth" that this "atlas" represents? These were points selected uniformly at random on the sphere. No actual structure whatsoever. You can see how the UMAPs look for varying parameters: 6/
The Leiden clustering was performed on the uniformly sampled points. Of course the clusters consist of points that are close together, but their boundaries and shapes are meaningless... the points were sampled (densely) uniformly at random... 7/
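Here is a minimal sketch of the experiment's first two steps, under stated assumptions. Points are drawn uniformly on the unit sphere by normalizing standard Gaussian vectors; this standard trick works because the Gaussian is rotationally invariant. The points are then partitioned.

Leiden needs graph libraries, so plain k-means is swapped in here as a stand-in. It is a different algorithm, and the sample size, seed, and k = 8 are arbitrary choices, not the settings used for the figures. The point carries over: a clustering method returns clusters whether or not the data have any.

```python
import numpy as np


def sample_sphere(n, rng):
    """n points uniformly at random on the unit sphere in R^3 (rows = "cells", columns = 3 "genes").

    Normalizing i.i.d. standard Gaussian vectors gives the uniform distribution
    on the sphere because the Gaussian is rotationally invariant.
    """
    x = rng.standard_normal((n, 3))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, used here as a stand-in for Leiden (a different algorithm)."""
    rng = np.random.default_rng(seed)
    # initialize centers at k distinct data points (fancy indexing makes a copy)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every point to its nearest center
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each center to the mean of its points (left in place if its cluster is empty)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


rng = np.random.default_rng(1)
cells = sample_sphere(3000, rng)   # no structure at all, by construction
labels = kmeans(cells, k=8)
print(np.bincount(labels, minlength=8))  # yet every point is assigned to a "cluster"
```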
How do people currently select parameters for the UMAPs they make? They tune them until they get a picture that matches the clustering well... #confirmationbias 8/
You might wonder whether *any* of the choices of parameters produce a good map. All the atlantes are poor in this case. To see this, look at what happens to an actual map of the world (points colored by continent). Sometimes continents are broken apart, e.g. Africa in this case. 9/
Sometimes sea water is mixed in with land (look at South America). 10/
No matter what parameters you choose, you'll see some semblance of the continents, but overall things are a mess. 11/
The chaos these projections can create is made clearer by omitting the ocean. Look at South America, which in reality is a "cell type" (continent) filled uniformly with cells, yet here looks like a differentiation trajectory. 12/
At least in the above, South America is connected to North America. That is not always the case. 13/
Again, you'll find that varying parameters produces maps that, while in some cases better than others, all have major problems. 14/
UMAP author @leland_mcinnes describes it as "capturing the manifold underlying the data" by "stealing the singular set & geometric realization functors from algebraic topology & then adapting them to apply to metric spaces and fuzzy simplicial sets." 15/
umap-learn.readthedocs.io/en/latest/how_…
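What does the "fuzzy simplicial set" amount to in practice? Below is a simplified sketch of the high-dimensional graph construction described in the UMAP paper (McInnes, Healy & Melville):

- Each point i gets membership strengths exp(-(d_ij - ρ_i)/σ_i) to its k nearest neighbors.
- ρ_i is the distance to its nearest neighbor.
- σ_i is tuned so the strengths sum to log₂ k.
- The directed graph is then symmetrized with the fuzzy union a + b - ab.

This is a dense, brute-force illustration, not umap-learn's implementation. umap-learn uses approximate nearest neighbors and additional details. The sketch also omits the low-dimensional layout step entirely. The test data (300 uniform sphere points, k = 15) are arbitrary.

```python
import numpy as np


def fuzzy_knn_graph(X, k=15, n_steps=64):
    """Simplified, dense sketch of UMAP's high-dimensional fuzzy graph.

    For each point i: rho_i = distance to its nearest neighbor, and sigma_i is
    found by bisection so that sum_j exp(-(d_ij - rho_i) / sigma_i) = log2(k)
    over its k nearest neighbors. The directed weights are then symmetrized
    with the fuzzy union a + b - a*b.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    W = np.zeros((n, n))
    target = np.log2(k)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]   # k nearest neighbors, skipping i itself
        d = D[i, nbrs]
        rho = d[0]
        lo, hi = 1e-8, 1e3
        for _ in range(n_steps):
            sigma = (lo + hi) / 2
            total = np.exp(-np.maximum(d - rho, 0.0) / sigma).sum()
            if total > target:   # the sum grows with sigma, so sigma is too large
                hi = sigma
            else:
                lo = sigma
        W[i, nbrs] = np.exp(-np.maximum(d - rho, 0.0) / sigma)
    return W + W.T - W * W.T


rng = np.random.default_rng(0)
x = rng.standard_normal((300, 3))
points = x / np.linalg.norm(x, axis=1, keepdims=True)
G = fuzzy_knn_graph(points, k=15)
print(G.shape, G.min(), G.max())
```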
Well, the sphere is a manifold. What exactly has UMAP captured?

Look, I love algebraic topology but throwing fancy math words around doesn't make a method have good properties. One needs theorems for that. 16/
UMAP is not just randomly placing high-dimensional points in the plane. In benchmarks we've done, we see that it preserves some structure. But overall it's a poor heuristic. Ask yourself: next time you fly, would you want your pilot navigating with a UMAP? 17/
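The thread doesn't say which metric those benchmarks used. One common way to quantify "preserves some structure" is k-nearest-neighbor preservation. This is the average fraction of each point's k nearest neighbors in the original space that are still among its k nearest neighbors in the 2D map.

The sketch below assumes that metric. It uses a deliberately naive, hypothetical "map" (dropping the third coordinate of sphere points) rather than UMAP, just to show how the score behaves.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of each row's k nearest neighbors (brute force, Euclidean)."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    return np.argsort(D, axis=1)[:, :k]


def knn_preservation(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors that survive the embedding."""
    A = knn_indices(X_high, k)
    B = knn_indices(X_low, k)
    overlap = [len(set(a) & set(b)) / k for a, b in zip(A, B)]
    return float(np.mean(overlap))


rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 3))
sphere = x / np.linalg.norm(x, axis=1, keepdims=True)
flat = sphere[:, :2]  # hypothetical "map": simply drop the third coordinate

print(knn_preservation(sphere, sphere))  # identical data: every neighbor kept
print(knn_preservation(sphere, flat))    # the two hemispheres now overlap in 2D
```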
Biologists have pushed back on criticisms of UMAP by saying that (to paraphrase), "of course they are not used for analysis, they are just hypothesis generating plots and all predictions must be validated". First of all, UMAP is used for analysis: 18/
Second, considering how expen$ive most experiments in biology are, and how much time they take, should graduate students really spend years in a lab chasing a UMAP-generated hypothesis to confirm that it is real? 19/
This thread has focused on UMAP, but it also highlights problems with clustering. Here is a Leiden clustering of the continents (from points uniformly sampled within them, displayed with Mercator projection). Not terrible, but is Africa really two continents? 20/
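The Mercator projection used for display in these figures is conformal (it preserves angles) but does not preserve distances or areas. On a sphere of radius R it sends latitude φ and longitude λ to x = Rλ, y = R ln tan(π/4 + φ/2). Near a point, it stretches lengths by a factor of sec φ and areas by sec² φ. Here is a minimal Python sketch, assuming a spherical Earth.

```python
import math


def mercator(lat_deg, lon_deg, R=1.0):
    """Spherical Mercator: x = R*lambda, y = R*ln(tan(pi/4 + phi/2)); inputs in degrees."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_length_scale(lat_deg):
    """Local linear scale factor sec(phi); the area scale factor is its square."""
    return 1.0 / math.cos(math.radians(lat_deg))


# How much the map enlarges lengths (and areas) relative to the equator
for lat in (0, 30, 60, 75):
    k = mercator_length_scale(lat)
    print(f"lat {lat:2d}: lengths x{k:.2f}, areas x{k * k:.2f}")
```

The thread doesn't state whether the clustering itself was run on projected or on 3D coordinates; the sketch only describes the display.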
The interaction between UMAP and the clustering makes a reasonably good clustering much worse, because UMAP magnifies small differences. In many of the parameter choices below, blue and yellow look like two separate clusters. There's a "novel" cell type right there! 21/
In addition to all of these problems with single-cell atlantes, there is also the problem that they are not "canonical", the way one would like an atlas to be. 22/
What should one produce instead of UMAP atlantes? There are many useful ways to visualize information, even geographic information, that can yield great insight. Turning statistics into art can be challenging, but it's important and useful. No need to be lazy. 23/
This thread was motivated by discussions with @IngileifBryndis, and inspired in part by the beautiful animations of @JEFworks. 24/
The UMAP analyses of this thread, and their visualizations and animations, were produced by @LambdaMoses. Her code used to make the figures is available here: github.com/lambdamoses/um… 25/25
(Correction to 16/: the "?" in "Well, the sphere is a manifold?" should have been a "." Annoying typo; the point is that yes, the sphere is a manifold.)
