Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Lior Pachter

@lpachter

Jan 22, 2022 • 26 tweets • 11 min read • Read on X

Scrolly

@NeuroLuebbert

Is a single-cell RNA-seq atlas really an atlas? A short thread about #scRNAseq, maps, and atlantes (yes, the plural of atlas is atlantes! h/t @NeuroLuebbert). 🧵1/

Atlantes must be accurate to be useful, and the vexing question for centuries, namely how to best represent the spherical earth in 2D, is nontrivial. There have been many proposals with pros & cons for each (because the sphere and the plane have different Gaussian curvatures). 2/

In #scRNAseq, atlases of cells have become synonyms with UMAP figures of gene expression matrices (used to be t-SNE but UMAP seems more popular now). Map making from gene expression matrices is more challenging than map making of our 3D world; #scRNAseq is in ~10⁴ dimensions. 3/

Mathematician George Pólya gave the following advice: "If you can't solve a problem, then there is an easier problem you can't solve: find it." This has been ignored in #scRNAseq, which wouldn't matter, except the method used for the general case fails on the simplest one.4/

Below is an example from a simple case. It's UMAP of a group of cells that are not in some huge dimension; here there are only 3 genes. The data was clustered with the popular "Leiden" method. The figure *seems* ok with the visual more or less confirming the clustering. 5/

But what was the actual example, the "ground truth" that this "atlas" represents? These were points selected uniformly at random on the sphere. No actual structure whatsoever. You can see how the UMAPs look for varying parameters: 6/

The Leiden clustering was performed on the uniformly sampled points. Of course the clusters consists of points that were close together, but their boundaries and shapes are meaningless... the points were sampled (densely) uniformly at random... 7/

How do people currently select parameters for the UMAPs they make? They tune with them until they get a picture that matches the clustering well...#confirmationbias 8/

You might wonder whether *any* of the choices of parameters produce a good map. All the atlantes are poor in this case. To see this, look what happens to an actual map of the world (points colored by continent ). Sometimes continents are broken apart, e.g. Africa in this case. 9/

Sometimes sea water is in mixed in with land (look at South America). 10/

No matter what parameters you choose, you'll see some semblance of the continents, but pretty much things are a mess. 11/

The chaos these projections can create is made clearer by omitting the ocean. Look at South American, which in reality is a "cell type" (continent) that is filled uniformly with cells, looking like a differentiation trajectory. 12/

At least in the above, South America is connected to North America. That is not always the case. 13/

Again, you'll find that varying parameters produces maps that, while in some cases better than others, all have major problems. 14/

@leland_mcinnes

UMAP author @leland_mcinnes describes it as "capturing the manifold underlying the data" by "stealing the singular set & geometric realization functors from algebraic topology & then adapting them to apply to metric spaces and fuzzy simplicial sets." 15/
umap-learn.readthedocs.io/en/latest/how_…

Well, the sphere is a manifold? What exactly has UMAP captured?

Look, I love algebraic topology but throwing fancy math words around doesn't make a method have good properties. One needs theorems for that. 16/

https://twitter.com/lpachter/status/1431325969411821572?lang=en

UMAP is not just randomly placing high-dimensional points in the plane. In benchmarks we've done we see it preserves some structure (

https://twitter.com/lpachter/status/1431325969411821572?lang=en

). But it's overall a poor heuristic. Ask yourself: next time you fly would you want your pilot navigating with a UMAP? 17/

https://twitter.com/GorinGennady/status/1431720888009826306?s=20

Biologists have pushed back on criticisms of UMAP by saying that (to paraphrase), "of course they are not used for analysis, they are just hypothesis generating plots and all predictions must be validated". First of all, UMAP is used for analysis:

https://twitter.com/GorinGennady/status/1431720888009826306?s=20

18/

Second, considering how expen$ive most experiments are in biology, and how much time they take, are graduate students really spend years in a lab chasing a UMAP generated hypothesis to confirm that it is real? 19/

This thread has focused on UMAP, but it also highlights problems with clustering. Here is a Leiden clustering of the continents (from points uniformly sampled within them, displayed with Mercator projection). Not terrible, but is Africa really two continents? 20/

The interaction between UMAP and the clustering makes a reasonably good clustering much worse. That's because it magnifies small differences. In many parameter choices below, blue and yellow like like two separate clusters. There's a "novel" cell type right there! 21/

https://twitter.com/lpachter/status/1431325992711180289?s=20

In addition to all of these problems with single-cell atlantes, is also the problem that they are not "canonical", the way one would like an atlas to be.

https://twitter.com/lpachter/status/1431325992711180289?s=20

22/

What should one produce instead of UMAP atlantes? There are many useful ways to visualize information, even geographic information, that can yield great insight. Turning statistics into art can be challenging, but it's important and useful. No need to be lazy. 23/

@IngileifBryndis

This thread was motivated by discussions with @IngileifBryndis, and inspired in part by the beautiful animations of @JEFworks (see

https://twitter.com/JEFworks/status/1483906029847220224?s=20

). 24/

@LambdaMoses

The UMAP analyses of this thread, and their visualizations and animations, were produced by @LambdaMoses. Her code used to make the figures is available here: github.com/lambdamoses/um… 25/25

? -> .
(annoying typo, the point is yes, the sphere is a manifold).

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @lpachter

Lior Pachter

@lpachter

Dec 22, 2025

Exhibit 290 in the Epstein files raises some questions about whether he was truly a "source of intellectual exchange and stimulation."

The document is here:

Thread on what is in this document below: 1/🧵justice.gov/multimedia/Cou…

Epstein the racist:

"they [top mathematicians] don't exist in China because you need a bit of a creative person as opposed to simply a copy cat."

Also this take is not only racist but also so so so stupid. Hua Luogeng literally rolling in his grave.

2/🧵

Epstein's embarrassing advice:

"Algebra is not as important as it used to be. Programming is important."

This has aged about as well as milk.

3/🧵

Read 13 tweets

Lior Pachter

@lpachter

Jul 1, 2025

This week in academia, a not so short🧵...

1. Staff reductions and other cost cutting measures coming to Brown University highereddive.com/news/brown-uni…

2. Michigan State University staff cuts lansingstatejournal.com/story/news/loc…

3. Indiana University eliminating or suspending academic programs
bloomingtonian.com/2025/06/30/ind…

Read 17 tweets

Lior Pachter

@lpachter

Sep 10, 2024

So this plagiarism thing has happened to our lab.. again. This time it's plagiarism of our poseidon syringe pump paper @booeshaghi et al., 2019 in @SciReports:
Text has been plagiarized, as well as figures copied directly here: 1/🧵nature.com/articles/s4159…
ijirset.com/upload/2024/ma…

Here is figure 1 from our paper (LHS) and figure 1 in the plagiarized paper (RHS) published in the "International Journal of Innovative Research" 2/ ijirset.com/upload/2024/ma…

The text seems to have been rewritten with an LLM. Our introduction (LHS) vs. the plagiarized version (RHS): 3/

Read 11 tweets

Lior Pachter

@lpachter

Aug 16, 2024

https://twitter.com/SnyderShot/status/1823814971761025451

I've checked this paper out, as instructed. I was also interested in the main result for personal reasons: I'm 51 years old. Is it true that I've just gone through a major change? And that another one awaits me in just a few years?

Some comments on the paper in this thread 1/🧵

https://twitter.com/SnyderShot/status/1823814971761025451

The main result about major changes in the mid 40s and 60s is shown in this plot (Fig. 4a). First, I redrew it with axes that start at 0, so the scale of change here was clearer. Not as impressive, but maybe it's a thing? 2/

The authors say that this finding is even corroborated in another study (ref 14). But that's not true. I looked it up, and it shows something totally different (see RHS Fig 3c from ref 14). No change in mid 40s, but a change in the mid 30s, and the real change in the 80s 😕 3/

Read 17 tweets

Lior Pachter

@lpachter

Aug 10, 2024

https://x.com/lpachter/status/1814716374847197354

I recently posted on @bound_to_love's work quantifying long-read RNA-seq. In response, a scientist acting in bad faith (Rob Patro @nomad421) trashed our work. This kind of mold in science's bathroom is extremely damaging so here's a bit of bleach. 1/🧵

https://x.com/lpachter/status/1814716374847197354

At issue are benchmarking results we performed comparing our tool, lr-kallisto, to other programs including Patro's Oarfish. Shortly after we posted our preprint Patro started subtweeting our work, claiming we'd run an "appallingly wrong benchmark" and that we're "bullies". 2/

This was followed, within days, by Patro posting a hastily written preprint disguised as research work on benchmarking, but really just misusing @biorxivpreprint to broadcast the lie that our work "... may be repeatable, but it appears neither replicable nor reproducible." 3/

Read 25 tweets

Lior Pachter

@lpachter

Aug 1, 2024

This recently published figure by @Sarah_E_Ancheta et al. is very disturbing and should lead to some deep introspection in the single-cell genomics community (I doubt it will).

It demonstrates complete disagreement among 5 widely used "RNA velocity" methods 1/

This is of course no surprise. In "RNA velocity unraveled" by @GorinGennady et al. in @PLOSCompBiol we wrote 55 page paper explaining the many ways in which RNA velocity makes no sense. 2/ journals.plos.org/ploscompbiol/a…

We're not the only ones to understand how flawed RNA velocity is. The paper from the groups of @KasperDHansen and @loyalgoff is titled "pumping the brakes on RNA velocity". The whole notion of putting arrows on UMAPs is ridiculous. 3/genomebiology.biomedcentral.com/articles/10.11…

Read 6 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Lior Pachter

Try unrolling a thread yourself!

More from @lpachter

Lior Pachter

Lior Pachter

Lior Pachter

Lior Pachter

Lior Pachter

Lior Pachter

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!