Lior Pachter
Apr 5, 2024 · 25 tweets
The choice of whether to use Seurat or Scanpy for single-cell RNA-seq analysis typically comes down to a preference for R vs. Python. But do they produce the same results? In a preprint with @Josephmrich et al., we take a close look. The results are 👀 1/🧵 biorxiv.org/content/10.110…
We looked at a standard processing and analysis workflow, summarized in the figure below. The sources of variability we explored are in red; the plots and metrics we assessed are in blue. We examined the standard 10x PBMC benchmark datasets, but the same analyses can be run on other data. 2/
Before getting into results, it's important to note that Seurat has never been published, and many of the details of Scanpy are missing from its original paper. @Josephmrich read the code and traced every function and every parameter. For example, this is how clustering and UMAPs are made: 3/
There's a lot of talk about kNN graphs, but Seurat uses an SNN graph for clustering, whereas Scanpy uses (a different) SNN graph for both UMAP & clustering. The way kNN graphs are made also differs (and can depend on the number of cells being processed). More on this later. 4/
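To make the kNN vs. SNN distinction concrete, here is a minimal sketch of one common SNN construction: build a kNN graph, then weight each pair of cells by the Jaccard overlap of their neighbour sets and drop weak edges. The 1/15 cutoff mirrors the default of Seurat's `prune.SNN` argument, but this brute-force code is an illustration under that assumption, not Seurat's or Scanpy's implementation, and the function names are hypothetical.

```python
from itertools import combinations


def knn_sets(points, k):
    """Brute-force k nearest neighbours (squared Euclidean); each point counts itself."""
    sets = []
    for p in points:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j) for j, q in enumerate(points)
        )
        sets.append({j for _, j in dists[:k]})
    return sets


def snn_graph(neighbour_sets, prune=1 / 15):
    """Weight each pair by the Jaccard overlap of their kNN sets; drop weak edges."""
    edges = {}
    for i, j in combinations(range(len(neighbour_sets)), 2):
        a, b = neighbour_sets[i], neighbour_sets[j]
        weight = len(a & b) / len(a | b)
        if weight > prune:  # edges at or below the cutoff are pruned
            edges[(i, j)] = weight
    return edges


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
graph = snn_graph(knn_sets(points, k=3))
print(graph)  # two disconnected triangles, every edge weight 1.0
```

The O(n²) loops are only for readability; real pipelines use (approximate) nearest-neighbour indexes, which is one of the places where implementations can quietly diverge.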
So let's jump to what people fixate on most: clusters & UMAPs. Starting with the same data and running Seurat and Scanpy with their defaults, one gets the results below. Making this plot was non-trivial, as it required a matching algorithm to get the clusters / colors aligned. 5/
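The thread doesn't say which matching algorithm was used. As a sketch of the idea, assuming both clusterings label the same cells in the same order, one can pick the one-to-one relabelling that maximizes the number of shared cells. This brute-force version (hypothetical name `match_clusters`) is only practical for a handful of clusters; for realistic cluster counts, a linear-assignment solver such as `scipy.optimize.linear_sum_assignment` on the overlap table does the same job.

```python
from collections import Counter
from itertools import permutations


def match_clusters(labels_a, labels_b):
    """Map each cluster in B to a distinct cluster in A, maximising shared cells.

    Brute force over one-to-one assignments; clusters in B left without a
    partner (when B has more clusters than A) map to None.
    """
    overlap = Counter(zip(labels_a, labels_b))
    a_ids = sorted(set(labels_a))
    b_ids = sorted(set(labels_b))
    targets = a_ids + [None] * max(0, len(b_ids) - len(a_ids))
    best, best_score = {}, -1
    for perm in permutations(targets, len(b_ids)):
        score = sum(overlap[(a, b)] for a, b in zip(perm, b_ids) if a is not None)
        if score > best_score:
            best, best_score = dict(zip(b_ids, perm)), score
    return best


seurat_like = [0, 0, 0, 1, 1, 2, 2, 2]
scanpy_like = [5, 5, 3, 3, 3, 4, 4, 4]
print(match_clusters(seurat_like, scanpy_like))  # {3: 1, 4: 2, 5: 0}
```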
Seurat clusters are more jumbled than the Scanpy ones (look at cluster 3 in Seurat vs. cluster 5 in Scanpy). This is because Seurat uses different graphs for clustering & UMAP, whereas Scanpy uses the same one. Based on this, I've learned to tell whether a UMAP was made by Seurat or Scanpy 🙃. 6/
I now joke that if you want to make your PI happy, show them a Scanpy UMAP, because the data will look cleaner. But of course this isn't funny. Completely different conclusions, both qualitative and quantitative, can be drawn from the two UMAPs. 7/
A basic question is whether Seurat and Scanpy can be made the same, i.e., leaving aside the question of which is more correct, can parameters be set to get the programs to agree? @josephmrich did a detailed analysis of this. The answer is partly yes, but overall no. 8/
Some functions agree with default params. Some can be made to be the same by matching arguments. In some cases (e.g. SNN / UMAP) it's impossible to get them to agree within the current implementations. Guides for how to make Seurat match Scanpy, or vice versa, are in the Supp. 9/
To understand the contribution of differences in each step to the overall divergence of the methods, we examined the output of each step with the exact same input. This is all in the supplement. This was important because the end result (markers) is *very* different. 10/
tl;dr there is a ton of detail that really matters. Differences started to be observed with PCA. They can be resolved (in the case of PCA), but it required really digging into the code to figure out how. Without fixing these differences, the PCAs don't match. 11/
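One way to check whether two tools' PCA outputs agree, given identical input, is to correlate matching components while ignoring sign, since a principal component and its negation are equally valid. The sketch below is a hypothetical helper for that check (the names `pearson` and `pc_agreement` are illustrative), not the comparison used in the preprint.

```python
import math


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mu) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def pc_agreement(scores_a, scores_b):
    """|Pearson r| between matching PCs of two cells-by-PCs score matrices.

    Taking the absolute value means a component flipped in sign by one tool
    still scores 1.0; values well below 1.0 point to a real difference.
    """
    n_pcs = min(len(scores_a[0]), len(scores_b[0]))
    return [
        abs(pearson([row[k] for row in scores_a], [row[k] for row in scores_b]))
        for k in range(n_pcs)
    ]


tool_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
tool_b = [[-x, y] for x, y in tool_a]  # same PCs, first one sign-flipped
print(pc_agreement(tool_a, tool_b))
```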
Key differences start to emerge with how Seurat and Scanpy select highly variable genes (HVGs). Seurat’s default HVG algorithm is “vst” (equivalent to Scanpy’s “seurat_v3” flavor), while Scanpy’s default HVG algorithm is “seurat” (equivalent to Seurat’s “mean.var.plot”). 👀 12/
It matters what the algorithms are, and they're totally different. Before asking which to use, it's useful to know what they are. Details are in the preprint. E.g., vst/seurat_v3 fits a loess model of variance as a function of mean, while mean.var.plot/seurat bins genes by mean expression and z-scores their dispersions within each bin. 13/
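As a rough illustration of the binned approach (mean.var.plot / Scanpy's "seurat" flavor), here is a simplified sketch: genes are binned by mean expression and each gene's log dispersion is z-scored within its bin. The equal-width binning, the score of 0 for single-gene bins, and the function name are assumptions of this sketch; the real implementations differ in details such as log transforms and cutoffs.

```python
import math
import statistics


def binned_dispersion_zscores(means, variances, n_bins=20):
    """Simplified mean.var.plot-style HVG scores (a sketch, not Seurat/Scanpy code).

    Genes are placed in equal-width bins by mean expression, and each gene's log
    dispersion (variance / mean) is z-scored against the genes in its bin.
    Means and variances must be positive.
    """
    log_disp = [math.log(v / m) for m, v in zip(means, variances)]
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all means are equal
    bins = [min(int((m - lo) / width), n_bins - 1) for m in means]
    scores = []
    for gene, b in enumerate(bins):
        in_bin = [d for d, other in zip(log_disp, bins) if other == b]
        if len(in_bin) < 2:  # a lone gene cannot be z-scored; score it 0 here
            scores.append(0.0)
            continue
        mu, sd = statistics.mean(in_bin), statistics.stdev(in_bin)
        scores.append((log_disp[gene] - mu) / sd if sd > 0 else 0.0)
    return scores


means = [1.0, 1.1, 1.2, 1.3, 1.4]
variances = [1.0, 1.0, 1.0, 1.0, 10.0]
scores = binned_dispersion_zscores(means, variances, n_bins=1)
print(max(range(len(scores)), key=scores.__getitem__))  # → 4, the overdispersed gene
```

The vst approach instead replaces the bins with a smooth loess fit of log variance against log mean, so the two can rank the same genes very differently.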
Versions also matter. A lot. Seurat v5 changed how log-fold change is computed relative to Seurat v4, and the differences in results are massive. The change was made to fix an error pointed out in preprints by @jeffreypullin & @davisjmc, and separately by @LambdaMoses from our group. 14/
But the new fix is still problematic. @josephmrich again looked at the implementation, and there is now a dependence in the pseudocount on cluster size, which is weird. We explain this, in detail, in the preprint. 15/
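A toy calculation shows why where the pseudocount enters matters. Below, one hypothetical convention adds a pseudocount of 1 to each group mean; the other adds it to each group sum before dividing by the number of cells, so the effective pseudocount on the mean is 1/n and shrinks as the group grows. These are simplified stand-ins for illustration, not Seurat's exact code; inputs are assumed to be on the linear (non-log) scale.

```python
import math


def lfc_pseudocount_on_mean(group1, group2, pseudocount=1.0):
    """log2 fold change with the pseudocount added to each group's mean."""
    m1 = sum(group1) / len(group1)
    m2 = sum(group2) / len(group2)
    return math.log2(m1 + pseudocount) - math.log2(m2 + pseudocount)


def lfc_pseudocount_on_sum(group1, group2, pseudocount=1.0):
    """Pseudocount added to each group's sum before dividing by group size,
    so the effective pseudocount on the mean is pseudocount / len(group)."""
    m1 = (sum(group1) + pseudocount) / len(group1)
    m2 = (sum(group2) + pseudocount) / len(group2)
    return math.log2(m1) - math.log2(m2)


cluster = [1.0] * 10 + [0.0] * 90  # weakly expressed: mean 0.1 over 100 cells
rest_small = [0.0] * 100           # gene absent from the other cells
rest_large = [0.0] * 1000

print(round(lfc_pseudocount_on_mean(cluster, rest_small), 2))  # 0.14
print(round(lfc_pseudocount_on_sum(cluster, rest_small), 2))   # 3.46
print(round(lfc_pseudocount_on_sum(cluster, rest_large), 2))   # 6.78
```

With the second convention, the fold change for the same weakly expressed gene doubles simply because the comparison group has more cells.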
There are too many other differences between Seurat and Scanpy to summarize here. I'll mention a seemingly minor one with major implications: they handle ties differently when computing adjusted p-values. This results in major differences in reported p-values. 16/
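The thread doesn't spell out exactly where the tie handling differs. As a generic illustration of how much ties alone can move a p-value, here is a sketch of a two-sided Wilcoxon rank-sum test (normal approximation) with and without the standard tie correction to the variance. Zero-inflated single-cell data produce many ties. The function name and toy data are hypothetical.

```python
import math


def ranksum_pvalue(x, y, tie_correct=True):
    """Two-sided Wilcoxon rank-sum p-value using the normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n, n1, n2 = len(pooled), len(x), len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average rank
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var = n1 * n2 * (n + 1) / 12
    if tie_correct:
        var -= n1 * n2 * tie_term / (12 * n * (n - 1))
    if var <= 0:  # every value tied: no evidence either way
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


cluster = [0, 0, 0, 1, 2, 3]
others = [0, 0, 0, 0, 0, 0]
print(ranksum_pvalue(cluster, others, tie_correct=False))
print(ranksum_pvalue(cluster, others, tie_correct=True))
```

With nine tied zeros among twelve values, the corrected variance is much smaller, so the same data yield a noticeably smaller p-value; after multiple-testing adjustment such gaps can decide whether a gene is reported at all.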
Versioning is a major issue not just with Seurat & Scanpy. We also looked at Cell Ranger, which has changed its default for how it counts reads to produce the gene-count matrix. The change has major implications. I recommend sitting down before looking at the plots. 17/
Now some might say, "OK, but I don't care; we still found our biological result either way." That may be true, but then perhaps one should sequence less, or assay fewer cells. We asked how low one could go and still have results whose differences are smaller than those between Seurat and Scanpy. 18/
The answers are below, broken down by procedure. If you don't care about the differences between Seurat and Scanpy, you might as well sequence 5% of the reads, or sacrifice far fewer mice and assay 80% fewer cells. 19/
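For readers who want to try this kind of experiment, here is a minimal sketch of the two downsampling operations on a small cells-by-genes count matrix: binomial thinning keeps each read independently with probability p (mimicking shallower sequencing), and cell subsampling keeps a random subset of rows. The function names and the thinning approach are assumptions of this sketch; the preprint's exact procedure may differ.

```python
import random


def downsample_reads(counts, fraction, seed=0):
    """Binomial thinning: keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [[sum(rng.random() < fraction for _ in range(c)) for c in row] for row in counts]


def downsample_cells(counts, fraction, seed=0):
    """Keep a random subset of round(fraction * n_cells) cells (rows), in original order."""
    rng = random.Random(seed)
    k = round(fraction * len(counts))
    keep = sorted(rng.sample(range(len(counts)), k))
    return [counts[i] for i in keep]


counts = [[5, 0, 2], [1, 3, 0], [0, 0, 7], [4, 4, 4], [2, 0, 1]]  # cells x genes
shallow = downsample_reads(counts, fraction=0.05)  # ~5% of reads
fewer = downsample_cells(counts, fraction=0.2)     # 80% fewer cells
print(shallow, fewer)
```

The per-read loop is only for clarity; on real sparse matrices one would draw binomial counts per nonzero entry, or subsample reads before quantification.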
This is a key point. Nihilism in terms of software used and an addiction to not understanding (h/t Amos Tanay) is not just poor scholarship, it also leads to wasted (graduate student and postdoc) time, @NIH money, and lives of animals. The #scRNAseq field can do better. 20/
Thanks to @satijalab and @fabian_theis for making their Seurat and Scanpy packages open source. This work could not have been undertaken without that transparency. Our analyses are also open source and reproducible; the code is available at github.com/pachterlab/RME… 21/
This work began from initial investigations, with Nicolas Bray, into the differences between the log-fold-change calculations of Seurat and Scanpy, which we wrote about in the Supplement here: biorxiv.org/content/10.110… 22/
@LambdaMoses also started to investigate differences in PCA, which was continued by the @pmelsted group. On the advice of we decided to go more in-depth and write a separate paper. @Josephmrich took on the task, and the manuscript is his work. 23/ twitter.com/GaalBernadett
Aside from the comparisons between Seurat and Scanpy, their different versions, examination of Cell Ranger version differences etc., @Josephmrich's detailed description of Seurat and Scanpy's methods and associated parameters should be useful documentation for others. 24/
Finally, this work was truly a lab effort. Kayla Jackson, @NeuroLuebbert, @sinabooeshaghi, and @DelaneyKSull all had numerous and useful insights after slowly developing worries about Seurat vs. Scanpy over the years. I'll conclude with #methodsmatter 25/25

