Lior Pachter

Dec 4 • 16 tweets • 10 min read

In a new preprint w/@kreldjarn, @DelaneyKSull, @GuillaumOleSan & @pmelsted we address a shortcoming in current approaches to quantifying single-nucleus RNA-seq: biorxiv.org/content/10.110…
tl;dr care has to be taken in quantifying nascent vs. mature transcripts. #snRNAseq #scRNAseq🧵

First, note that single-cell RNA-seq data provides some quantification not only of processed (mature) messenger RNAs, but also of nascent molecules. That observation by @GioeleLaManno, @slinnarsson and colleagues underlies RNA velocity. But what about single-nucleus RNA-seq? 2/

Similarly to #scRNAseq, #snRNAseq data has reads derived from both nascent and mature transcripts. However, perhaps due to the intuition that #snRNAseq is all nascent, current approaches to quantification yield a single count matrix based on all reads mapping to each gene locus. 3/

Instead of directly quantifying nascent molecules, this approach provides a quantification that mixes nascent and mature molecules, and in doing so is conceptually inconsistent with current approaches to quantifying #scRNAseq. 4/
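To make the difference between a single mixed count matrix and separate nascent/mature counts concrete, here is a toy Python sketch. It is not how kallisto or any other tool classifies reads. It assumes each read already carries two hypothetical flags (whether it spans an exon-exon junction, and whether it overlaps an intron), and the gene names and reads are made up for illustration.

```python
from collections import Counter

def classify_read(spans_junction, overlaps_intron):
    """Toy rule: intronic evidence -> nascent, exon-exon junction -> mature.

    Reads with neither kind of evidence (e.g. fully inside one exon), or with
    both, are left as ambiguous.
    """
    if overlaps_intron and not spans_junction:
        return "nascent"
    if spans_junction and not overlaps_intron:
        return "mature"
    return "ambiguous"

def count_by_category(reads):
    """Per-gene counts split into nascent / mature / ambiguous."""
    counts = {}
    for gene, spans_junction, overlaps_intron in reads:
        category = classify_read(spans_junction, overlaps_intron)
        counts.setdefault(gene, Counter())[category] += 1
    return counts

def collapse(counts):
    """The 'single matrix' view: every read at a gene locus is summed."""
    return {gene: sum(c.values()) for gene, c in counts.items()}

# Hypothetical reads: (gene, spans_junction, overlaps_intron)
reads = [
    ("geneA", False, True),   # intronic -> nascent
    ("geneA", True, False),   # junction-spanning -> mature
    ("geneA", False, False),  # exonic only -> ambiguous
    ("geneB", False, True),   # intronic -> nascent
]

counts = count_by_category(reads)
print(counts)
print(collapse(counts))
```

In the collapsed view geneA simply has three counts, and the information about which molecules were nascent is gone.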

To put quantification of both #scRNAseq and #snRNAseq on a level playing field, we first developed an approach to quantifying both nascent and mature molecules using what we call a D-list. 5/

The D-list consists of k-mers that are "distinguishing flanking k-mers" (DFKs), which prevent nascent molecules from being confused with mature ones, and vice versa. The D-list provides the needed link between ambiguity in k-mers and ambiguity in reads for pseudoalignment purposes. 6/

The D-list can provably exclude (under a mild assumption) nascent/mature ambiguous reads from being erroneously mapped by making use of DFKs. We validated this claim by checking the mapping of reads without errors. 7/
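A rough Python sketch of the DFK idea may help. This is a toy under loud assumptions, not kallisto's implementation: it ignores reverse complements, the de Bruijn graph index and sequencing errors, and the function names, sequences and k=5 are all made up. Here a DFK is taken to be a k-mer in a background sequence that is absent from the targets but sits directly next to a k-mer shared with the targets. A read containing a DFK is not mapped to the targets.

```python
def kmers(seq, k):
    """All k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_dlist(targets, background, k):
    """Return (target k-mer set, set of toy DFKs) for the given sequences."""
    target_kmers = {km for t in targets for km in kmers(t, k)}
    dfks = set()
    for seq in background:
        kms = kmers(seq, k)
        shared = [km in target_kmers for km in kms]
        for i, is_shared in enumerate(shared):
            if is_shared:
                continue
            # An unshared k-mer next to a shared one flanks a shared stretch.
            left = i > 0 and shared[i - 1]
            right = i + 1 < len(kms) and shared[i + 1]
            if left or right:
                dfks.add(kms[i])
    return target_kmers, dfks

def maps_to_targets(read, target_kmers, dfks, k):
    """Toy pseudoalignment: reject the read if it contains any DFK."""
    kms = kmers(read, k)
    if any(km in dfks for km in kms):
        return False
    return any(km in target_kmers for km in kms)

# Hypothetical gene: the mature transcript is the target, the nascent
# (unspliced) transcript is the background.
exon1, intron, exon2 = "ACGTTGCA", "GTAAGTTTTCAG", "CCATGGAC"
mature = exon1 + exon2
nascent = exon1 + intron + exon2
K = 5

target_kmers, dfks = build_dlist([mature], [nascent], K)

# An error-free read crossing the exon1/intron boundary of the nascent molecule.
boundary_read = "GTTGCAGTAA"
without_dlist = maps_to_targets(boundary_read, target_kmers, set(), K)
with_dlist = maps_to_targets(boundary_read, target_kmers, dfks, K)

# A read spanning the exon-exon junction of the mature transcript.
junction_read = "TGCACCATG"
junction_maps = maps_to_targets(junction_read, target_kmers, dfks, K)

print(sorted(dfks), without_dlist, with_dlist, junction_maps)
```

Without the D-list the boundary read is pulled onto the mature transcript via its exonic k-mers. With the D-list, the k-mer that crosses into the intron flags it and it is excluded, while the junction-spanning read still maps.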

Of course sequenced reads contain errors, so to get a better handle on performance in realistic scenarios we examined a simulation. The simulation framework was identical to that of STARsolo from the preprint for that method, down to the evaluation criteria. 8/

kallisto bustools performed very well. The plots below are results from two simulations (with [right] and without [left] multi-mapping reads). 9/

The evaluation itself is flawed; we stuck with the STARsolo method so that there would be no questions about us prettying up our own results. However, the assessment precludes true negatives. If we add those in, our performance is even better. 10/
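To see why leaving out true negatives matters, here is a toy Python illustration. It does not reproduce the STARsolo evaluation criteria or any numbers from the preprint. The counts and the simple "agreement" score are hypothetical, chosen only to show how entries that are zero in both truth and estimate change the picture.

```python
def confusion(truth, estimate):
    """Classify each gene/cell entry by whether it is detected (count > 0)."""
    tp = fp = fn = tn = 0
    for t, e in zip(truth, estimate):
        if t > 0 and e > 0:
            tp += 1
        elif t == 0 and e > 0:
            fp += 1
        elif t > 0 and e == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def agreement(truth, estimate, include_true_negatives):
    """Fraction of considered entries on which truth and estimate agree."""
    tp, fp, fn, tn = confusion(truth, estimate)
    extra = tn if include_true_negatives else 0
    return (tp + extra) / (tp + fp + fn + extra)

# Hypothetical simulated truth and estimated counts for eight entries.
truth = [5, 0, 0, 0, 2, 0, 0, 1]
estimate = [4, 0, 0, 1, 2, 0, 0, 0]

score_without_tn = agreement(truth, estimate, include_true_negatives=False)
score_with_tn = agreement(truth, estimate, include_true_negatives=True)
print(score_without_tn, score_with_tn)
```

A method that correctly reports zero for unexpressed entries gets no credit for that when true negatives are dropped.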

Validation and simulation are one thing, but what does all of this translate to in the analysis of biological data?
1. Including the D-list to eliminate false positive mappings doesn't change results much (as discussed previously w/ @sinabooeshaghi here: biorxiv.org/content/10.110…) 11/

2. The current practice of agglomerating all reads (nascent and mature) into one count matrix doesn't make much sense. The counts are quite different than if one restricts, for instance, to nascent transcripts (as done by kallisto in the analysis below). 12/

By the way, quantifying single-nucleus RNA-seq only with respect to mature transcripts is also problematic as shown here: biorxiv.org/content/10.110… 13/

Thanks to numerous memory efficiencies implemented in kallisto by @kreldjarn and @DelaneyKSull using the Bifrost index genomebiology.biomedcentral.com/articles/10.11… (with help from @GuillaumOleSan), kallisto's speed is amazing and its memory needs are extra lean, making it possible to work on @GoogleColab. 14/

An update to the preprint will be posted shortly with some further analyses, and several other new applications of kallisto bustools are forthcoming, including to other assays, building off of the work described here: https://twitter.com/sinabooeshaghi/status/1579861159431184384?s=20&t=eh1vdGws-VaGj9erkysEwQ 15/

Finally, a question I'm asked a lot is which of the (nascent or mature) #scRNAseq/#snRNAseq matrices should be used for analysis. The short answer is that they should be used together for fitting models, as in https://twitter.com/lpachter/status/1536382682577240065?s=20&t=-JHVT7QvU3-ZWuvme46i2Q More on this w/ @GorinGennady soon. 16/16


