First, note that single-cell RNA-seq data provides some quantification not only of processed (mature) messenger RNAs, but also of nascent molecules. That observation by @GioeleLaManno, @slinnarsson and colleagues underlies RNA velocity. But what about single-nucleus RNA-seq? 2/
Similarly to #scRNAseq, #snRNAseq data has reads derived from both nascent and mature transcripts. However, perhaps due to the intuition that #snRNAseq is all nascent, current approaches to quantification yield a single count matrix based on all reads mapping to every gene locus. 3/
Instead of directly quantifying nascent molecules, this approach provides a quantification that mixes nascent and mature molecules, and in doing so is conceptually inconsistent with current approaches to quantifying #scRNAseq. 4/
To put quantification of both #scRNAseq and #snRNAseq on a level playing field, we first developed an approach to quantifying both nascent and mature molecules using what we call a D-list. 5/
The D-list consists of k-mers that are "distinguishing flanking k-mers" (DFKs), which prevent nascent molecules from being confused with mature ones, and vice versa. The D-list provides the needed link between ambiguity in k-mers and ambiguity in reads for pseudoalignment purposes. 6/
The D-list can provably exclude (under a mild assumption) nascent/mature ambiguous reads from being erroneously mapped by making use of DFKs. We validated this claim by checking the mapping of reads without errors. 7/
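To make the DFK idea concrete, here is a toy sketch in Python. It is a simplified illustration, not kallisto's implementation: the sequences, the function names (`find_dfks`, `maps_to_targets`) and the rule "a read maps if any of its k-mers is in the target" are all assumptions made for the example, and reverse complements are ignored. The mature transcript is the target, the nascent transcript plays the role of the D-list sequence, and a DFK is taken to be a D-list k-mer that is absent from the target but sits right next to a k-mer shared with it.

```python
def kmers(seq, k):
    """Return all k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def find_dfks(targets, dlist_seqs, k):
    """Toy DFK finder (hypothetical helper, not the kallisto API).

    A k-mer of a D-list sequence counts as a DFK here if it is NOT in the
    target k-mer set but an adjacent k-mer in the same sequence IS.
    """
    target_kmers = {km for t in targets for km in kmers(t, k)}
    dfks = set()
    for seq in dlist_seqs:
        kms = kmers(seq, k)
        shared = [km in target_kmers for km in kms]
        for i, km in enumerate(kms):
            if shared[i]:
                continue
            left = i > 0 and shared[i - 1]
            right = i + 1 < len(kms) and shared[i + 1]
            if left or right:
                dfks.add(km)
    return target_kmers, dfks


def maps_to_targets(read, target_kmers, dfks, k):
    """Simplified mapping rule: reject any read containing a DFK,
    otherwise accept it if it shares at least one k-mer with the targets."""
    read_kmers = kmers(read, k)
    if any(km in dfks for km in read_kmers):
        return False
    return any(km in target_kmers for km in read_kmers)


# Made-up sequences for illustration only.
EXON1, INTRON, EXON2 = "ATGGCCAAG", "GTTTCTTAG", "CTGAACGTT"
mature = EXON1 + EXON2             # spliced transcript (target)
nascent = EXON1 + INTRON + EXON2   # unspliced transcript (D-list sequence)
K = 5

target_kmers, dfks = find_dfks([mature], [nascent], K)

exon_intron_read = "GCCAAGGTT"  # error-free read across the exon1/intron boundary
exon_exon_read = "CCAAGCTGA"    # error-free read across the exon1/exon2 junction

print(sorted(dfks))
print(maps_to_targets(exon_intron_read, target_kmers, dfks, K))
print(maps_to_targets(exon_exon_read, target_kmers, dfks, K))
```

In this toy, the exon-intron read shares k-mers with the mature transcript and would be assigned to it without the D-list; the DFK it carries is what keeps it out. The exon-exon read carries no DFK and still maps.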
Of course sequenced reads contain errors, so to get a better handle on performance in realistic scenarios we examined a simulation. The simulation framework was identical to that of STARsolo from the preprint for that method, down to the evaluation criteria. 8/
kallisto bustools performed very well. The plots below are results from two simulations (with [right] and without [left] multi-mapping reads). 9/
The evaluation itself is flawed; we stuck with the STARsolo method so that there would be no questions about us prettying up our own results. However, the assessment excludes true negatives. If we add those in, our performance is even better. 10/
Validation and simulation are one thing, but what does all of this translate to on analysis of biological data? 1. Including the D-list to eliminate false positive mappings doesn't change results much (as discussed previously w/ @sinabooeshaghi here: biorxiv.org/content/10.110…) 11/
2. The current practice of agglomerating all reads (nascent and mature) into one count matrix doesn't make much sense. The counts are quite different than if one restricts, for instance, to nascent transcripts (as done by kallisto in the analysis below). 12/
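A toy illustration of why the agglomerated matrix can tell a different story than the nascent one. All numbers are made up, and the split into nascent / mature / ambiguous matrices (genes as rows, cells as columns) is an assumption for the sketch, not output from any real dataset.

```python
def add_matrices(*mats):
    """Element-wise sum of equally shaped matrices given as lists of rows."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


# Hypothetical counts: rows = genes (A, B), columns = cells (1, 2).
nascent = [[5, 1],
           [0, 8]]
mature = [[1, 9],
          [2, 0]]
ambiguous = [[0, 2],
             [1, 1]]

# What a single "all reads at the locus" matrix would contain.
total = add_matrices(nascent, mature, ambiguous)

# Gene A, cell 1 vs cell 2: the nascent counts and the agglomerated
# counts rank the two cells in opposite order.
print("nascent gene A:", nascent[0])
print("total   gene A:", total[0])
```

Here gene A looks higher in cell 1 by nascent counts but higher in cell 2 once everything is summed, which is the kind of discrepancy the tweet refers to.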
By the way, quantifying single-nucleus RNA-seq only with respect to mature transcripts is also problematic as shown here: biorxiv.org/content/10.110… 13/
An update to the preprint will be posted shortly with some further analyses, and several other new applications of kallisto bustools are forthcoming, including to other assays building off of
Finally, a question I'm asked a lot is which of the (nascent or mature) #scRNAseq/#snRNAseq matrices should be used for analysis. The short answer is they should be used together for fitting models as in
The "genius" @elonmusk was a year ahead of me @PtaBoysHigh. Every year they gave awards to those "worthy of praise" (digni laude). Perhaps it's not a big surprise that Musk was never deemed "worthy of praise".
Which is fine, of course. Awards like this are rubbish anyway. But friendships aren't, and the people I miss from school, and there are many... are not Musk (in this photo he is front row left).
Let's start with the article he links to. It begins w/ "Women are now 60 percent of college graduates, men a mere 40 percent" describing this as "the fading male presence". Bollocks. The percentage of males who have completed four years of college has soared: 6% to 37% in 80 yrs.
How does that relate to the 60-40 differential between current graduates? Misogynists try to scare with truncated axes as shown below. They will say that the plot above is cumulative over time, i.e. it looks at the whole population, not graduates today. Well, let's take a look..
Tons of exciting new single-cell genomics tools have been showcased at #bioc2022 this week. Today @LambdaMoses presented SpatialFeatureExperiment, an S4 class extending SpatialExperiment, facilitating geospatial stats for spatial #scRNAseq using Voyager github.com/pachterlab/Voy… 1/
The design of SpatialFeatureExperiment and the plans for Voyager were formed from a careful study that @LambdaMoses conducted of the spatial transcriptomics field (published as the "Museum of Spatial Transcriptomics"): nature.com/articles/s4159… 2/
While there are several analysis tools for spatial transcriptomics data, and extensions of #scRNAseq platforms such as Seurat for spatial data, they have limitations in terms of the methods they implement from the field of geospatial statistics. 3/
The exciting reveal of Ultima Genomics last week was accompanied by the publication of four preprints. Intrigued by the potential of the technology, @sinabooeshaghi & I decided to take a look at the data. A 🧵 about our findings & a preprint we posted: biorxiv.org/content/10.110… 1/
Unfortunately, no data. No code. There is not even supplementary material, which the authors write "will be made available in the near future." 2/
Without data or code, obviously one cannot check the claims of the company. But in this case one cannot even understand the claims. E.g. the description for Fig. 2e in the Methods is useless without code to explain what was actually done to produce it. 3/