In a new preprint, @sinabooeshaghi et al. present deep SMART-Seq, @10xGenomics and MERFISH #scRNAseq (37,925,526,323 reads, 344,256 cells) from the mouse primary motor cortex, demonstrating the benefits of cross-platform isoform-level analysis. biorxiv.org/content/10.110… 1/15
We produce an isoform atlas and identify isoform markers for classes, subclasses and clusters of cells across all layers of the primary motor cortex. 2/15
Isoform-level results are facilitated by kallisto isoform-level quantification of the SMART-seq data. We show that such EM-based isoform quantification is essential not just for isoform but for gene-level results. #methodsmatter 3/15
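To make "EM-based isoform quantification" concrete, here is a toy sketch of the idea. It is not kallisto's implementation: the function name `em_abundances`, the equivalence-class matrix, the counts and the effective lengths are all made up for illustration.

```python
import numpy as np

def em_abundances(compat, counts, eff_len, n_iter=500):
    """Toy EM for isoform abundances (illustrative, not kallisto's code).

    compat:  (n_classes, n_transcripts) 0/1 matrix; row i marks the
             transcripts compatible with reads in equivalence class i.
    counts:  number of reads observed in each equivalence class.
    eff_len: effective length of each transcript.
    Returns the estimated fraction of reads coming from each transcript.
    """
    compat = np.asarray(compat, dtype=float)
    counts = np.asarray(counts, dtype=float)
    eff_len = np.asarray(eff_len, dtype=float)
    n_tx = compat.shape[1]
    alpha = np.full(n_tx, 1.0 / n_tx)  # start from a uniform guess
    for _ in range(n_iter):
        # E-step: split each class's reads among its compatible transcripts,
        # in proportion to current abundance per unit of effective length.
        weights = compat * (alpha / eff_len)
        weights /= weights.sum(axis=1, keepdims=True)
        # M-step: re-estimate read fractions from the expected assignments.
        alpha = (counts[:, None] * weights).sum(axis=0) / counts.sum()
    return alpha

# Hypothetical gene with two isoforms, A and B, of equal effective length:
# 60 reads fall in a shared region, 30 are unique to A, 10 are unique to B.
alpha = em_abundances(
    compat=[[1, 1], [1, 0], [0, 1]],
    counts=[60, 30, 10],
    eff_len=[1000, 1000],
)
print(alpha)  # approximately [0.75, 0.25]
```

The shared reads are what make the problem non-trivial: simply discarding them, or splitting them evenly, would give different isoform (and hence gene) estimates.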
Using the 10xv3 data, we also present the first cross-platform validation of SMART-seq isoform quantification (possible by analyzing transcripts with unique sequence near the 3’ end). 4/15
A key result is that many cell types have strong isoform markers that cannot be detected at the gene level. See the example below: 5/15
One exciting application of isoform quantification is spatial extrapolation. We show that in some cases isoform resolution can be achieved spatially even with gene probes. This is a powerful way to leverage SMART-seq for MERFISH and seqFISH. 6/15
These results argue for a rethinking of current #scRNAseq best practices. We find that SMART-seq complements droplet-based methods and spatial RNA-seq, adding layers of important isoform resolution to cell atlases. 7/15
We recommend droplet-based high-throughput methods for cell type identification, SMART-seq for isoform resolution, and spatial RNA-seq for location information. The whole is greater than the sum of the parts. 8/15
The beautiful t-SNE above is a lot cleaner than is usually the case. This is thanks to an idea of @sinabooeshaghi, who made it w/ t-SNE of a neighborhood components analysis (NCA) dimensionality reduction (@geoffreyhinton, Roweis et al. cs.toronto.edu/~hinton/absps/…) rather than PCA. 9/15
NCA finds a projection that maximizes a stochastic variant of the leave-one-out kNN score given cluster assignments. Intuitively, it projects so as to keep cells from the same cluster near each other, which is exactly what we want. 10/15
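For readers who want to try the NCA-then-t-SNE idea, a minimal sketch using scikit-learn follows. The data are synthetic blobs standing in for cells with cluster labels, and the parameter choices (10 NCA components, perplexity 30) are assumptions for illustration, not the settings used in the preprint.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import NeighborhoodComponentsAnalysis

# Synthetic stand-in for a cells x features matrix with cluster labels.
X, labels = make_blobs(n_samples=150, n_features=30, centers=3, random_state=0)

# Supervised linear projection: NCA uses the cluster labels to find
# directions that keep same-cluster points close together.
nca = NeighborhoodComponentsAnalysis(n_components=10, random_state=0)
X_nca = nca.fit_transform(X, labels)

# t-SNE is then run on the NCA projection instead of on a PCA projection.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_nca)
print(X_nca.shape, embedding.shape)
```

Swapping `NeighborhoodComponentsAnalysis` for `sklearn.decomposition.PCA` in the middle step gives the usual PCA-then-t-SNE pipeline for comparison.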
You might worry that NCA overfits. It doesn't. We ran a permutation test to confirm that we are seeing real structure in the data. 11/15
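One way such a permutation test could look is sketched below. This is a guess at the general shape, not the procedure from the preprint: the helper `heldout_knn_accuracy`, the synthetic data, the 70/30 split and the 20 permutations are all assumptions. The idea is that if NCA were only memorizing labels, shuffled labels would score as well on held-out cells as the real ones.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis

def heldout_knn_accuracy(X, y, seed=0):
    """Fit NCA on a training split, then score kNN on held-out points."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=seed)
    nca.fit(X_tr, y_tr)
    knn = KNeighborsClassifier(n_neighbors=5).fit(nca.transform(X_tr), y_tr)
    return knn.score(nca.transform(X_te), y_te)

rng = np.random.default_rng(0)
X, y = make_blobs(n_samples=120, n_features=20, centers=3, random_state=0)

# Score with the real labels, then with labels shuffled across cells.
real = heldout_knn_accuracy(X, y)
null = [heldout_knn_accuracy(X, rng.permutation(y), seed=i) for i in range(20)]

# Empirical p-value: how often does a shuffled labeling do at least as well?
p_value = (1 + sum(s >= real for s in null)) / (1 + len(null))
print(real, max(null), p_value)
```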
As an aside, we found that the t-SNE of the NCA projection preserved global structure better than t-SNE of the PCA projection (e.g., in terms of inhibitory / excitatory neuron classes). It seems that people have been conflating the performance of t-SNE (& UMAP) with that of PCA. 12/15
All of this is based on amazing publicly available data from the BICCN, and this is just a preview of the whole mouse brain, which is on its way. 14/15
All of this is a result of incredible work by @sinabooeshaghi who did all the analysis single-handedly. Follow him for more interesting #scRNAseq in the near future. 15/15
The mantra that spatial transcriptomics is about location, location, location is catchy, but what does it really mean? We have just posted a preprint, work of Kayla Jackson et al., that describes the concordex method for identifying spatially homogeneous regions. 1/🧵biorxiv.org/content/10.110…
For years there's been a notion that spatial transcriptomics should allow for the definition of "spatial regions" or "tissue domains". In quotes because they're typically defined as "whatever the algorithm outputs". E.g. GASTON "spatial domains" vs. BANKSY "tissue domains". 2/
It's easy to go in circles. Should cell types be distinguished on the basis of spatial data? Or are cell types purely transcriptomic notions without regard to space? Do spatial regions depend on cell types? Or should they be identified together? We went in circles for a while. 3/
Aristotle was the first to notice honeybees dancing. In 1927 Karl von Frisch decoded the waggle. How it works was "explained" by MV Srinivasan AM FRS in the 1990s. Except @NeuroLuebbert found his papers are junk. A 🧵 about her discovery & our report: 1/arxiv.org/abs/2405.12998
First, if you're not familiar with the waggle, it's Nature magic! Watch this video for cool footage and an introduction.
Aristotle's observations in Historia animalium IX are arguably one of the first instances of observation driven inquiry and science. 2/
Karl von Frisch decoded the waggle, meaning he figured out how the number of waggles, and their direction, communicate information about the distance and direction of food sources. von Frisch won the Nobel Prize for his discovery. But exactly how it works remained a mystery. 3/
A lot of bioinformatics requires editing sequencing reads to facilitate QC and make them suitable for processing. To help with such tasks, @DelaneyKSull developed splitcode, now published at 1/ academic.oup.com/bioinformatics…
The inputs to splitcode are reads in FASTQ, along with a config file. The output can include edited reads or extracted subsequences, in FASTQ (including gzipped), BAM, or interleaved sequences to stdout. Regions can be identified using absolute location or relative anchors. 2/
The splitcode toolkit was motivated by our need for a versatile tool that can perform a range of tasks from adapter trimming to barcode extraction. Specialized tools exist for many tasks, e.g. fastp, UMI-tools, etc. Splitcode is more general, enabling a lot with one tool. 3/
It's been great to see the positive response of @satijalab & @fabian_theis to our preprint on Seurat & Scanpy, and their commitment to work to improve transparency of their tools. One immediate benefit will be better practice of PCA in genomics. 1/🧵biorxiv.org/content/10.110…
PCA became a mainstay in genomics after the papers of @soumya_boston, Josh Stuart & @Rbaltman and @OrlyAlter ca. 2000 demonstrated its power for studying gene expression. 2/ worldscientific.com/doi/abs/10.114… pnas.org/doi/10.1073/pn…
Back then, having linear algebra on one's side was essential. A rich lab at that time might have something like a Sun Blade workstation clocking ~500 MHz w/ 2 GB RAM. So having fast SVD algorithms made PCA practical, when other methods based on more sophisticated models weren't. 3/
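The link between PCA and the SVD is short enough to write down. Below is a minimal NumPy sketch on random data; the function name `pca_svd` and the matrix sizes are illustrative assumptions, and real pipelines add normalization and scaling choices that are not shown here.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of a samples x features matrix via the SVD of the centered data."""
    Xc = X - X.mean(axis=0)                     # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # principal axes (rows)
    scores = U[:, :n_components] * S[:n_components]  # projected samples
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                  # toy samples x genes matrix
scores, components, explained_variance = pca_svd(X, 5)
print(scores.shape, components.shape)
```

Projecting the centered data onto the principal axes (`Xc @ components.T`) gives the same scores, which is a handy sanity check.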
The difference in @10xGenomics' Cell Ranger's default between version 6 and 7 is discussed in this thread, but it's such a big deal that it's worth its own thread.
tl;dr: in v7 Cell Ranger changed how it produces the gene count matrix leading to a huge difference in results. 1/
The change was described in release notes on May 17, 2022, which via two clicks lead to a technical note with more detail: 2/ cdn.10xgenomics.com/image/upload/v…
To understand this technical note it is helpful to be familiar with the three types of reads that are produced in single-cell RNA-seq: spliced (M as a proxy for mature mRNAs), unspliced (N as a proxy for nascent RNAs), and ambiguous between both (labeled A). 3/