Lior Pachter
Mar 10, 2020 · 15 tweets
In a new preprint, @sinabooeshaghi et al. present deep SMART-Seq, @10xGenomics and MERFISH #scRNAseq (37,925,526,323 reads, 344,256 cells) from the mouse primary motor cortex, demonstrating the benefits of cross-platform isoform-level analysis. biorxiv.org/content/10.110… 1/15
We produce an isoform atlas and identify isoform markers for classes, subclasses and clusters of cells across all layers of the primary motor cortex. 2/15
Isoform-level results are facilitated by kallisto isoform-level quantification of the SMART-seq data. We show that such EM-based isoform quantification is essential not just for isoform-level but also for gene-level results. #methodsmatter 3/15
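To make "EM-based isoform quantification" concrete, here is a toy sketch of the expectation-maximization idea on a read-by-transcript compatibility matrix. It is an illustration only, not kallisto's implementation: the function name `em_abundances` and the example reads are hypothetical, and the sketch ignores transcript effective lengths.

```python
import numpy as np

def em_abundances(compat, n_iter=200):
    """Toy EM for transcript proportions (hypothetical helper, not kallisto).

    compat[r, t] is 1 if read r is compatible with transcript t, else 0.
    Effective lengths are ignored to keep the sketch short.
    """
    compat = np.asarray(compat, dtype=float)
    n_reads, n_tx = compat.shape
    theta = np.full(n_tx, 1.0 / n_tx)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: split each read across its compatible transcripts,
        # in proportion to the current abundance estimates.
        weights = compat * theta
        weights /= weights.sum(axis=1, keepdims=True)
        # M-step: new proportions are the expected read counts, normalized.
        theta = weights.sum(axis=0) / n_reads
    return theta

# Two isoforms of one gene: 6 reads are ambiguous between them,
# 3 reads are unique to isoform A and 1 read is unique to isoform B.
compat = np.array([[1, 1]] * 6 + [[1, 0]] * 3 + [[0, 1]] * 1)
theta = em_abundances(compat)
print(theta)
```

The ambiguous reads are not discarded or split evenly; the uniquely mapping reads determine how they are apportioned, which is why the estimate depends on the isoform model even when only a gene-level total is wanted.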
Using the 10xv3 data, we also present the first cross-platform validation of SMART-seq isoform quantification (possible by analyzing transcripts with unique sequence near the 3’ end). 4/15
A key result is that many cell types have strong isoform markers that cannot be detected at the gene level. See the example below: 5/15
One exciting application of isoform quantification is spatial extrapolation. We show that in some cases isoform resolution can be achieved spatially even with gene probes. This is a powerful way to leverage SMART-seq for MERFISH and seqFISH. 6/15
These results argue for a rethinking of current #scRNAseq best practices. We find that SMART-seq complements droplet-based methods and spatial RNA-seq, adding layers of important isoform resolution to cell atlases. 7/15
We recommend droplet-based high-throughput methods for cell type identification, SMART-seq for isoform resolution, and spatial RNA-seq for location information. The whole is greater than the sum of the parts. 8/15
The beautiful t-SNE above is a lot cleaner than is usually the case. This is thanks to an idea of @sinabooeshaghi who made it w/ t-SNE of a neighborhood components analysis (NCA) dimensionality reduction (@geoffreyhinton, Roweis et al. cs.toronto.edu/~hinton/absps/…) rather than PCA. 9/15
NCA finds a projection that maximizes a stochastic variant of the leave-one-out kNN score given cluster assignments. Intuitively, it projects so as to keep cells from the same cluster near each other, which is exactly what we want. 10/15
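A minimal sketch of the NCA-then-t-SNE idea, using scikit-learn on synthetic data. The data, the number of NCA components and the t-SNE settings are assumptions chosen for illustration; they are not the settings used in the preprint.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import NeighborhoodComponentsAnalysis

# Synthetic stand-in for a cells-by-features matrix with known cluster labels.
X, clusters = make_blobs(n_samples=150, n_features=20, centers=4, random_state=0)

# Supervised step: NCA learns a linear projection that keeps cells with the
# same cluster label close together (it uses the labels, unlike PCA).
nca = NeighborhoodComponentsAnalysis(n_components=5, random_state=0)
X_nca = nca.fit_transform(X, clusters)

# Unsupervised step: t-SNE is run on the NCA projection instead of on
# the top principal components.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_nca)
print(embedding.shape)
```

Swapping the `NeighborhoodComponentsAnalysis` step for `sklearn.decomposition.PCA` with the same number of components gives the conventional pipeline for comparison.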
You might worry that NCA overfits. It doesn't. We ran a permutation test to confirm that we are seeing real structure in the data. 11/15
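One way such a permutation test could be set up is sketched below. This is an assumed design, not necessarily the test in the preprint: fit NCA on a training split, score a kNN classifier on held-out cells, and compare that score with the scores obtained after shuffling the cluster labels. The helper `heldout_knn_score` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis

def heldout_knn_score(X, y, seed=0):
    """Fit NCA on a training split, then score kNN on held-out samples."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=seed)
    nca.fit(X_tr, y_tr)
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(nca.transform(X_tr), y_tr)
    return knn.score(nca.transform(X_te), y_te)

# Synthetic stand-in for cells with real cluster structure.
X, y = make_blobs(n_samples=150, n_features=10, centers=3, random_state=0)

observed = heldout_knn_score(X, y)

# Null distribution: shuffle the labels so they carry no structure, then
# repeat the whole fit-and-score procedure.
rng = np.random.default_rng(0)
n_perm = 20
null = np.array([heldout_knn_score(X, rng.permutation(y)) for _ in range(n_perm)])

# Permutation p-value with the usual +1 correction.
p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
print(observed, null.mean(), p_value)
```

If NCA were simply memorizing labels, the held-out score with shuffled labels would be as high as with the real ones; a large gap between `observed` and the null scores suggests the projection reflects structure in the data.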
As an aside, we found that the t-SNE of the NCA projection preserved global structure better than t-SNE of the PCA projection (e.g., in terms of inhibitory / excitatory neuron classes). It seems that people have been conflating the performance of t-SNE (& UMAP) with that of PCA. 12/15
The preprint has code associated with every figure, with links directly to Jupyter notebooks hosted on @github (github.com/pachterlab/BYV…). #reproducibility #usability 13/15
All this based on amazing publicly available data from the BICCN, and this is just a preview of the whole mouse brain which is on its way. 14/15
All of this is a result of incredible work by @sinabooeshaghi who did all the analysis single-handedly. Follow him for more interesting #scRNAseq in the near future. 15/15

