Lior Pachter
May 2 · 21 tweets · 10 min read
In 2019 "Single-cell multimodal omics" was deemed @naturemethods Method of the Year, and since then many new multimodal methods have been published. But are there tradeoffs w/ multimodal omics?

tl;dr yes! An analysis w/ @sinabooeshaghi & Fan Gao in biorxiv.org/content/10.110… 🧵1/
There are a lot of ways to look at this question and we have much to say (long 🧵ahead!). As a starting point let's begin with our Supplementary Figure 4. This is a comparison of (#snRNAseq+#snATACseq) multimodal technology with unimodal technology. Much to explain here: 2/
(a) & (b) show the mean-variance relationship for data from an assay for measuring RNA and TAC (transposase-accessible chromatin) in the same cells. The data is from ncbi.nlm.nih.gov/geo/query/acc.…
Cells from human HEK293T & mouse NIH3T3 were mixed. You're looking at the RNA. 3/
The mouse and human counts both display variance quadratic in the mean, consistent with negative binomial data. The quadratic coefficients are similar. This is also the case in (c) and (d) which are data from the same cell lines but with different technology called ISSAAC-seq. 4/
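To make "variance quadratic in the mean" concrete: under a negative binomial model with dispersion φ, a gene with mean μ has variance μ + φμ². Here is a minimal sketch of how one might estimate that quadratic coefficient from a cells × genes count matrix. The function name `fit_quadratic_coefficient` and the simulated data are illustrative assumptions, not the procedure used in the preprint.

```python
import numpy as np

def fit_quadratic_coefficient(counts):
    """Least-squares estimate of phi in var = mean + phi * mean**2.

    counts: cells x genes array of non-negative counts (hypothetical input).
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)          # per-gene mean across cells
    var = counts.var(axis=0, ddof=1)    # per-gene sample variance
    keep = mean > 0                     # drop genes with no counts
    x = mean[keep] ** 2
    y = var[keep] - mean[keep]          # excess variance over Poisson
    # Regression through the origin of (var - mean) on mean**2.
    return float(np.dot(x, y) / np.dot(x, x))

# Simulated check: negative binomial counts with a known dispersion.
rng = np.random.default_rng(0)
phi_true = 0.5
gene_means = rng.uniform(1.0, 50.0, size=500)
n = 1.0 / phi_true                      # NumPy's "n" parameter
p = n / (n + gene_means)                # gives mean = gene_means
sim = rng.negative_binomial(n, p, size=(2000, 500))
phi_hat = fit_quadratic_coefficient(sim)
print(f"true phi = {phi_true}, estimated phi = {phi_hat:.3f}")
```

Comparing the fitted coefficient across assays (on the same cell lines, at matched depth) is one simple way to put a number on "more noise" versus "less noise".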
In (e) and (f) you see what unimodal data looks like. Same cell lines, but assayed with 10x Genomics #scRNAseq (the figures are reproduced from @const_ae and @wolfgangkhuber's recent preprint biorxiv.org/content/10.110…). Much less noise in unimodal data. 5/
Performing an analysis like this is difficult, because it requires apples-to-apples comparison. Currently, most multimodal assays are preprocessed with custom scripts or "pipelines" coupling together the equivalent of water pipes with electricity lines (h/t @sinabooeshaghi). 6/
To perform like-to-like comparisons we had to develop new software that could be used on multiple different assays from different technologies. We focused for now on multimodal single-cell ATAC-seq + RNA-seq, and ended up building a program called snATAK on kallisto bustools. 7/
Now we could compare, say, ISSAAC-seq with SHARE-seq or SHARE-seqv2, or either of them to 10x Genomics Multiome. Or any of these assays to unimodal #scRNAseq or #snRNAseq or #snATACseq. We started by validating snATAK with the widely used Cell Ranger and Cell Ranger ARC tools. 8/
The first column is a comparison of snATAK to 10x's Cell Ranger ARC on 10x Multiome assayed PBMCs. The right column is a comparison of snATAK processing to Cell Ranger on a spatial ATAC-seq dataset (recently published by the @RongFan8 lab nature.com/articles/s4158…). 9/
With overall near identical results (although snATAK outperformed Cell Ranger on the spatial ATAC-seq data) we were ready to assess the multiome tradeoff, at least for ATAC-seq / RNA-seq (for now). BTW, snATAK is memory efficient, can run on @GoogleColab, and is fast. 10/
In a knee plot comparison of 10x ATAC-seq and the ATAC part of 10x Multiome you see that the multiome ATAC has an extra “knee” which is the result of a high load of cells resulting in doublets. In the relevant part, unimodal ATAC-seq outperforms its multiome counterpart. 11/
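For readers who have not built a knee plot: it ranks barcodes by total counts and plots count against rank on log-log axes, so that real cells and empty droplets separate at a "knee". A minimal sketch follows. The knee heuristic used here (the point farthest above the chord joining the curve's endpoints in log-log space) is an assumption chosen for illustration, not what snATAK or Cell Ranger do, and it finds only a single knee.

```python
import numpy as np

def knee_curve(counts_per_barcode):
    """Return (ranks, totals) with totals sorted in decreasing order."""
    totals = np.sort(np.asarray(counts_per_barcode, dtype=float))[::-1]
    totals = totals[totals > 0]             # log axes cannot show zeros
    ranks = np.arange(1, totals.size + 1)
    return ranks, totals

def knee_rank(ranks, totals):
    """Heuristic knee: the point farthest above the endpoint chord (log-log).

    Assumes at least two barcodes with different totals.
    """
    x = np.log10(ranks)
    y = np.log10(totals)
    x = (x - x[0]) / (x[-1] - x[0])         # rescale to [0, 1]
    y = (y - y[-1]) / (y[0] - y[-1])        # rescale to [0, 1]
    # The chord runs from (0, 1) to (1, 0); x + y - 1 is proportional
    # to the distance of each point above that chord.
    return int(ranks[np.argmax(x + y - 1.0)])

# Simulated example: 1,000 "cells" and 50,000 near-empty droplets.
rng = np.random.default_rng(1)
cells = rng.poisson(10_000, size=1_000)
empties = rng.poisson(10, size=50_000)
ranks, totals = knee_curve(np.concatenate([cells, empties]))
knee = knee_rank(ranks, totals)
print("estimated number of cells:", knee)
```

A curve with an extra knee, like the one described above, would need this heuristic applied piecewise or replaced by something more careful.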
Multiome also suffers from fewer reads per peak. Of course, for these results the datasets have been subsampled to the same depth. 12/
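Subsampling to equal depth is what makes a per-peak comparison fair. The thread does not say how the subsampling was done, so the sketch below is only one possible approach: binomial thinning of an already-built count matrix, which matches depth in expectation. Subsampling reads before alignment would be the more literal alternative. The name `downsample_counts` and the toy matrices are hypothetical.

```python
import numpy as np

def downsample_counts(counts, target_total, seed=0):
    """Thin a count matrix so its expected total equals target_total.

    Each original count is kept independently with probability
    target_total / counts.sum() (binomial thinning).
    """
    counts = np.asarray(counts, dtype=np.int64)
    total = counts.sum()
    if target_total >= total:
        return counts.copy()                # nothing to remove
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, target_total / total)

# Toy example: bring a deeper dataset down to the depth of a shallower one.
rng = np.random.default_rng(2)
deep = rng.poisson(5, size=(200, 300))      # stand-in for one assay
shallow = rng.poisson(2, size=(200, 300))   # stand-in for another assay
depth = int(shallow.sum())
deep_ds = downsample_counts(deep, depth)
print(deep.sum(), "->", deep_ds.sum(), "target", depth)
```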
Back to the previous data, we performed comparisons of different technologies. There is a lot to unpack in the figure below. One technology has more doublets. But it also is much more efficient (at nuclei assayed / reads sequenced). Revealed thanks to uniform preprocessing. 13/
One of the useful features of snATAK is that it can perform allele-specific analysis. We used it to quantify the association between strand specificity in open chromatin, and strand specificity in expression. That's what you see here (w/ 10x Multiome PBMCs). 14/
In this plot each point is a cell type / SNP combination. The Alt / Ref on the x-axis is based on analysis of whether, in a cell type, the ATAC was open on the Ref or Alt strand only at a SNP. The y-axis is the corresponding Ref vs. Alt usage for gene expression. Makes sense. 15/
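The axes of such a plot can be sketched in a few lines. For each cell type / SNP combination, compute the fraction of reads carrying the Alt allele in ATAC and in RNA, then ask whether the two fractions agree. The counts below are made-up numbers for illustration, and `alt_fraction` is a hypothetical helper, not part of snATAK.

```python
import numpy as np

def alt_fraction(ref, alt):
    """Alt / (Ref + Alt) per entry, NaN where there is no coverage."""
    ref = np.asarray(ref, dtype=float)
    alt = np.asarray(alt, dtype=float)
    total = ref + alt
    out = np.full(total.shape, np.nan)
    np.divide(alt, total, out=out, where=total > 0)
    return out

# One entry per (cell type, SNP) pair; pseudobulked, invented counts.
atac_ref, atac_alt = [10, 0, 5, 8], [0, 12, 5, 2]
rna_ref, rna_alt = [30, 2, 10, 20], [1, 25, 12, 4]

atac_frac = alt_fraction(atac_ref, atac_alt)   # x-axis
rna_frac = alt_fraction(rna_ref, rna_alt)      # y-axis

covered = ~np.isnan(atac_frac) & ~np.isnan(rna_frac)
r = np.corrcoef(atac_frac[covered], rna_frac[covered])[0, 1]
print(f"Pearson r between ATAC and RNA Alt fractions: {r:.3f}")
```

Pooling counts per cell type is what keeps the fractions stable here. For a single cell most entries would have zero coverage.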
For this analysis the registration between RNA & ATAC is useful. We are sure that the same cells contribute both to the RNA and ATAC. However, while the result for cell types is convincing, we learn nothing about individual cells. The data is too sparse; a multiome tradeoff. 16/
In other words, here Multiome has produced a non-constructive existence proof. It's like asking for two numbers x and y such that x^y is rational, but x and y are both irrational. This is a seemingly hard problem. But... 17/
... we know that (√2^√2)^√2 = 2. Since √2 is irrational, if √2^√2 is rational we have an example. Otherwise one irrational number is √2^√2, and the other is √2, and we have an example. Existence proved. Not constructive. 18/
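Written out step by step, the argument above goes as follows (standard exponent rules, nothing beyond what the tweets state):

```latex
\[
\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
  = \sqrt{2}^{\,\sqrt{2}\cdot\sqrt{2}}
  = \sqrt{2}^{\,2}
  = 2 .
\]
\textbf{Case 1.} If $\sqrt{2}^{\sqrt{2}}$ is rational, take $x = y = \sqrt{2}$:
both are irrational and $x^{y}$ is rational.

\textbf{Case 2.} If $\sqrt{2}^{\sqrt{2}}$ is irrational, take
$x = \sqrt{2}^{\sqrt{2}}$ and $y = \sqrt{2}$: both are irrational and
$x^{y} = 2$ is rational.

One of the two cases must hold, so a valid pair $(x, y)$ exists, but the
argument does not say which pair it is.
```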
The code for reproducing the results described above, and for running snATAK, is here: github.com/pachterlab/BGP… 19/
There is much more to the multimodal tradeoff than is covered in our preprint: there are of course many other modalities to consider. But w/ snATAK (which can work whenever genome alignment is needed) & kallisto bustools we have shown that uniform preprocessing is possible. 20/20
Somehow the link to the @biorxivpreprint was scrambled in the first tweet. Reposting here, along with a link to the top of the thread. biorxiv.org/content/10.110…

More from @lpachter

May 2
Actually, not transforming the data outperforms log(y/s+1). 1/
The "performance" in this analysis boils down to checking consistency of the kNN graph after transformation. That's certainly a property one can optimize for, but it's by no means the only one. In fact, if it was the only property of interest, one could just not transform. 2/
Of course that is trivial and uninteresting. The purpose of normalization is to remove technical noise and stabilize variance. But then one should check how well that is done. And as it turns out, log(y/s+1) actually removes too much "noise". 3/
Read 6 tweets
May 2
In a recent preprint with @GorinGennady (biorxiv.org/content/10.110…) we provide a quantitative answer to this question, namely what information about variance (among cells in a cell type, or more generally many cell types) does a UMAP provide? A short🧵1/
The variability in gene expression across cells can be attributed to biological stochasticity and technical noise. In practice it's hard to break down the variance into these constituent parts. How do we know what is biological vs. technical? 2/
Here's an idea: within a cell type, we can obtain an accurate estimate of gene expression by averaging across cells. Now we can get a lower bound for biological variability by computing the variance across very distinct cell types. 3/
Read 17 tweets
Mar 23
To follow up on this comment by @nilshomer, I wanted to say a few things about why @sinabooeshaghi designed and developed seqspec (just pre-printed here biorxiv.org/content/10.110…), and our hopes for how it can be used for transparency and reproducibility in genomics. 🧵1/
Since the development of sequence census assays by Barbara Wold in her pair of transformative papers in 2007--2008 on ChIP-seq and RNA-seq (science.org/doi/10.1126/sc… and nature.com/articles/nmeth…), the use of sequencing for molecular biology has exploded. 2/
Wold and Myers predicted this explosion in 2008, writing "an exciting frontier is just beginning to emerge" and recognizing the importance of "being able to assay the regulatory inputs and outputs of the genome routinely and comprehensively" nature.com/articles/nmeth… 3/
Read 16 tweets
Jan 19
Interested in "integrating" multimodal #scRNAseq data? W/ @MariaCarilli, @GorinGennady, @funion10 & Tara Chari we introduce biVI, which combines the scVI variational autoencoder with biophysically motivated bivariate models for RNA distributions. 🧵 1/
biorxiv.org/content/10.110…
One of the clearest cases for "integration" is in combining measurements of nascent and mature mRNAs, which can be obtained with every #scRNAseq experiment. Should "intronic counts" be added to "exonic counts"? Or is it better to pick one or the other? 2/
This important question has been swept under the rug. Perhaps that is because it is inconvenient to have to rethink #scRNAseq with two count matrices as input, instead of one. How does one cluster with two matrices? How does one find marker genes with them? 3/
Read 23 tweets
Jan 2
This flippant comment on #scRNAseq algorithms reflects a common disrespect for computational biologists who are frequently derided for not asking "good biological questions". Moreover, it is peak chutzpah. A short 🧵..
As pointed out by @RArgelaguet, the OP recently coauthored a paper where many #scRNAseq methods, algorithms, and tools were used.. I wonder which of them the OP would have preferred was not developed. @AMartinezArias, please choose from this list:
Read 27 tweets
Dec 22, 2022
You have to hand it to Lex Fridman. His grift is not an amateur job. Take his Twitter photo. A professor standing in front of a blackboard with some math. Right?
This photo (see RHS of image below) is from what he calls his "MIT course" on Deep Learning for Self-Driving Cars. Sounds like good stuff. CS, math, self driving cars. #broheaven. So what is the problem? He is standing in front of the blackboard.
Well first of all, this was an MIT IAP class. IAP is a short period in January when students get to take fun classes on various topics that can be taught by anyone (many by students). I once sat in on a brain dissection. You can learn how to count cards. web.mit.edu/willma/www/mit…
Read 9 tweets
