Lior Pachter

@lpachter

May 2 • 21 tweets • 10 min read

In 2019 "Single-cell multimodal omics" was deemed @naturemethods Method of the Year, and since then many new multimodal methods have been published. But are there tradeoffs w/ multimodal omics?

tl;dr yes! An analysis w/ @sinabooeshaghi & Fan Gao in biorxiv.org/content/10.110… 🧵1/

There are a lot of ways to look at this question and we have much to say (long 🧵ahead!). As a starting point let's begin with our Supplementary Figure 4. This is a comparison of (#snRNAseq+#snATACseq) multimodal technology with unimodal technology. Much to explain here: 2/

(a) & (b) show the mean-variance relationship for data from an assay that measures RNA and ATAC (transposase-accessible chromatin) in the same cells. The data is from ncbi.nlm.nih.gov/geo/query/acc.… Cells from human HEK293T & mouse NIH3T3 were mixed. You're looking at the RNA. 3/

The mouse and human counts both display variance quadratic in the mean, consistent with negative binomial data. The quadratic coefficients are similar. This is also the case in (c) and (d), which are data from the same cell lines but with a different technology called ISSAAC-seq. 4/
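A minimal sketch of what "variance quadratic in the mean" refers to, assuming the usual negative binomial parameterization var = mean + phi * mean^2. The function names, the simulated data, and the least-squares fit are illustrative assumptions, not the analysis from the preprint.

```python
import numpy as np

def simulate_nb(n_cells, gene_means, phi, seed=0):
    """Simulate negative binomial counts as a gamma-Poisson mixture.

    Each gene has var = mean + phi * mean**2 (phi is the quadratic coefficient).
    """
    rng = np.random.default_rng(seed)
    gene_means = np.asarray(gene_means, dtype=float)
    # Gamma with shape 1/phi and scale mean*phi has mean `mean` and variance phi*mean**2.
    rates = rng.gamma(shape=1.0 / phi, scale=gene_means * phi,
                      size=(n_cells, gene_means.size))
    return rng.poisson(rates)

def fit_overdispersion(counts):
    """Least-squares estimate of phi in var = mean + phi * mean**2 across genes.

    counts: cells x genes array of non-negative integers.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    keep = mean > 0
    x = mean[keep] ** 2            # quadratic term
    y = var[keep] - mean[keep]     # variance in excess of Poisson
    return float(np.sum(x * y) / np.sum(x * x))

if __name__ == "__main__":
    means = np.logspace(-1, 2, 200)
    counts = simulate_nb(2000, means, phi=0.5)
    print("estimated quadratic coefficient:", fit_overdispersion(counts))
```

A larger fitted coefficient means more variance at the same mean, which is the sense in which one technology can be called noisier than another.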

In (e) and (f) you see what unimodal data looks like. Same cell lines, but assayed with 10x Genomics #scRNAseq (the figures are reproduced from @const_ae and @wolfgangkhuber's recent preprint biorxiv.org/content/10.110…). Much less noise in unimodal data. 5/

Performing an analysis like this is difficult, because it requires an apples-to-apples comparison. Currently, most multimodal assays are preprocessed with custom scripts or "pipelines" that couple together the equivalent of water pipes with electricity lines (h/t @sinabooeshaghi). 6/

To perform like-to-like comparisons we had to develop new software that could be used on multiple different assays from different technologies. We focused for now on multimodal single-cell ATAC-seq + RNA-seq, and ended up building a program called snATAK on top of kallisto bustools. 7/

Now we could compare, say, ISSAAC-seq with SHARE-seq or SHARE-seqv2, or either of them to 10x Genomics Multiome. Or any of these assays to unimodal #scRNAseq or #snRNAseq or #snATACseq. We started by validating snATAK against the widely used Cell Ranger and Cell Ranger ARC tools. 8/

The left column is a comparison of snATAK to 10x's Cell Ranger ARC on 10x Multiome assayed PBMCs. The right column is a comparison of snATAK processing to Cell Ranger on a spatial ATAC-seq dataset (recently published by the @RongFan8 lab nature.com/articles/s4158…). 9/

With near-identical results overall (although snATAK outperformed Cell Ranger on the spatial ATAC-seq data) we were ready to assess the multiome tradeoff, at least for ATAC-seq / RNA-seq (for now). BTW, snATAK is memory efficient, can run on @GoogleColab, and is fast. 10/

In a knee plot comparison of 10x ATAC-seq and the ATAC part of 10x Multiome you see that the multiome ATAC has an extra “knee” which is the result of a high load of cells resulting in doublets. In the relevant part, unimodal ATAC-seq outperforms its multiome counterpart. 11/
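For readers unfamiliar with knee plots, here is a rough sketch of how the curve is built: barcodes are ranked by their total counts, and the sorted totals are plotted against rank, usually on log-log axes. The function names and the fixed threshold are hypothetical; this is not snATAK's barcode-filtering logic.

```python
import numpy as np

def knee_curve(counts_per_barcode):
    """Return (ranks, counts) with barcodes sorted by decreasing total counts."""
    sorted_counts = np.sort(np.asarray(counts_per_barcode))[::-1]
    ranks = np.arange(1, sorted_counts.size + 1)
    return ranks, sorted_counts

def barcodes_above(counts_per_barcode, threshold):
    """Number of barcodes with at least `threshold` counts (a crude cell call)."""
    return int(np.sum(np.asarray(counts_per_barcode) >= threshold))

if __name__ == "__main__":
    totals = [5, 100, 1, 50]
    ranks, counts = knee_curve(totals)
    print(list(zip(ranks.tolist(), counts.tolist())))
    print("barcodes with >= 50 counts:", barcodes_above(totals, 50))
```

A sharp drop in the curve separates cell-containing barcodes from background; a second drop, like the extra "knee" described above, suggests a second population of barcodes.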

Multiome also suffers from fewer reads per peak. Of course, for these results the datasets were subsampled to the same depth. 12/
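One way to put two datasets at the same depth is to draw a fixed number of counts without replacement. The sketch below does this on a count matrix as a stand-in; the thread does not say whether the subsampling was done on reads or on counts, and `downsample_counts` is a hypothetical helper, not part of snATAK.

```python
import numpy as np

def downsample_counts(counts, target_total, seed=0):
    """Draw `target_total` counts without replacement from a count matrix.

    Each original count is equally likely to be kept, so relative
    abundances are preserved in expectation.
    """
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    if target_total > total:
        raise ValueError("target depth exceeds the counts available")
    rng = np.random.default_rng(seed)
    flat = rng.multivariate_hypergeometric(counts.ravel(), target_total)
    return flat.reshape(counts.shape)

if __name__ == "__main__":
    deep = np.array([[5, 0, 3], [2, 10, 0]])
    shallow = downsample_counts(deep, target_total=8)
    print(shallow, shallow.sum())
```

Downsampling the deeper dataset to the total of the shallower one keeps differences in reads per peak from being an artifact of sequencing depth.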

Back to the previous data, we performed comparisons of different technologies. There is a lot to unpack in the figure below. One technology has more doublets. But it also is much more efficient (at nuclei assayed / reads sequenced). Revealed thanks to uniform preprocessing. 13/

One of the useful features of snATAK is that it can perform allele-specific analysis. We used it to quantify the association between strand specificity in open chromatin, and strand specificity in expression. That's what you see here (w/ 10x Multiome PBMCs). 14/

In this plot each point is a cell type / SNP combination. The Alt / Ref on the x-axis is based on analysis of whether, in a cell type, the ATAC was open on the Ref or Alt strand only at a SNP. The y-axis is the corresponding Ref vs. Alt usage for gene expression. Makes sense. 15/
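A sketch of how one point per cell type / SNP combination could be computed, assuming per-cell Ref and Alt read counts are already available as cells x SNPs arrays for a given modality. The function name and input layout are assumptions for illustration, not snATAK's output format.

```python
import numpy as np

def pseudobulk_alt_fraction(ref, alt, cell_types):
    """Alt / (Ref + Alt) per SNP after summing counts within each cell type.

    ref, alt: cells x SNPs count arrays; cell_types: one label per cell.
    Returns {cell_type: array of alt fractions, nan where there are no reads}.
    """
    ref = np.asarray(ref, dtype=float)
    alt = np.asarray(alt, dtype=float)
    labels = np.asarray(cell_types)
    result = {}
    for cell_type in np.unique(labels):
        mask = labels == cell_type
        ref_sum = ref[mask].sum(axis=0)
        alt_sum = alt[mask].sum(axis=0)
        total = ref_sum + alt_sum
        frac = np.full(total.shape, np.nan)
        # Only divide where at least one allele-informative read exists.
        np.divide(alt_sum, total, out=frac, where=total > 0)
        result[str(cell_type)] = frac
    return result

if __name__ == "__main__":
    ref = [[1, 0], [1, 0], [0, 2]]
    alt = [[1, 0], [0, 0], [2, 0]]
    print(pseudobulk_alt_fraction(ref, alt, ["T", "T", "B"]))
```

Running this once on ATAC counts and once on RNA counts would give the x and y coordinates for each cell type / SNP pair.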

For this analysis the registration between RNA & ATAC is useful. We are sure that the same cells contribute both to the RNA and ATAC. However, while the result for cell types is convincing, we learn nothing about individual cells. The data is too sparse; a multiome tradeoff. 16/

In other words, here Multiome has produced a non-constructive existence proof. It's like asking for two numbers x and y such that x^y is rational, but x and y are both irrational. This is a seemingly hard problem. But... 17/

... we know that (√2^√2)^√2 = 2. Since √2 is irrational, if √2^√2 is rational we have an example. Otherwise one irrational number is √2^√2, and the other is √2, and we have an example. Existence proved. Not constructive. 18/
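Written out, the identity and the two cases are as follows. This only restates the argument above in LaTeX; the proof shows that a pair exists without saying which case holds.

```latex
\[
\left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}}
  = \sqrt{2}^{\,\sqrt{2}\cdot\sqrt{2}}
  = \sqrt{2}^{\,2}
  = 2 .
\]
\begin{itemize}
  \item If $\sqrt{2}^{\sqrt{2}} \in \mathbb{Q}$: take $x = y = \sqrt{2}$, so $x^y$ is rational.
  \item Otherwise: take $x = \sqrt{2}^{\sqrt{2}}$ and $y = \sqrt{2}$, both irrational, so $x^y = 2$.
\end{itemize}
```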

The code for reproducing the results described above, and for running snATAK, is here: github.com/pachterlab/BGP… 19/

There is much more to the multimodal tradeoff than is covered in our preprint: there are of course many other modalities to consider. But w/ snATAK (which can work whenever genome alignment is needed) & kallisto bustools we have shown that uniform preprocessing is possible. 20/20

Somehow the link to the @biorxivpreprint was scrambled in the first tweet. Reposting here, along with a link to the top of the thread. biorxiv.org/content/10.110…

https://twitter.com/lpachter/status/1653201162517229568?s=20
