Aedin Culhane
Prof @MedicineAtUL, Computational immune oncology #bioinformatics, #biostatistics, #rstats, R, @bioconductor, women in STEM, parents in STEM, Tweets are my own

Dec 1, 2021, 11 tweets

Delighted to introduce correspondence analysis of scRNAseq. We show CA of Freeman-Tukey residuals outperforms CA of the Pearson Residuals

corral: Single-cell RNA-seq dimension reduction, batch integration, and visualization with correspondence analysis
biorxiv.org/content/10.110…

Code to reproduce the figures is available at github.com/laurenhsu1/cor…

Functions are available in the corral @bioconductor package. bioconductor.org/packages/relea…

Decomposition of the Pearson Residuals is Correspondence analysis.

It's nicely described by displayr.com/math-correspon…

I also described it in a workshop presented at #Bioc2020 and #Bioc2021 conferences aedin.github.io/PCAworkshop/ar
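
To make "decomposition of the Pearson residuals" concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the corral implementation: the function name `correspondence_analysis` and the toy genes x cells matrix are invented for this example.

```python
import numpy as np

def correspondence_analysis(counts):
    """Sketch of classic CA: SVD of the Pearson residual matrix."""
    X = np.asarray(counts, dtype=float)
    P = X / X.sum()              # correspondence matrix (proportions)
    r = P.sum(axis=1)            # row masses (genes)
    c = P.sum(axis=0)            # column masses (cells)
    E = np.outer(r, c)           # expected proportions under independence
    S = (P - E) / np.sqrt(E)     # Pearson residuals
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    return S, U, d, Vt

# Hypothetical toy counts: 4 genes (rows) x 4 cells (columns).
counts = np.array([[10, 8, 1, 0],
                   [9, 12, 0, 2],
                   [1, 0, 11, 9],
                   [0, 2, 8, 13]])

S, U, d, Vt = correspondence_analysis(counts)

# Total inertia: the sum of squared singular values, which equals the
# chi-squared statistic of the table divided by the grand total.
total_inertia = (d ** 2).sum()
```

Because the residuals are already centered on the independence model, the last singular value is numerically zero, so CA yields at most min(genes, cells) - 1 informative dimensions.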

It's fast to compute

Correspondence Analysis (CA) is an alternative to PCA that is robust for use with raw or log-normalized scRNAseq counts

& is consistent with studies that recommend decomposition of the Pearson Residuals (Townes et al., 2019; Hafemeister & Satija, 2019; Lause et al., 2021)

CA has a long tradition in diverse settings and disciplines, including linguistics, business and marketing research, and archaeology

There are many variations of CA that are better adapted to handle overdispersion than classic CA (decomposition of the Pearson Residuals)

We tested these variations of CA: variance-stabilizing transformations applied before standard CA, and CA using different chi-sq statistics.

We report that CA of the Freeman-Tukey chi sq residuals is better adapted to the overdispersion of scRNAseq counts
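
For intuition, here is a small sketch comparing the two residual types on a toy table. It uses the textbook Freeman-Tukey deviate on raw counts, sqrt(O) + sqrt(O + 1) - sqrt(4E + 1); this is an assumption for illustration, and the scaling used inside corral may differ. The function names and the count matrix are hypothetical.

```python
import numpy as np

def expected_counts(X):
    """Expected counts under independence of rows and columns."""
    return np.outer(X.sum(axis=1), X.sum(axis=0)) / X.sum()

def pearson_residuals(counts):
    X = np.asarray(counts, dtype=float)
    E = expected_counts(X)
    return (X - E) / np.sqrt(E)

def freeman_tukey_residuals(counts):
    # Classic Freeman-Tukey deviate on raw counts (illustrative form).
    X = np.asarray(counts, dtype=float)
    E = expected_counts(X)
    return np.sqrt(X) + np.sqrt(X + 1.0) - np.sqrt(4.0 * E + 1.0)

# Hypothetical sparse table with one highly expressed gene.
counts = np.array([[0, 1, 0, 250],
                   [3, 0, 2, 1],
                   [1, 2, 0, 3]])

pr = pearson_residuals(counts)
ft = freeman_tukey_residuals(counts)

# Largest absolute residual under each definition.
max_pearson = np.abs(pr).max()
max_freeman_tukey = np.abs(ft).max()
```

In this toy table the largest Pearson residual comes from a small count in a cell with a tiny expected value, and the square roots in the Freeman-Tukey form shrink it considerably.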

CA biplot provides easy cluster interpretation.

Transformed counts have an intuitive interpretation:
the chi sq statistic, i.e. the strength of association between gene & cell

Genes & cells in same direction from origin are associated

Distance from the origin = magnitude of assoc.
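
The geometry behind these two statements can be checked numerically. The sketch below is a hedged illustration, not corral code: the toy matrix and variable names are invented, and it uses all CA dimensions, whereas a 2-D biplot only approximates the relationship. With genes in principal coordinates and cells in standard coordinates, the inner product of a gene and a cell equals observed/expected - 1, so it is positive when they point the same way from the origin and grows with their distance from it.

```python
import numpy as np

# Hypothetical toy counts: 4 genes (rows) x 4 cells (columns).
counts = np.array([[20, 18, 2, 1],
                   [15, 22, 1, 3],
                   [2, 1, 19, 24],
                   [1, 3, 17, 21]], dtype=float)

P = counts / counts.sum()
r, c = P.sum(axis=1), P.sum(axis=0)          # gene and cell masses
E = np.outer(r, c)                           # expected proportions
S = (P - E) / np.sqrt(E)                     # Pearson residuals
U, d, Vt = np.linalg.svd(S, full_matrices=False)

gene_coords = (U * d) / np.sqrt(r)[:, None]  # principal coordinates (genes)
cell_coords = Vt.T / np.sqrt(c)[:, None]     # standard coordinates (cells)

# Inner product between every gene and every cell across all dimensions.
assoc = gene_coords @ cell_coords.T

# Observed / expected - 1: positive means enriched, negative means depleted.
enrichment = P / E - 1.0
```

Here `assoc` and `enrichment` agree entry by entry: gene 0 and cell 0 (high counts together) get a positive value, gene 0 and cell 2 a negative one.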

CA is better adapted to scRNAseq -> library depth batch effects are better addressed

The scMix data (CellBench @Bioconductor pkg) has 3 cell lines, with cells assayed on different platforms

PCA: batches separated by different library depths
CA: multiBatchNorm correction not needed

Plugging it into existing pipelines is easy: it's a straightforward replacement for PCA, and it may improve them. We tested this with scRNAseq dataset alignment. Replacing PCA with CA in the Harmony pipeline improves dataset alignment without impacting speed.

Finally, corral is simple, deterministic and fast.

Deterministic, direct methods deliver an exact solution, with the same results each time.

Iterative methods (such as glmPCA) start from an initial seed & vary between runs. We ran these several times and took an average score.

Lauren and I would love your feedback... This is her work.

The Corral paper is at doi.org/10.1101/2021.1…

The @Bioconductor package is bioconductor.org/packages/relea…

Her github repo to reproduce the figures is
github.com/laurenhsu1/cor…

We are grateful to @cziscience for funding.
