Isidro Cortes-Ciriano
Nov 24, 2022 · 21 tweets · 8 min read
We are very happy to present SComatic, an algorithm for de novo detection of somatic mutations in high-throughput single-cell data, including single-cell RNAseq and ATAC-seq. A monumental effort led by @fran_muyas. biorxiv.org/content/10.110… Thread 👇
Context: detecting somatic mutations in single cells is critical for studying somatic evolution, cell plasticity, cancer and clonal mosaicism. However, identifying mutations at single-cell resolution is challenging because of various types of artefacts (allelic imbalances, etc.).
Detecting mutations in single cells and mapping them to cell phenotypes is even harder, as methods for jointly profiling transcriptomes and genotypes do not scale easily and require complex protocols.
Calling mutations directly in single-cell data sets (that is, calling mutations in scRNAseq reads) is one appealing option. However, current methods cannot deal with the high error rates of single-cell data, which severely limits their mutation detection performance (more on this below).
Consequently, previous methods designed to detect somatic mutations in single-cell data rely either on targeted amplification of well-studied cancer driver mutations (e.g. JAK2 V617F) or on direct detection, in sequencing reads, of mutations identified in matched sequencing data (e.g. bulk WGS).
The key issue is that these approaches cannot be applied to existing data sets that lack matched bulk or single-cell DNAseq data, and they are limited to studying known mutations. As a result, the patterns and rates of mutations in most cell types remain largely unknown.
To address these limitations, we developed SComatic, an algorithm that permits detection of somatic mutations in single-cell data sets, including scRNAseq and scATACseq, without matched bulk or single-cell DNAseq data. Does it really work? How? Come and see 👇
SComatic uses statistical models, filters, cell annotations and a "panel of normals" strategy to distinguish somatic mutations from technical and sequencing artefacts. Full details are in the preprint, and the code is here: github.com/cortes-ciriano…
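The thread names the ingredients but not their exact form, which the preprint describes. As a minimal sketch of how such filters can be chained, assuming reads are pooled per annotated cell type and using a plain binomial test against a fixed background error rate as a stand-in for the real statistical model, one site could be screened like this. `is_somatic_candidate`, `error_rate`, `alpha`, `min_alt` and `max_cell_types` are hypothetical names and placeholder thresholds, not SComatic's API.

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def is_somatic_candidate(site, counts_by_cell_type, pon_sites,
                         error_rate=1e-3, alpha=1e-4, min_alt=3,
                         max_cell_types=1):
    """Hypothetical filter chain for one site, e.g. ("chr1", 100, "C", "T").

    counts_by_cell_type maps an annotated cell type to (alt_reads, total_reads)
    pooled over all cells of that type.
    """
    # 1) Panel of normals: a site that recurrently looks mutated in unrelated
    #    samples is more likely a technical artefact than a somatic mutation.
    if site in pon_sites:
        return False
    # 2) Per-cell-type test: are there more alt reads than the background
    #    error rate alone would plausibly produce?
    supported = [
        cell_type
        for cell_type, (alt, depth) in counts_by_cell_type.items()
        if alt >= min_alt and binom_sf(alt, depth, error_rate) < alpha
    ]
    # 3) Cell annotations: a germline variant shows up in every cell type,
    #    so keep only sites supported in a small number of them.
    return 0 < len(supported) <= max_cell_types

counts = {"epithelial": (8, 40), "fibroblast": (0, 35), "T cell": (0, 50)}
print(is_somatic_candidate(("chr1", 100, "C", "T"), counts, pon_sites=set()))  # True
```

The thresholds are illustrative placeholders; in practice they would need tuning for each data set and technology.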
For validation, we first used matched WES and scRNAseq data from cutaneous squamous cell carcinomas (pubmed.ncbi.nlm.nih.gov/32579974/). The mutations detected by SComatic in epithelial cells showed the spectrum expected for these samples given their exposure to UV radiation:
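Mutational spectra like these are usually built by assigning each single-base substitution to its trinucleotide context, reported on the pyrimidine (C or T) strand, which gives the standard 96 channels. A minimal sketch, assuming the reference trinucleotide and the alternative allele are both given on the + strand; the function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# The 96 channels in the usual order: substitution type, then 5' base, then 3' base.
SBS96_CHANNELS = [
    f"{five}[{ref}>{alt}]{three}"
    for ref, alts in (("C", "AGT"), ("T", "ACG"))
    for alt in alts
    for five in "ACGT"
    for three in "ACGT"
]

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]

def sbs96_label(context: str, alt: str) -> str:
    """Map a + strand trinucleotide (mutated base in the middle) and alt base
    to its pyrimidine-centred channel, e.g. ("TCC", "T") -> "T[C>T]C"."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt not in ("A", "C", "G", "T") or alt == context[1]:
        raise ValueError(f"invalid substitution {context} > {alt}")
    if context[1] in "AG":  # purine reference: report it on the opposite strand
        context, alt = revcomp(context), alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def sbs96_spectrum(mutations):
    """Count (context, alt) pairs into a 96-element list in SBS96_CHANNELS order."""
    counts = Counter(sbs96_label(ctx, alt) for ctx, alt in mutations)
    return [counts[channel] for channel in SBS96_CHANNELS]

print(sbs96_label("GGA", "A"))  # T[C>T]C, the same channel as ("TCC", "T")
```

For UV-exposed samples, the resulting spectrum is expected to be dominated by C>T changes at dipyrimidine sites, such as T[C>T]C.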
Using these data, we show that the precision of SComatic is 20-40 times higher (!) than that of existing methods, without compromising sensitivity! (This is one of the key plots in our study!! 😀)
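Precision and sensitivity here are measured against the WES call set used as the truth set. A minimal sketch of that bookkeeping follows, assuming mutations are hashable keys such as (chrom, pos, ref, alt). The optional `is_evaluable` predicate is a hypothetical hook: a WES mutation in a gene that is not expressed cannot appear in scRNAseq reads, and a call outside the WES capture regions cannot be validated.

```python
def benchmark(calls, truth, is_evaluable=None):
    """Return (precision, sensitivity) of `calls` against `truth`.

    If `is_evaluable` is given, both sets are first restricted to mutations
    for which it returns True (e.g. sites covered by both assays).
    """
    calls, truth = set(calls), set(truth)
    if is_evaluable is not None:
        calls = {m for m in calls if is_evaluable(m)}
        truth = {m for m in truth if is_evaluable(m)}
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else float("nan")
    sensitivity = true_positives / len(truth) if truth else float("nan")
    return precision, sensitivity

calls = {("chr1", 10, "C", "T"), ("chr1", 20, "G", "A"), ("chr2", 5, "C", "T")}
truth = {("chr1", 10, "C", "T"), ("chr2", 5, "C", "T"), ("chr4", 1, "T", "C")}
print(benchmark(calls, truth))
```

Restricting both sets to evaluable sites keeps sensitivity from being penalised for mutations that no RNA-based caller could detect.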
The higher performance of our approach is also reflected in the mutational spectra of the call sets: SComatic and WES both detect mutational signatures consistent with UV-light mutagenesis, in stark contrast to other methods:
This is even clearer when we look at the mutational spectra (note the cosine similarity with the WES data: 0.99 for SComatic!):
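Cosine similarity compares the shapes of two spectra, cos(u, v) = u·v / (‖u‖ ‖v‖), and ignores their total counts, so a small scRNAseq call set can be compared directly with a larger WES call set. A minimal sketch over two equal-length count vectors, such as 96-channel spectra:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two non-negative count vectors (1.0 = same shape)."""
    if len(u) != len(v):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for an empty spectrum")
    return dot / (norm_u * norm_v)
```

Because the measure is scale-invariant, raw counts and proportions give the same value; a spectrum and a scaled copy of it score 1.0.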
We then called somatic mutations in epithelial cells from colorectal cancers (@NCIHTAN), which revealed mutation rates and signatures consistent with the underlying genotypes (e.g. high mutational burdens in MSI samples, comparable to those in TCGA data; see the preprint):
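Burdens like these are typically expressed as mutations per megabase, and with single-cell data the denominator matters: only positions with enough coverage to call a mutation should count. A minimal sketch, assuming the number of callable bases per cell or sample has already been computed; the function names and the example groups are hypothetical.

```python
def mutations_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Mutation burden normalised to the callable territory, in mutations/Mb."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations * 1e6 / callable_bases

def burden_by_group(records):
    """Pool (group, n_mutations, callable_bases) records, e.g. by cell type or
    MSI status, and return the burden of each group in mutations/Mb."""
    totals = {}
    for group, n_mutations, callable_bases in records:
        m, b = totals.get(group, (0, 0))
        totals[group] = (m + n_mutations, b + callable_bases)
    return {group: mutations_per_mb(m, b) for group, (m, b) in totals.items()}

print(burden_by_group([("MSI", 30, 2_000_000), ("MSI", 10, 2_000_000), ("MSS", 2, 4_000_000)]))
```

Pooling mutations and callable bases before dividing weights each cell by its coverage, rather than averaging per-cell rates that may rest on very few callable bases.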
Mutational signature analysis allowed us to identify a POLE-deficient sample misclassified as MSI, which showed a clear contribution from mutational signatures associated with POLE deficiency but not from those associated with mismatch repair deficiency (MMRD):
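Telling POLE deficiency apart from MMRD typically involves estimating how much each known signature contributes to a sample's spectrum (signature "refitting"). Below is a minimal sketch, assuming non-negative least squares solved with multiplicative updates. The preprint may use a different fitting method. The four-channel "POLE-like" and "MMRD-like" vectors are made-up toys, not COSMIC signatures.

```python
def refit_exposures(spectrum, signatures, n_iter=5000, eps=1e-12):
    """Estimate non-negative exposures h so that spectrum ≈ sum_k h[k] * signatures[k].

    Multiplicative updates for non-negative least squares with the signatures
    held fixed (the exposure update of Lee-Seung NMF).
    """
    k, m = len(signatures), len(spectrum)
    if any(len(s) != m for s in signatures):
        raise ValueError("signatures and spectrum must have the same channels")
    # W^T v and W^T W do not change between iterations, so compute them once.
    wtv = [sum(s[i] * spectrum[i] for i in range(m)) for s in signatures]
    wtw = [[sum(a * b for a, b in zip(s, t)) for t in signatures] for s in signatures]
    h = [sum(spectrum) / k] * k  # strictly positive start; zeros would stay zero
    for _ in range(n_iter):
        wtwh = [sum(wtw[a][b] * h[b] for b in range(k)) for a in range(k)]
        h = [h[a] * wtv[a] / (wtwh[a] + eps) for a in range(k)]
    return h

toy_pole_like = [0.6, 0.2, 0.1, 0.1]  # hypothetical 4-channel profile
toy_mmrd_like = [0.1, 0.1, 0.2, 0.6]  # hypothetical 4-channel profile
print(refit_exposures([50, 18, 12, 20], [toy_pole_like, toy_mmrd_like]))
```

The toy spectrum was built as 80 × POLE-like + 20 × MMRD-like, so the fit should return exposures close to 80 and 20.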
We also applied SComatic to scRNAseq data from #GTEx and to sciATAC-seq data from the Ren lab (pubmed.ncbi.nlm.nih.gov/34774128/), which allowed us to detect mutational signatures and estimate mutation rates at the single-cell and cell-type levels across multiple organs.
In addition, we characterised the mutation rates and signatures of cardiomyocytes using data from the Human Cell Atlas (@teichlab), which were broadly consistent with recent single-cell WGS data from the labs of @ChrisAWalsh1, Chen and @EAliceLee2:
Importantly, the mutation burdens and spectra that we estimate are broadly consistent across data sets and single-cell sequencing technologies, despite large differences in sequencing depth, sequencing protocols, etc.:
Finally, we show that the somatic mutations called by SComatic permit de novo discovery of mutational signatures, even in samples with relatively low mutational burdens (although in some cases power is limited by the low numbers of samples and mutations):
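De novo signature discovery is commonly framed as non-negative matrix factorisation (NMF): a channels × samples count matrix V is approximated by W H, where the columns of W are signatures and H holds each sample's exposures. Below is a minimal pure-Python sketch using Lee-Seung multiplicative updates, an assumption rather than the preprint's method. Production tools add resampling and a model-selection step for choosing the number of signatures k, which this sketch omits.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, n_iter=500, seed=0, eps=1e-12):
    """Factor V (channels x samples) ≈ W (channels x k) @ H (k x samples)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # Exposure update: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # Signature update: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    # Normalise each signature (column of W) to sum to 1, moving the scale into H.
    for j in range(k):
        s = sum(W[i][j] for i in range(m)) or 1.0
        for i in range(m):
            W[i][j] /= s
        H[j] = [h * s for h in H[j]]
    return W, H
```

Normalising the columns of W makes the signatures comparable to reference profiles, and H can then be read directly as the number of mutations attributed to each signature.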
In sum, we present a novel algorithm that opens up many possibilities for studying somatic mutagenesis at single-cell resolution in healthy and diseased human samples, and also across the tree of life thanks to the advent of single-cell data for non-model organisms (e.g. at @czbiohub).
We anticipate higher sensitivity for detecting mutations as the throughput and sequencing depth of single-cell data sets increase. Novel platforms, such as MAS-Seq (@PacBio), might increase sensitivity and specificity even further, opening avenues to map genotypes to transcriptomes at scale.
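Why depth should raise sensitivity can be seen with a back-of-the-envelope model. Assume a caller needs at least `min_alt_reads` supporting reads (a hypothetical threshold), and that reads at a site are drawn binomially with the variant allele fraction (VAF) as the success probability. Under these assumptions, the chance of seeing enough supporting reads grows with depth. The sketch ignores sequencing errors and allelic imbalance.

```python
from math import comb

def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(at least `min_alt_reads` alt reads) when `depth` reads are drawn
    binomially with success probability `vaf`."""
    if depth < min_alt_reads:
        return 0.0
    p_below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below

for depth in (5, 10, 20, 50):
    print(depth, round(detection_probability(depth, vaf=0.25), 3))
```

For a fixed VAF, this probability increases with depth, which is why deeper and higher-throughput data sets should recover more mutations.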
Thanks to everyone involved and to those who tirelessly generated (and annotated!!) large-scale single-cell data sets! As always, we cannot emphasise strongly enough the importance of good data sharing practices! SComatic is available at github.com/cortes-ciriano…
