1️⃣ What is Spatial Transcriptomics & why is it important?
Spatial Transcriptomics (Nature Methods' Method of the Year 2020) is a fast-evolving field.
It holds great potential to further our understanding of development & disease by placing cells in their native spatial tissue context.
Spatial Transcriptomics technologies fall into 2 types:
A. Image-based (in situ sequencing & in situ hybridization): profile mRNA with high spatial resolution, at the subcellular level.
However, they:
- profile only a limited number of genes
- have low sensitivity in mRNA detection
- are time-consuming
B. Sequencing-based (e.g. Visium): capture mRNA with position-barcoded, non-gene-specific probes. They can profile the entire transcriptome & are fast.
However:
Low-resolution spots can contain multiple cells of several different cell types, whose signals are blended together. This can conceal the true tissue biology.
2️⃣ What is Cellular Deconvolution & why is it important?
In sequencing-based methods (B above), cellular deconvolution means quantifying the proportions of the different cell types within each blended captured spot. This gives a more fine-grained representation of the profiled tissue.
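To make the idea concrete, here's a toy sketch. It is NOT any of the benchmarked methods: it simply assumes each spot's expression is a non-negative mix of cell-type reference signatures, estimates the mixing weights with non-negative least squares (NNLS, solved here with simple multiplicative updates) & rescales them to proportions. The signature matrix and the name `deconvolve_spot` are hypothetical.

```python
import numpy as np


def deconvolve_spot(signatures: np.ndarray, spot: np.ndarray, n_iter: int = 5000) -> np.ndarray:
    """Toy NNLS deconvolution of one spot (illustration only, not a benchmarked method).

    signatures: genes x cell_types matrix of non-negative reference expression
    spot:       length-genes vector of non-negative observed expression
    """
    n_types = signatures.shape[1]
    weights = np.full(n_types, 1.0 / n_types)  # strictly positive starting point
    numerator = signatures.T @ spot            # A^T b, fixed across iterations
    gram = signatures.T @ signatures           # A^T A
    for _ in range(n_iter):
        # Multiplicative update (Lee-Seung style): weights stay non-negative
        weights *= numerator / (gram @ weights + 1e-12)
    return weights / weights.sum()  # rescale mixing weights to proportions


# Hypothetical data: 50 genes x 3 cell types, one noise-free blended spot
rng = np.random.default_rng(0)
signatures = rng.gamma(shape=2.0, scale=1.0, size=(50, 3))
true_props = np.array([0.6, 0.3, 0.1])
spot = signatures @ true_props
est = deconvolve_spot(signatures, spot)
print(np.round(est, 3))
```

Real methods add noise models (e.g. count distributions), spatial information or a probabilistic framework on top of this basic "mixture of signatures" picture.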
3️⃣ Which cellular deconvolution methods were benchmarked?
Berglund, SpiceMix & STdeconvolve are reference-free (no scRNAseq needed). The other 15 methods require scRNAseq data from the same tissue.
4️⃣ Datasets & technologies:
Image-based real data:
- seqFISH
- MERFISH
Sequencing-based real data:
- ST
- 10X Visium
- Slide-seqV2
- Stereo-seq
Simulated:
Due to their high resolution & good annotation, the image-based real datasets were used as ground truth for simulating low-resolution spots.
This scatter plot shows the spot resolution and the number of spots & genes for each of the 6 technologies used to generate the real datasets employed in this benchmark.
All in all, 50 datasets were generated.
Datasets were simulated by binning the cells into squares of a uniform size.
The ground truth for each spot was calculated from the number of cells of each cell type it contains.
Different spot resolutions can then be simulated by varying the size of the binning squares.
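The binning step can be sketched roughly like this. It's my own minimal version, not the authors' code, and it assumes cell centroids in μm, one cell-type label per cell, and pseudo-spot expression = the sum of its cells' counts. The function name `bin_cells` and the toy data are hypothetical.

```python
import numpy as np


def bin_cells(x, y, cell_types, expr, bin_size):
    """Aggregate single cells into square pseudo-spots (simulation sketch).

    x, y:       cell centroid coordinates, same unit as bin_size (e.g. μm)
    cell_types: one cell-type label per cell
    expr:       cells x genes count matrix
    bin_size:   side length of each square pseudo-spot
    Returns (cell-type names, spots x cell-types ground-truth proportions,
    spots x genes pseudo-spot counts).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # Integer grid coordinates of the square each cell falls into
    gx = np.floor((x - x.min()) / bin_size).astype(int)
    gy = np.floor((y - y.min()) / bin_size).astype(int)
    # Give every occupied square a compact spot index
    squares, spot_idx = np.unique(np.stack([gx, gy], axis=1), axis=0, return_inverse=True)
    spot_idx = spot_idx.reshape(-1)  # keep the index 1-D across NumPy versions
    types, type_idx = np.unique(np.asarray(cell_types), return_inverse=True)
    type_idx = type_idx.reshape(-1)
    # Ground truth: count cells of each type per spot, then normalize each row
    counts = np.zeros((len(squares), len(types)))
    np.add.at(counts, (spot_idx, type_idx), 1)
    proportions = counts / counts.sum(axis=1, keepdims=True)
    # Pseudo-spot expression: sum the counts of all cells inside the spot
    spot_expr = np.zeros((len(squares), expr.shape[1]))
    np.add.at(spot_expr, spot_idx, expr)
    return types, proportions, spot_expr


# Hypothetical toy data: 5 cells, 2 genes, 10 μm squares
types, props, spot_expr = bin_cells(
    x=[0, 1, 12, 13, 25],
    y=[0, 2, 1, 3, 0],
    cell_types=["A", "B", "A", "A", "B"],
    expr=np.arange(10).reshape(5, 2),
    bin_size=10,
)
print(types, props, spot_expr, sep="\n")
```

Re-running with a larger `bin_size` merges more cells per spot, which is how lower-resolution spots are mimicked.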
5️⃣ Performance assessment by:
A. Accuracy: multiple metrics applied to all methods & datasets
B. Robustness: different cell type compositions, spatial transcriptomics techniques, numbers of genes & numbers of spots tested for all methods
C. Usability: efficiency, code & documentation
6️⃣ Benchmarking results
This is the summary table of the performance of all methods. Darker dots indicate better performance.
‼️The authors conclude that generally, Cell2location & DestVI performed consistently well across datasets & scenarios.
7️⃣ The accuracy metrics used are:
- Jensen–Shannon divergence (JSD)
- root-mean-square error (RMSE)
- Pearson correlation coefficient (PCC)
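A minimal sketch of these 3 metrics on true vs predicted cell-type proportions. Assumptions: JSD uses log base 2 (so it lies in [0, 1]) & is averaged per spot, RMSE & PCC are computed over all flattened entries. The paper may aggregate differently (e.g. per cell type), and the data below are made up.

```python
import numpy as np


def _kl(a, b):
    """Kullback-Leibler divergence in bits."""
    return float(np.sum(a * np.log2(a / b)))


def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two proportion vectors."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def rmse(true, pred):
    """Root-mean-square error over all entries."""
    true, pred = np.asarray(true, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((true - pred) ** 2)))


def pcc(true, pred):
    """Pearson correlation of flattened true vs predicted proportions."""
    return float(np.corrcoef(np.ravel(true), np.ravel(pred))[0, 1])


# Hypothetical example: 3 spots x 2 cell types (rows sum to 1)
truth = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])
pred = np.array([[0.4, 0.6], [0.9, 0.1], [0.1, 0.9]])
mean_jsd = float(np.mean([jsd(t, p) for t, p in zip(truth, pred)]))
print(mean_jsd, rmse(truth, pred), pcc(truth, pred))
```

Lower JSD & RMSE and higher PCC mean better agreement with the ground truth.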
Most methods did well on MERFISH-based simulations, but only CARD, DestVI & SpatialDWLS performed well on seqFISH+ (fewer spots).
8️⃣ Robustness: simulated experiments under multiple different conditions
A. number of genes: 10,000, 6,000 and 3,000 genes randomly chosen from the seqFISH+ dataset & 26,365, 18,000 and 9,000 from Stereo-seq
B. binning size: 20, 50 and 100 μm (MERFISH) & 5, 10 and 15 μm (Stereo-seq)
C. 17 original cell types & 11 integrated cell types tested on the Slide-seqV2 datasets
D. two input normalization methods on the Visium data
E. varying selected hyperparameters on Visium & Slide-seqV2
F. experiments repeated 3 times on the seqFISH+ data with 10,000 genes per spot
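Point A (random gene subsets) boils down to something like the sketch below. It's my own illustration: the function name `subsample_genes`, the seed handling & the Poisson toy matrix are hypothetical, not taken from the paper.

```python
import numpy as np


def subsample_genes(counts, gene_names, n_genes, seed=0):
    """Keep a random subset of n_genes genes (columns), sampled without replacement.

    counts:     spots x genes count matrix
    gene_names: list of gene names, one per column
    seed:       RNG seed, so the same subset can be reproduced
    """
    rng = np.random.default_rng(seed)
    keep = np.sort(rng.choice(counts.shape[1], size=n_genes, replace=False))
    return counts[:, keep], [gene_names[i] for i in keep]


# Hypothetical data: 20 spots x 10,000 genes of Poisson counts
rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(20, 10_000))
genes = [f"gene{i}" for i in range(10_000)]
for n in (10_000, 6_000, 3_000):  # gene numbers listed in point A for seqFISH+
    sub_counts, sub_genes = subsample_genes(counts, genes, n)
    print(n, sub_counts.shape)
```

Fixing the seed keeps a subset reproducible; changing it gives a different random subset.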
I found the robustness test of two commonly used normalization methods on Visium data, lognorm & Seurat's sctransform, particularly interesting.
For methods that have their own normalization (the majority), performance was best with raw input data, as expected.
For methods without their own default normalization, such as SpaOTsc & Tangram, normalizing with lognorm gave better performance.
‼️ What I found really surprising was that all tested methods performed worse with sctransform normalization.
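For reference, lognorm here presumably means log-normalization in the style of Seurat's LogNormalize: each spot's counts are divided by its total, multiplied by a scale factor (10,000 by default in Seurat) & log1p-transformed. A NumPy sketch, assuming a spots x genes matrix (Seurat itself stores genes x cells); sctransform's regularized negative binomial model is not sketched here.

```python
import numpy as np


def log_normalize(counts, scale_factor=10_000):
    """Seurat-style LogNormalize sketch on a spots x genes raw count matrix.

    Each spot's counts are divided by the spot's total count, multiplied by
    scale_factor, then transformed with the natural-log log1p.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty spots
    return np.log1p(counts / totals * scale_factor)


# Hypothetical toy matrix: 2 spots x 2 genes
print(log_normalize([[1, 3], [0, 0]]))
```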
All in all, CARD, Cell2location, Tangram & SD2 were the most robust methods, based on their performance across different resolutions, numbers of genes, numbers of spots, and numbers of cell types.
9️⃣ Usability
Regarding computational runtime, NMFreg, STRIDE & Tangram were the most efficient.
Most methods had high-quality tutorials & code.
In particular, CARD, Cell2location, RCTD & DestVI were user-friendly, with helpful tutorials & readable code that make them easy to run.
🔟 Guidelines
Taking everything into account, the authors created this flowchart with guidelines to help users choose a cellular deconvolution method based on their input data.
👏This graphic brings structure to the process of choosing the right method from so many options
A. As expected, the most important question is whether additional scRNAseq data from the same tissue are also available
B. Then, the technology platform dictates the number of spots in the data, which informs the choice of method
C. Lastly, the target cell-type resolution also matters
Benchmarking studies are cool❤️
This paper is an important contribution to the #SpatialTranscriptomics methods literature & useful for #Bioinformatics Data Scientists looking to apply cellular decomposition to their data
Congrats to the authors & thanks for your thorough work💯