
Simona Cristea

Mar 24 • 25 tweets • 10 min read


Do you need to analyze Spatial Transcriptomics data, but are lost in the endless sea of methods?

Here's an explainer of the new @NatureComms paper benchmarking 18 spatial cellular deconvolution methods🧵🧵

nature.com/articles/s4146…

This thread is organized as follows:

1️⃣ Intro to Spatial Transcriptomics
2️⃣ Intro to Cellular Deconvolution
3️⃣ Methods benchmarked
4️⃣ Datasets used (real & simulated)
5️⃣ Performance assessment
6️⃣ Benchmarking results
7️⃣ Accuracy
8️⃣ Robustness
9️⃣ Usability
🔟 Guidelines

1️⃣ What is Spatial Transcriptomics & why is it important?

Spatial Transcriptomics (Method of the Year 2020) is a fast-evolving field.

It holds great potential to further our understanding of development & disease, by placing cells in their spatial native tissue context.

Spatial Transcriptomics technologies are of 2 types:

A. Image-based (in situ sequencing & in situ hybridization): profile mRNA with high spatial resolution at subcellular level.

However:
- only profile a low number of genes
- low sensitivity in mRNA detection
- time consuming

B. Sequencing-based (e.g. Visium): capture position-barcoded mRNA with non-gene-specific probes. Can profile the entire transcriptome & are fast.

However:

Low-resolution spots can contain multiple cells with several blended cell types. This can conceal the true tissue biology.

2️⃣ What is Cellular Deconvolution & why is it important?

In sequencing-based methods (B above), cellular deconvolution means quantifying the proportions of the different cell types blended within each captured spot. With this, the profiled tissue gets a more fine-grained representation.
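To make the idea concrete, here is a toy sketch that treats a spot as a non-negative linear mix of per-cell-type expression signatures and recovers the mixing proportions with multiplicative updates. This is only an illustration under that linear-mixing assumption, not the algorithm of any of the benchmarked tools; the function name `deconvolve_spot`, the two cell types and all the numbers are made up.

```python
def deconvolve_spot(spot, signatures, n_iter=500):
    """Estimate cell-type proportions for one spot (toy linear-mixing model).

    spot:       list of gene counts observed in the spot
    signatures: dict mapping cell type -> list of mean gene counts per cell
    Returns a dict mapping cell type -> estimated proportion (sums to 1).
    """
    types = list(signatures)
    n_genes = len(spot)
    # Start from equal, strictly positive abundances.
    h = {t: 1.0 for t in types}
    for _ in range(n_iter):
        # Current reconstruction of the spot from the abundances.
        recon = [sum(h[t] * signatures[t][g] for t in types) for g in range(n_genes)]
        for t in types:
            num = sum(signatures[t][g] * spot[g] for g in range(n_genes))
            den = sum(signatures[t][g] * recon[g] for g in range(n_genes))
            if den > 0:
                # Multiplicative update keeps abundances non-negative.
                h[t] *= num / den
    total = sum(h.values())
    return {t: h[t] / total for t in types}


# Hypothetical signatures over 4 genes, and a spot made of 3 neurons + 1 astrocyte.
signatures = {"neuron": [10, 1, 0, 5], "astrocyte": [0, 8, 6, 5]}
spot = [30, 11, 6, 20]
proportions = deconvolve_spot(spot, signatures)
print(proportions)  # close to 0.75 neuron / 0.25 astrocyte
```

Real methods add a lot on top of this (count noise models, platform effects, spatial priors), which is exactly why they need benchmarking.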

3️⃣ Which cellular deconvolution methods were benchmarked?

The 18 methods are:

1. CARD
tinyurl.com/arztf9bs
2. Cell2location
tinyurl.com/ymy4fn68
3. RCTD
tinyurl.com/2p9xkyp9
4. DestVI
tinyurl.com/mvtweytv
5. stereoscope
tinyurl.com/yvszhfc7

6. SpatialDecon
tinyurl.com/2p93jzz4
7. STRIDE
tinyurl.com/ew7jbukh
8. NMFreg
tinyurl.com/mv6xn7sr
9. SpatialDWLS
tinyurl.com/yfb58dr9
10. SPOTlight
tinyurl.com/2sy9dwtc
11. DSTG
tinyurl.com/2vdh4zvu
12. SD2
tinyurl.com/mpntkd7e

13. Tangram
tinyurl.com/mrpn6puj
14. Berglund
tinyurl.com/2kdk4585
15. SpiceMix
tinyurl.com/3xmpse6v
16. STdeconvolve
tinyurl.com/4uar2hzs
17. SpaOTsc
tinyurl.com/554dbrjm
18. novoSpaRc
tinyurl.com/2w44wnap

The methodology behind these tools is either:

- probabilistic modeling
- non-negative matrix factorization (NMF)
- graphs
- optimal transport (OT)
- deep learning

Berglund, SpiceMix & STdeconvolve are reference-free (no scRNAseq data needed). The other 15 methods require same-tissue scRNAseq data.

4️⃣ Datasets & technologies:

Image-based real data:
- seqFISH+
- MERFISH

Sequencing-based real data:
- ST
- 10X Visium
- Slide-seqV2
- stereo-seq

Simulated:
Due to high resolution & good annotation, the image-based real data was used as ground truth for simulating low-res spots.

This scatter plot shows the resolution of each spot and number of spots & genes in each of the 6 technologies used to generate the real datasets employed in this benchmark.

All in all, 50 datasets were generated.

Datasets were simulated by binning the cells with a unified square size.

The ground truth was calculated according to the number of cells with different cell types in each spot.

Different resolutions of spots can then be simulated by different sizes of the binning squares.
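A minimal sketch of this binning idea, assuming each cell comes with x/y coordinates and an annotated cell type. The function name `simulate_spots` and the toy coordinates are hypothetical; the paper's actual pipeline is not reproduced here, and aggregating the expression of the binned cells is left out.

```python
from collections import Counter, defaultdict


def simulate_spots(cells, bin_size):
    """Bin single cells into square pseudo-spots.

    cells:    iterable of (x, y, cell_type) tuples, coordinates in micrometres
    bin_size: side length of the square bins, in the same unit
    Returns {(col, row): {cell_type: proportion}} for every non-empty bin.
    """
    counts = defaultdict(Counter)
    for x, y, cell_type in cells:
        # Each cell lands in the square that contains its coordinates.
        key = (int(x // bin_size), int(y // bin_size))
        counts[key][cell_type] += 1
    truth = {}
    for key, counter in counts.items():
        n = sum(counter.values())
        # Ground truth = fraction of cells of each type inside the square.
        truth[key] = {ct: c / n for ct, c in counter.items()}
    return truth


# Made-up cells: the same tissue binned at two resolutions.
cells = [(5, 5, "A"), (12, 40, "A"), (30, 45, "B"), (70, 10, "B"), (110, 20, "A")]
print(simulate_spots(cells, 50))   # finer spots
print(simulate_spots(cells, 100))  # coarser spots, more blending per spot
```

Larger squares blend more cells per spot, so the same ground-truth tissue yields progressively harder deconvolution problems.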

5️⃣ Performance assessment by:

A. accuracy: multiple metrics applied on all methods & datasets

B. robustness: different cell type composition, spatial transcriptomics technique, number of genes & number of spots tested in all methods

C. usability: efficiency, code & documentation

6️⃣ Benchmarking results

This is the summary table with the performance of all methods. Darker spots represent better performance.

‼️The authors conclude that generally, Cell2location & DestVI performed consistently well across datasets & scenarios.

7️⃣ The accuracy metrics used are:
- Jensen–Shannon divergence (JSD)
- root-mean-square error (RMSE)
- Pearson correlation coefficient (PCC)
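For reference, the three metrics can be sketched with their textbook definitions, comparing a predicted cell-type proportion vector against the ground truth. Whether the paper computes them per spot or per cell type, and which log base it uses for JSD, is not stated in this thread; base 2 is an assumption here.

```python
import math


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL divergence.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rmse(p, q):
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)) / len(p))


def pcc(p, q):
    """Pearson correlation coefficient (undefined if either vector is constant)."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    return cov / (sp * sq)


# Made-up proportions for one spot with three cell types.
truth = [0.5, 0.3, 0.2]
pred = [0.4, 0.4, 0.2]
print(jsd(truth, pred), rmse(truth, pred), pcc(truth, pred))
```

Lower is better for JSD and RMSE; higher is better for PCC.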

Most methods did well with MERFISH-based simulations, but only CARD, DestVI & SpatialDWLS were high-performing with seqFISH+ (fewer spots).

8️⃣ Robustness: simulated experiments under multiple different conditions

A. number of genes: 10,000, 6000 and 3000 genes randomly chosen in the seqFISH+ dataset & 26,365, 18,000 and 9000 in stereo-seq

B. binning size: 20, 50 and 100 μm (MERFISH) & 5, 10 and 15 μm (stereo-seq)

C. 17 original cell types & 11 integrated cell types tested in Slide-seqV2 datasets

D. two input normalization methods on the Visium data

E. varying chosen hyperparameters in Visium & Slide-seqV2

F. repeat experiments 3 times with the seqFISH+ data with 10,000 genes per spot

I found particularly interesting the robustness testing on Visium data of two commonly used normalization methods: lognorm & Seurat's sctransform.

For the methods that have their own normalization (the majority), best performance corresponds to using raw input data (obviously).

For the methods that did not have their own default normalization, such as SpaOTsc & Tangram, normalizing with lognorm triggered better performance.

‼️ What I found really surprising was that all the tested methods performed worse with the sctransform normalization.
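As a reminder of what lognorm does, here is a small sketch of the usual log-normalization: divide each count by the spot's total, multiply by a scale factor, then apply log(1 + x). The scale factor of 10,000 is Seurat's LogNormalize default; whether the benchmark used exactly that setting is an assumption. sctransform is a model-based alternative and is not sketched here.

```python
import math


def lognorm(counts, scale_factor=10_000):
    """Log-normalize the gene counts of a single spot or cell.

    counts:       raw gene counts
    scale_factor: target library size after scaling (assumed default: 10,000)
    """
    total = sum(counts)
    if total == 0:
        # An empty spot stays all-zero instead of dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Two made-up spots with the same composition but different sequencing depth
# end up with (nearly) identical normalized values.
print(lognorm([10, 30, 60]))
print(lognorm([100, 300, 600]))
```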

All in all, regarding robustness, CARD, Cell2location, Tangram & SD2 were the most robust methods according to their performance with different resolutions, number of genes, number of spots, and number of cell types.

9️⃣ Usability

Regarding computational runtime, NMFreg, STRIDE & Tangram were most efficient.

Most methods had high-quality tutorials & code.

In particular, CARD, Cell2location, RCTD, & DestVI were user-friendly with helpful tutorials & readable code, making them easy to run.

🔟 Guidelines

Taking everything into account, the authors create this flowchart w/ guidelines for users on which cellular deconvolution method to use, depending on their input data.

👏This graphic brings structure to the process of choosing the right method from so many options

A. As expected, the most important question is whether additional scRNAseq data from the same tissue is also available

B. Then, the technology platform dictates the number of spots in the data, which informs the choice of method

C. Lastly, the target cell type resolution also matters

Benchmarking studies are cool❤️

This paper is an important contribution to the #SpatialTranscriptomics methods literature & useful for #Bioinformatics Data Scientists looking to apply cellular deconvolution to their data

Congrats to the authors & thanks for your thorough work💯


