Simona Cristea

@simocristea

Feb 23, 2023 • 33 tweets • 11 min read

🚨New #SpatialTranscriptomics #Bioinformatics data resource out in @naturemethods.

SODB, a platform with >2,400 manually curated spatial experiments from >25 spatial omics technologies & interactive analytical modules.

This🧵will walk you through all the features of SODB [1/33]

First, some background.

Spatial technologies complement classical genomics by also providing information about spatial context & tissue organization in:

- embryogenesis
- disease development
- normal tissue homeostasis

The field has exploded 🔥 in the past 2 years. [2/33]

But data from different studies are stored in different formats and repositories, such as:

- GEO
- Zenodo
- figshare
- Single Cell Portal
- IONPath for MIBI
- 10x Genomics website

This makes data sharing & re-analysis challenging.

Databases exist, but have limitations. [3/33]

At the moment, the 3 main spatial genomics databases are:

1. SpatialDB (2019): a pioneering database offering data browsing, downloading, gene comparison & spatial expression visualization. However, it provides only raw data, which must be further processed before analysis. [4/33]
academic.oup.com/nar/article/48…

2. STOmicsDB (2022): improves on SpatialDB in the range of spatial data types & the user interface. It also visualizes biological features, such as gene distributions & spatial marker genes. [5/33]

biorxiv.org/content/10.110…

3. SOAR (2022): covers a range of data types similar to STOmicsDB. In addition, it provides spatial analytical modules for, among others, spatially variable gene analysis and cell-type interaction analysis. [6/33]

biorxiv.org/content/10.110…

Back to SODB.

At a glance, how does it contribute?

1. It's a large repository for downloading spatial data: transcriptomics, proteomics, metabolomics, genomics & multi-omics.

2. It provides superior interactive data exploration through its Spatial Omics View (SOView) module. [7/33]

SODB has 4 unique features.

1. Spatial datasets:
- more data available than in the other spatial databases
- wide range of spatial technologies

2. Interactive visualization module SOView
- quickly previews global tissue structure
- identifies subtle tissue substructures. [8/33]

3. Interactive display panel
- can be combined with SOView to automatically produce molecular markers for user-defined regions.

4. Command line package available
- much more efficient downloading of spatial data. [9/33]
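The thread does not say how SODB derives markers for a user-defined region. A common generic approach, shown here as a hypothetical sketch rather than SODB's implementation, is to rank features by the log2 fold change of their mean expression inside the selected region versus all remaining spots. The function name, the pseudocount and the toy data are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Set


def rank_region_markers(
    expr: Sequence[Sequence[float]],  # spots x features, non-negative values
    feature_names: Sequence[str],
    region: Set[int],                 # indices of the spots the user selected
    pseudocount: float = 1.0,         # avoids log(0) and damps tiny means
    top_n: int = 5,
) -> List[str]:
    """Rank features by log2 fold change (inside region vs. outside).

    Hypothetical helper for illustration; not part of SODB.
    """
    inside = [row for i, row in enumerate(expr) if i in region]
    outside = [row for i, row in enumerate(expr) if i not in region]
    if not inside or not outside:
        raise ValueError("region must contain some, but not all, spots")
    scores: Dict[str, float] = {}
    for j, name in enumerate(feature_names):
        mean_in = sum(row[j] for row in inside) / len(inside)
        mean_out = sum(row[j] for row in outside) / len(outside)
        scores[name] = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
    # Highest fold change first.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data: 4 spots x 3 genes; spots 0 and 1 form the user-selected region.
expr = [[9, 0, 1], [7, 1, 1], [0, 5, 1], [1, 6, 1]]
print(rank_region_markers(expr, ["GeneA", "GeneB", "GeneC"], {0, 1}, top_n=2))
# ['GeneA', 'GeneC']
```

Real marker detection usually adds a statistical test (e.g. Wilcoxon) on top of the fold change; the ranking above only conveys the idea.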

In SODB, data are organized using a hierarchical tree with five levels: root, Biotech category, Biotechnology, Dataset and Experiment.

One dataset may consist of multiple replicates or control slices, each termed an "experiment".

Experiments are the leaves of the tree. [10/33]
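The five-level hierarchy can be pictured as a simple tree whose leaves are experiments. The sketch below is a hypothetical model, not SODB's internal data structure, and all node names in it are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node: root, biotech category, biotechnology, dataset or experiment."""
    name: str
    level: str
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str, level: str) -> "Node":
        """Attach a child node and return it, so levels can be chained."""
        child = Node(name, level)
        self.children.append(child)
        return child


def leaves(node: Node) -> List[str]:
    """Return the names of all leaf nodes (experiments) under `node`."""
    if not node.children:
        return [node.name]
    out: List[str] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


# Hypothetical names; real SODB identifiers will differ.
root = Node("SODB", "root")
category = root.add("Spatial transcriptomics", "biotech_category")
technology = category.add("10x Visium", "biotechnology")
dataset = technology.add("example_dataset", "dataset")
dataset.add("slice_1", "experiment")
dataset.add("slice_2_control", "experiment")
print(leaves(root))  # ['slice_1', 'slice_2_control']
```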

The data in each experiment consist of:
1. continuous molecular measurements (such as gene expression) in "spots"
2. spatial x-y coordinates of these spots

‼️ Important: spots are not necessarily single cells; depending on the technology, a spot can be a spatial aggregate of up to tens of cells.

[11/33]

The spatial data can be downloaded in a unified format for convenient interaction with downstream analytical pipelines such as Scanpy or Squidpy.

With this data format, cell-wise and feature-wise annotations are easily incorporated.

[12/33]
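Scanpy and Squidpy operate on AnnData objects, in which the spots × features matrix `X` sits alongside spot-wise annotations (`obs`), feature-wise annotations (`var`) and, by Squidpy's convention, spatial coordinates in `obsm["spatial"]`. The standard-library sketch below only mimics that layout to show how an experiment's two parts and its annotations fit together; the class name `SpatialExperiment` is hypothetical, and in practice one would use the `anndata` package itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SpatialExperiment:
    """Hypothetical AnnData-like container, for illustration only."""
    X: List[List[float]]               # spots x features matrix
    obs: Dict[str, List[str]]          # spot-wise (cell-wise) annotations
    var: Dict[str, List[str]]          # feature-wise annotations
    obsm: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)  # e.g. "spatial"

    @property
    def shape(self) -> Tuple[int, int]:
        return len(self.X), (len(self.X[0]) if self.X else 0)

    def __post_init__(self) -> None:
        n_obs, n_vars = self.shape
        # Spot-level annotations and coordinates need one entry per spot.
        for table_name, table in (("obs", self.obs), ("obsm", self.obsm)):
            for key, values in table.items():
                if len(values) != n_obs:
                    raise ValueError(f"{table_name}[{key!r}] has {len(values)} entries, expected {n_obs}")
        # Feature-level annotations need one entry per feature.
        for key, values in self.var.items():
            if len(values) != n_vars:
                raise ValueError(f"var[{key!r}] has {len(values)} entries, expected {n_vars}")


exp = SpatialExperiment(
    X=[[3, 0], [1, 2], [0, 5]],
    obs={"cell_type": ["neuron", "astrocyte", "neuron"]},
    var={"gene_symbol": ["Snap25", "Gfap"]},
    obsm={"spatial": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]},
)
print(exp.shape)  # (3, 2)
```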

Here is an overview of all >2,400 spatial experiments available in SODB.

[13/33]

The spatial datasets come from 7 species.

Mouse and human were the two most studied species, accounting for 50.9% and 46.1% of all experiments, respectively.

[14/33]

Spatial transcriptomics (62.6% of all experiments) & proteomics (35.3%) were the main technologies used.

ST, the earliest spatial transcriptomics technology, made up 26.3% of experiments, followed by MERFISH (13.5%), the most widely used imaging-based spatial transcriptomics technology.

[15/33]

Brain regions, including cortical regions, were among the most studied tissues.

Apart from neuroscience studies, other organs, such as the liver & heart, were also frequent targets.

In cancer research, breast cancer & colorectal carcinoma were two prominent targets.

[16/33]

Human and mouse studies differed in the spatial technologies used.

Human: >50% of experiments were generated by spatial proteomics (MIBI, IMC & CODEX), with few spatial transcriptomics.

Mouse: almost all experiments were spatial transcriptomics (ST, Slide-seq & Visium).

[17/33]

Across most spatial technologies, there is a trade-off between the number of spots & the number of molecular features.

Spatial proteomics (blue circle) offers finer spot resolution but is limited to multiplexing fewer than 100 proteins.

[18/33]

Classical spatial transcriptomics technologies, such as 10x Visium & ST (red circle), show high gene throughput but a low number of spots.

Newer technologies, such as sciSpace, Slide-seqV2 & Stereo-seq (green circle), show improvements in spatial resolution & spot throughput.

[19/33]

Another cluster of spatial transcriptomics datasets (mainly imaging-based technologies; yellow circle) targeted fewer genes than the classical technologies but contained larger numbers of spots.

[20/33]

Regarding the quality of experiments (n = 2,139):

62.9% of experiments had a control, and 86.4% had replicates.

41.2% of experiments had well-annotated cell type labels.

[21/33]
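The percentages can be turned into approximate experiment counts. Because the reported shares are rounded to one decimal place, the counts below are estimates derived from them, not figures reported in the paper.

```python
n_experiments = 2139  # n reported for the quality analysis

# Shares as reported in the thread (rounded to one decimal place).
shares = {"control": 0.629, "replicates": 0.864, "cell_type_annotation": 0.412}

# Approximate number of experiments with each property.
approx_counts = {key: round(n_experiments * share) for key, share in shares.items()}
print(approx_counts)  # {'control': 1345, 'replicates': 1848, 'cell_type_annotation': 881}
```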

Regarding the sparsity of the molecular data:

As expected, all sequencing-based spatial transcriptomics technologies showed high data sparsity.

[22/33]
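Sparsity is commonly read as the fraction of zero entries in the spots × features matrix. The thread does not give the exact definition used in the paper, so the sketch below assumes that simple definition and uses a made-up toy matrix.

```python
from typing import Sequence


def sparsity(matrix: Sequence[Sequence[float]]) -> float:
    """Fraction of entries equal to zero (assumed definition of sparsity)."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("empty matrix")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Toy count matrix: 3 spots x 4 genes, 9 of the 12 entries are zero.
counts = [[0, 0, 3, 0], [1, 0, 0, 0], [0, 0, 0, 2]]
print(sparsity(counts))  # 0.75
```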

Data exploration

SODB has 4 data exploration views:

- Expression view (including statistics of zoomed-in regions of interest)
- Annotation view (view by a property such as cell type; zooming in is also possible)
- Comparison view (compare expression of genes in space)
- SOView

[23/33]
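The thread does not describe how the Expression view computes statistics for a zoomed-in region. As an illustrative sketch, assuming a rectangular zoom window over the x-y coordinates, the statistic below is simply the mean expression of each feature over the spots inside the window; the function name and toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def roi_summary(
    coords: Sequence[Tuple[float, float]],  # x-y coordinates, one per spot
    expr: Sequence[Sequence[float]],        # spots x features
    feature_names: Sequence[str],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> Dict[str, float]:
    """Mean expression per feature over spots inside a rectangular region of interest."""
    (x0, x1), (y0, y1) = x_range, y_range
    inside = [i for i, (x, y) in enumerate(coords) if x0 <= x <= x1 and y0 <= y <= y1]
    if not inside:
        return {}  # no spots in the window
    return {name: mean(expr[i][j] for i in inside) for j, name in enumerate(feature_names)}


coords = [(0, 0), (1, 1), (5, 5), (6, 5)]
expr = [[2, 0], [4, 1], [0, 8], [0, 6]]
summary = roi_summary(coords, expr, ["GeneA", "GeneB"], (0, 2), (0, 2))
print(summary)  # GeneA averages 3, GeneB averages 0.5 over the first two spots
```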

SOView

In many examples, the SOView map clearly displays spatial patterns, such as symmetric structures, and separates cell types better than annotation maps.

SOView requires neither a priori knowledge of the tissue nor manual or computational selection of important molecular features. [24/33]

The authors conclude that (at least in the examples discussed) SOView is well suited as a quick visualization tool: its colors are even more meaningful than those of the assigned cell-type map (which also requires parameter tuning and manual labeling).

[25/33]
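The thread does not describe SOView's algorithm. As a generic illustration of annotation-free coloring (a swapped-in technique, not SOView's method), the sketch assumes each spot already has a 2D embedding, for instance from UMAP, and maps the angle around the embedding's centroid to hue and the normalized distance from it to saturation. Spots with similar profiles then receive similar colors without any cell-type labels or feature selection. The helper name is hypothetical.

```python
import colorsys
import math
from typing import List, Sequence, Tuple


def embedding_to_rgb(embedding: Sequence[Tuple[float, float]]) -> List[Tuple[float, float, float]]:
    """Map 2D embedding coordinates to RGB colors in [0, 1] (hypothetical helper)."""
    cx = sum(x for x, _ in embedding) / len(embedding)
    cy = sum(y for _, y in embedding) / len(embedding)
    radii = [math.hypot(x - cx, y - cy) for x, y in embedding]
    max_r = max(radii) or 1.0  # avoid division by zero when all points coincide
    colors = []
    for (x, y), r in zip(embedding, radii):
        hue = (math.atan2(y - cy, x - cx) / (2 * math.pi)) % 1.0  # angle -> [0, 1)
        saturation = r / max_r                                    # distance -> [0, 1]
        colors.append(colorsys.hsv_to_rgb(hue, saturation, 1.0))
    return colors


emb = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
for rgb in embedding_to_rgb(emb):
    print(tuple(round(c, 2) for c in rgb))
```

Opposite sides of the embedding get complementary hues (here red and cyan for the first two points), so distinct expression programs stand out in the tissue plot.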

In a Stereo-seq dataset of mouse embryonic development, SOView not only distinguishes organs with distinct colors but also finds subcompartments inside individual organs, such as the brain, heart, liver, lung and pancreas.

[26/33]

On a @10xGenomics Visium spatial transcriptomics dataset of the dorsolateral prefrontal cortex, SOView shows a more intuitive global view than the cell annotation map.

SOView best reveals the continuous, gradient-like nature of expression across the cerebral cortex.

[27/33]

On the Allen brain map, the paper further compares different methods for characterizing tissue structure: Louvain clustering, BayesSpace (clustering of expression & spatial location), SpaGCN (clustering of expression, spatial location & histology) and SOView.

[28/33]

The authors devise a score to illustrate how well each method characterizes different regions of the brain.

According to this metric, SOView outperforms the existing methods, regardless of their clustering complexity.

[29/33]
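The thread does not say how the authors' score is defined. For comparison, a widely used generic measure of agreement between a clustering and reference region labels is the adjusted Rand index (ARI), which is 1.0 for identical partitions (up to label names) and about 0 for random ones. The sketch below implements it from the pair-counting formula; it is a swapped-in metric for illustration, not the score used in the paper, and the labels are made up.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same spots."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two spots")
    # Pairs of spots grouped together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together in each labeling separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # both labelings are one cluster, or both all singletons
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


reference = ["L1", "L1", "L2", "L2", "WM", "WM"]  # hypothetical cortical layer labels
clusters = [0, 0, 1, 1, 2, 2]                     # hypothetical clustering output
print(adjusted_rand_index(reference, clusters))   # 1.0
```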

Through its versatility and large volume of available data, SODB can support the development of computational spatial methods, which likely require, and can benefit from, well-annotated datasets for benchmarking.

[30/33]

Finally, the authors surveyed the spatial transcriptomics literature and identified 68 relevant spatial transcriptomics methods.

The Visium sample data provided on the 10x Genomics website was the most widely used dataset, followed by other early, well-organized datasets.

[31/33]

TL;DR:

(1) SODB is a web-based platform that combines large-scale data deposition & exploration for spatial transcriptomics, proteomics, metabolomics, genomics & multi-omics, with data provided in a convenient, unified format.

(2) SOView is a novel interactive visualization & analysis module.

[32/33]

Finally, here is the link to the @naturemethods paper

nature.com/articles/s4159…

The SODB website is gene.ai.tencent.com/SpatialOmics/

The command-line package is available at pysodb.readthedocs.io/en/latest/

FIN 🧵 [33/33]
