This makes data sharing & re-analysis challenging.
Databases exist, but have limitations. [3/33]
At the moment, the 3 main spatial genomics databases are:
1. SpatialDB 2019: pioneer database with data browsing, downloading, gene comparison & spatial expression visualization. But it only provides raw data, which needs to be further transformed. [4/33] academic.oup.com/nar/article/48…
2. STOmicsDB 2022: improvement on SpatialDB regarding the range of spatial data types & the user interaction interface. It also provides visualization of biological features, such as gene distributions & spatial marker genes. [5/33]
3. SOAR 2022: covers a similar range of data types to STOmicsDB. In addition, it provides spatial analytical modules for, among others, spatially variable gene analysis & cell-type interaction analysis. [6/33]
3. Interactive display panel
- can be combined with SOView to automatically produce molecular markers for user-defined regions.
4. Command line package available
- much more efficient downloading of spatial data. [9/33]
In SODB, data are organized using a hierarchical tree with five levels: root, Biotech category, Biotechnology, Dataset and Experiment.
One dataset may consist of multiple replicates or control slices, each termed "experiment".
Experiments are the leaves of the tree. [10/33]
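A minimal sketch of that five-level tree in plain Python. The level names come from the thread; the `Node` class and all example names ("example_dataset", "replicate_1", ...) are hypothetical placeholders, not SODB's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The five levels named in the thread, from the top of the tree down to the leaves.
LEVELS = ["root", "Biotech category", "Biotechnology", "Dataset", "Experiment"]


@dataclass
class Node:
    name: str
    level: int  # index into LEVELS
    children: list[Node] = field(default_factory=list)

    def add(self, name: str) -> Node:
        """Attach a child one level below this node and return it."""
        if self.level + 1 >= len(LEVELS):
            raise ValueError("experiments are leaves and cannot have children")
        child = Node(name, self.level + 1)
        self.children.append(child)
        return child

    def leaves(self):
        """Yield every node without children (the experiments, in a complete tree)."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


# Hypothetical example: one dataset holding two replicate slices.
root = Node("SODB", 0)
category = root.add("Spatial transcriptomics")
tech = category.add("Visium")
dataset = tech.add("example_dataset")
dataset.add("replicate_1")
dataset.add("replicate_2")

experiment_names = [leaf.name for leaf in root.leaves()]
print(experiment_names)
```

Storing the level as an index keeps the "experiments are leaves" rule in one place: `add` refuses to go below the last level.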
The data in each experiment consists of:
1. continuous molecular measurements (such as gene expression) in "spots"
2. spatial x-y coordinates of these spots
‼️ Important to note that spots don't mean single cells, rather spatial conglomerates of tens of cells.
[11/33]
The spatial data can be downloaded in a unified format for convenient interaction with downstream analytical pipelines such as Scanpy or Squidpy.
With this data format, cell-wise and feature-wise annotation are easily incorporated.
[12/33]
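To make the structure concrete, here is a rough stdlib-only sketch of what one experiment holds: a spots-by-features matrix, one x-y coordinate per spot, and optional spot-wise & feature-wise annotations. The thread does not spell out SODB's actual file format, so the `SpatialExperiment` class, its field names and the toy values below are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpatialExperiment:
    spot_ids: list[str]
    feature_names: list[str]
    expression: list[list[float]]      # rows = spots, columns = features
    coords: list[tuple[float, float]]  # one (x, y) pair per spot
    spot_annotations: dict[str, list[str]] = field(default_factory=dict)
    feature_annotations: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self):
        # Check that the matrix and the coordinates line up with the spot/feature lists.
        n_spots, n_features = len(self.spot_ids), len(self.feature_names)
        if len(self.expression) != n_spots or len(self.coords) != n_spots:
            raise ValueError("need one expression row and one coordinate per spot")
        if any(len(row) != n_features for row in self.expression):
            raise ValueError("every expression row needs one value per feature")

    def annotate_spots(self, key: str, values: list[str]) -> None:
        """Attach one label per spot (e.g. a cell type)."""
        if len(values) != len(self.spot_ids):
            raise ValueError("need exactly one value per spot")
        self.spot_annotations[key] = list(values)

    def annotate_features(self, key: str, values: list[str]) -> None:
        """Attach one label per feature (e.g. a marker flag)."""
        if len(values) != len(self.feature_names):
            raise ValueError("need exactly one value per feature")
        self.feature_annotations[key] = list(values)


# Toy example: 3 spots measured for 2 genes (all values are made up).
exp = SpatialExperiment(
    spot_ids=["spot_0", "spot_1", "spot_2"],
    feature_names=["GeneA", "GeneB"],
    expression=[[0.0, 2.5], [1.0, 0.0], [0.0, 0.0]],
    coords=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
)
exp.annotate_spots("cell_type", ["neuron", "astrocyte", "neuron"])
exp.annotate_features("is_marker", ["yes", "no"])
print(exp.spot_annotations["cell_type"])
```

Keeping annotations as per-spot and per-feature columns alongside the matrix is what lets new labels be added without touching the measurements themselves.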
Here are all the 2,400 spatial datasets available in SODB.
[13/33]
The spatial datasets come from 7 species.
Mouse and human were the two most studied species, accounting for 50.9% and 46.1% of all experiments, respectively.
[14/33]
Spatial transcriptomics (62.6% of all experiments) & proteomics (35.3%) were the main technologies used.
ST, the earliest spatial transcriptomics technology, made up 26.3% of experiments, followed by MERFISH (13.5%), the most used imaging-based spatial transcriptomics technology.
[15/33]
Different brain regions were among the most studied, including cortex regions.
Apart from neuroscience studies, other organs, such as liver & heart, were also preferred targets.
In cancer research, breast cancer & colorectal carcinoma were two prominent targets.
[16/33]
Human and mouse studies differed in the spatial technologies used.
Human: >50% of experiments were generated by spatial proteomics (MIBI, IMC & CODEX), with few spatial transcriptomics.
Mouse: almost all experiments were spatial transcriptomics (ST, Slide-Seq & Visium).
[17/33]
Among most spatial technologies, there existed a trade-off between number of spots & molecular features.
Spatial proteomics (blue circle) shows strength in finer spot resolution while suffering from limited (<100) protein multiplexing.
[18/33]
Classical spatial transcriptomics technologies s.a. 10x Visium & ST (red circle) show high gene throughput and a low number of spots.
Newer technologies, s.a. sciSpace, Slide-seqV2 & Stereo-seq (green circle) show improvements in spatial resolution & spot throughput.
[19/33]
Another cluster of spatial transcriptomics datasets (mainly imaging-based technologies; yellow circle) had a smaller number of targeted genes compared with traditional ones, while they contained larger numbers of spots.
[20/33]
Regarding the quality of experiments (n = 2,139):
62.9% of experiments had a control, and 86.4% of experiments had replicates.
41.2% of experiments had well-annotated cell types assigned.
[21/33]
Regarding the sparsity of the molecular data:
As expected, all sequencing-based spatial transcriptomics technologies showed high data sparsity.
[22/33]
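The thread does not say how sparsity was computed. A common definition, assumed here, is the fraction of entries in the spots-by-features count matrix that are exactly zero. The `sparsity` helper and the toy counts are illustrative only.

```python
def sparsity(matrix):
    """Return the fraction of entries that are exactly zero (assumed definition)."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Toy count matrix: 3 spots x 4 genes, 9 of the 12 entries are zero.
counts = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
]
print(sparsity(counts))  # 0.75
```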
Data exploration
SODB has 4 data exploration views:
- Expression view (including statistics of zoomed-in regions of interest)
- Annotation view (view by property, s.a. cell type, also zoom-in possible)
- Comparison view (compare expression of genes in space)
- SOView
[23/33]
SOView
In many examples, the SOView map nicely displays spatial patterns: symmetric structures & clearer cell-type distinctions than annotation maps.
SOView requires neither a priori knowledge of the tissue nor manual or computational selection of important molecular features. [24/33]
The authors conclude that (at least in the examples discussed) SOView is very suitable as a quick visualization tool: its colors are more meaningful than even the colors of the assigned cell-type map (which also requires parameter tuning and manual labeling).
[25/33]
In a Stereo-seq dataset of mouse embryonic development, SOView not only distinguishes organs with discriminative colors, but also finds subcompartments inside individual organs, such as brain, heart, liver, lung and pancreas.
[26/33]
On a @10xGenomics Visium spatial transcriptomics dataset of the dorsolateral prefrontal cortex, SOView shows a more intuitive global view than the cell annotation map.
SOView best reveals the expression continuity & gradient nature of the cerebral cortex.
[27/33]
On the Allen brain map, the paper further compares different methods for characterizing tissue structure: Louvain clustering, BayesSpace (clustering of expression & spatial location), SpaGCN (clustering of expression, spatial location & histology) and SOView.
[28/33]
The authors devise a score to illustrate how well each method characterizes different regions of the brain.
According to this metric, SOView has superior performance over existing methods, regardless of their clustering complexity.
[29/33]
Through its versatility and large volume of available data, SODB can support advances in computational spatial methods development, which likely requires & can benefit from well-annotated datasets for benchmarking.
[30/33]
Finally, the authors browsed the spatial transcriptomics literature and found 68 relevant spatial transcriptomics methods.
The Visium sample data provided by the 10X Genomics website was the most widely used dataset, followed by earlier and well-organized datasets.
[31/33]
TL;DR:
(1) SODB is a web-based platform combining large-scale data deposition & exploration for spatial transcriptomics, proteomics, metabolomics, genomics & multi-omics, into a convenient data format.
(2) SOView is a novel interactive visualization & analysis module.
New: the monthly roller-coaster through October’s coolest life science papers is here 🚀🧬
3-sentence summaries of papers on evolution, single cell methodologies, genetic screens & more.
And, only for October, an educational video on fighting cancer🤺 as a bonus.
Enjoy 3x10!
1. Assembly theory (Sharma et al., Nature)
The most (in)famous paper I read this month proposes a new framework (assembly theory, that is) to explain basically everything… or, more specifically, “to unify descriptions of evolutionary selection across physics and biology” 1/3
This paper is not an easy read for anybody (in particular evolutionary biologists), but, to its merit, it sparked scientific discussions by being different than what is expected for a scientific paper describing evolution. 2/3
The human genome is gradually unravelling its secrets 🎁
AlphaMissense model @ScienceMagazine: one more path lit up by deep learning in exploring the code of life 🧬
AlphaMissense now predicts, with high confidence, whether 89% of ALL possible missense variants are likely benign or likely pathogenic
Key contributions🧵🧵
First things first:
Missense variants = genetic variants (i.e. changes in single DNA bases) that change an amino acid in a protein (amino acids are the building blocks of proteins, each encoded by a group of 3 bases).
Missense variants are more important than non-missense ones, as they are more likely to have functional impact.
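The definition above can be sketched in a few lines of Python: change one base in a codon, translate both versions with the standard genetic code, and compare the amino acids. This is only an illustration of what "missense" means, not how AlphaMissense works; the function name `classify_substitution` and its labels are my own.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change inside one codon (position is 0, 1 or 2)."""
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or new_base not in BASES or position not in (0, 1, 2):
        raise ValueError("expected a DNA codon, a position in 0-2 and a base in TCAG")
    if codon[position] == new_base:
        raise ValueError("the new base equals the original base")
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"  # same amino acid, protein sequence unchanged
    if after == "*":
        return "nonsense"    # amino acid replaced by a premature stop
    if before == "*":
        return "stop-lost"   # stop codon replaced by an amino acid
    return "missense"        # one amino acid replaced by another


print(classify_substitution("GAG", 1, "T"))  # GAG (Glu) -> GTG (Val)
print(classify_substitution("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_substitution("TAC", 2, "A"))  # TAC (Tyr) -> TAA (stop)
```

The three calls print missense, synonymous and nonsense, in that order.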
Now, even if a variant changes the amino acid sequence of a protein (i.e. it is missense), it does not necessarily impact the function of that protein.
Further, even if protein function gets impacted, it isn't clear in which way or by how much.
- 500,000 cells profiled by MIBI with a 37-antibody panel
- 66 individuals (6-20 weeks gestation)
Immune tolerance model proposed for how the structure & function of the maternal endometrium transform to promote the regulated invasion of genetically dissimilar fetal cells
Cancer is a terrible disease, and also one that we all know too well.
It is not a new problem, rather one that has existed for thousands of years & has been studied in unimaginable detail.
Then why do people still die of cancer?
Let's start understanding this by taking a step back.
It’s 1938, and Public Health Services are advising people that detecting and treating cancers early will save their lives.
Now fast-forward to today. We hear the exact same core message from the Public Health Services of our times, gradually and consistently backed up by more and more data.