Simona Cristea
Feb 23 33 tweets 11 min read
🚨New #SpatialTranscriptomics #Bioinformatics data resource out in @naturemethods.

SODB, a platform with >2,400 manually curated spatial experiments from >25 spatial omics technologies & interactive analytical modules.

This 🧵 will walk you through all the features of SODB. [1/33] Image
First, some background.

Spatial technologies complement classical genomics by also providing information about spatial context & tissue organization in:

- embryogenesis
- disease development
- normal tissue homeostasis

The field has exploded 🔥 in the past 2 years. [2/33] Image
But, data from different studies is stored in different configurations/repositories, such as:

- GEO
- Zenodo
- figshare
- SingleCellPortal
- IONPath for MIBI
- 10x Genomics website

This makes data sharing & re-analysis challenging.

Databases exist, but have limitations. [3/33]
At the moment, the 3 main spatial genomics databases are:

1. SpatialDB 2019: a pioneering database with data browsing, downloading, gene comparison & spatial expression visualization. But it only provides raw data, which needs further processing before analysis. [4/33]
academic.oup.com/nar/article/48…
2. STOmicsDB 2022: improvement on SpatialDB regarding the range of spatial data types & the user interaction interface. It also provides visualization of biological features, such as gene distributions & spatial marker genes. [5/33]

biorxiv.org/content/10.110…
3. SOAR 2022: covers a similar range of data types as STOmicsDB. In addition, it provides nice spatial analytical modules for, among others, spatially variable gene analysis and cell-type interaction analysis. [6/33]

biorxiv.org/content/10.110…
Back to SODB.

At a glance, how does it contribute?

1. it's a large repository for downloading spatial data: transcriptomics, proteomics, metabolomics, genomics & multi-omics

2. it provides superior interactive data exploration through the Spatial Omics View (SOView) module [7/33]
SODB has 4 unique features.

1. Spatial datasets:
- more data available than in the other spatial databases
- wide range of spatial technologies

2. Interactive visualization module SOView
- quickly previews global tissue structure
- identifies subtle tissue substructures. [8/33]
3. Interactive display panel
- can be combined with SOView to automatically produce molecular markers for user-defined regions.

4. Command line package available
- much more efficient downloading of spatial data. [9/33]
In SODB, data are organized using a hierarchical tree with five levels: root, Biotech category, Biotechnology, Dataset and Experiment.

One dataset may consist of multiple replicates or control slices, each termed "experiment".

Experiments are the leaves of the tree. [10/33] Image
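To make the five-level hierarchy concrete, here is a minimal Python sketch. Only the five level names come from the thread; the nested-dict layout and every category, dataset and experiment entry below are hypothetical placeholders, not SODB's internal structure or real records.

```python
# Hypothetical toy tree; only the five level names come from the thread.
LEVELS = ["root", "Biotech category", "Biotechnology", "Dataset", "Experiment"]

tree = {
    "root": {
        "Spatial transcriptomics": {            # Biotech category
            "Visium": {                          # Biotechnology
                "example_dataset_A": [           # Dataset (made-up name)
                    "slice_1", "slice_2",        # Experiments (leaves)
                ],
            },
        },
        "Spatial proteomics": {
            "MIBI": {
                "example_dataset_B": ["control_1", "replicate_1", "replicate_2"],
            },
        },
    }
}


def iter_experiments(node, path=()):
    """Yield the full root-to-leaf path of every experiment in the tree."""
    if isinstance(node, list):
        # A list holds the experiments (leaves) of one dataset.
        for experiment in node:
            yield path + (experiment,)
    else:
        for name, child in node.items():
            yield from iter_experiments(child, path + (name,))


paths = list(iter_experiments(tree))
for p in paths:
    print(" / ".join(p))
```

Every printed path has exactly five components, one per level, and a dataset with several replicates or control slices simply contributes several leaves.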
The data in each experiment consists of:
1. continuous molecular measurements (such as gene expression) in "spots"
2. spatial x-y coordinates of these spots

‼️ Important to note: spots aren't necessarily single cells. Depending on the technology, they can be spatial conglomerates of tens of cells.

[11/33]
The spatial data can be downloaded in a unified format for convenient interaction with downstream analytical pipelines such as Scanpy or Squidpy.

With this data format, cell-wise and feature-wise annotations are easily incorporated.

[12/33]
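As a rough illustration of what one experiment holds, here is a small sketch of a spots × features matrix paired with x-y coordinates. The `Experiment` class and its field names are hypothetical and are not SODB's actual file format; in practice, Scanpy and Squidpy operate on AnnData objects, which carry the same two pieces of information plus annotations.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Toy container for one spatial experiment (hypothetical, not SODB's format)."""
    feature_names: list   # e.g. gene or protein names, one per column
    expression: list      # one row per spot, one value per feature
    coords: list          # one (x, y) tuple per spot

    def __post_init__(self):
        # Every spot needs both a measurement row and a coordinate pair.
        if len(self.expression) != len(self.coords):
            raise ValueError("expression rows and coordinates must match")
        # Every row must have one value per feature.
        if any(len(row) != len(self.feature_names) for row in self.expression):
            raise ValueError("each spot needs one value per feature")

    @property
    def n_spots(self):
        return len(self.coords)


exp = Experiment(
    feature_names=["GeneA", "GeneB", "GeneC"],
    expression=[[0.0, 2.5, 1.0], [3.1, 0.0, 0.0]],
    coords=[(10.0, 20.0), (11.5, 20.0)],
)
print(exp.n_spots, len(exp.feature_names))  # 2 3
```

Keeping the measurements and the coordinates in one validated object is what makes spot-wise and feature-wise annotations easy to attach later, since both share the same row and column order.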
Here is an overview of the >2,400 spatial experiments available in SODB.

[13/33] Image
The spatial datasets come from 7 species.

Mouse and human were the two most studied species, accounting for 50.9% and 46.1% of all experiments, respectively.

[14/33] Image
Spatial transcriptomics (62.6% of all experiments) & proteomics (35.3%) were the main technologies used.

ST, the earliest spatial transcriptomics technology, made up 26.3% of experiments, followed by MERFISH (13.5%), the most used imaging-based spatial transcriptomics technology.

[15/33] Image
Different brain regions were among the most studied, including cortex regions.

Apart from neuroscience studies, other organs, such as liver & heart, were also preferred targets.

In cancer research, breast cancer & colorectal carcinoma were two prominent targets.

[16/33] Image
Human and mouse studies differed in the spatial technologies used.

Human: >50% of experiments were generated by spatial proteomics (MIBI, IMC & CODEX), with few spatial transcriptomics.

Mouse: almost all experiments were spatial transcriptomics (ST, Slide-seq & Visium).
[17/33] Image
Across most spatial technologies, there is a trade-off between the number of spots & the number of molecular features.

Spatial proteomics (blue circle) shows strength in finer spot resolution while suffering from limited (<100) protein multiplexing.

[18/33] Image
Classical spatial transcriptomics technologies, such as 10x Visium & ST (red circle), show high gene throughput but a low number of spots.

Newer technologies, such as sci-Space, Slide-seqV2 & Stereo-seq (green circle), show improvements in spatial resolution & spot throughput.

[19/33]
Another cluster of spatial transcriptomics datasets (mainly imaging-based technologies; yellow circle) targeted fewer genes than the classical technologies, but contained larger numbers of spots.

[20/33]
Regarding the quality of experiments (n = 2,139):

62.9% of experiments had a control, and 86.4% of experiments had replicates.

41.2% of experiments had well-annotated cell types assigned.

[21/33] Image
Regarding the sparsity of the molecular data:

As expected, all sequencing-based spatial transcriptomics technologies showed high data sparsity.

[22/33] Image
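For readers new to the term, a short sketch of how sparsity can be computed. It assumes the common definition, the fraction of zero entries in the spots × features matrix; the thread does not spell out the paper's exact formula, and the counts below are made up.

```python
def sparsity(matrix):
    """Return the fraction of entries in a spots x features matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Made-up counts: 3 spots (rows) x 4 genes (columns).
counts = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
]
print(f"{sparsity(counts):.2f}")  # 0.75
```

Under this definition, a value close to 1 means that most gene-by-spot measurements are zero, which is the pattern reported for sequencing-based technologies.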
Data exploration

SODB has 4 data exploration views:

- Expression view (including statistics of zoomed-in regions of interest)
- Annotation view (view by property, such as cell type; zooming in is also possible)
- Comparison view (compare expression of genes in space)
- SOView

[23/33] Image
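The idea behind region-of-interest statistics can be sketched in a few lines. This is only an illustration of the concept under simple assumptions (a rectangular region, per-gene means), not SODB's implementation; the spot layout, gene names and the `region_mean` helper are all hypothetical.

```python
from statistics import mean

# Made-up spots: x-y position plus expression values for two genes.
spots = [
    {"x": 1.0, "y": 1.0, "expr": {"GeneA": 5.0, "GeneB": 0.0}},
    {"x": 2.0, "y": 1.0, "expr": {"GeneA": 3.0, "GeneB": 1.0}},
    {"x": 8.0, "y": 8.0, "expr": {"GeneA": 0.0, "GeneB": 4.0}},
    {"x": 9.0, "y": 7.0, "expr": {"GeneA": 1.0, "GeneB": 6.0}},
]


def region_mean(spots, x_min, x_max, y_min, y_max):
    """Mean expression per gene over the spots inside a rectangular region."""
    inside = [
        s for s in spots
        if x_min <= s["x"] <= x_max and y_min <= s["y"] <= y_max
    ]
    if not inside:
        return {}  # no spots fall in the selected region
    genes = inside[0]["expr"].keys()
    return {g: mean(s["expr"][g] for s in inside) for g in genes}


roi = region_mean(spots, 0, 3, 0, 3)
print(roi)  # {'GeneA': 4.0, 'GeneB': 0.5}
```

Comparing such per-region means against the rest of the tissue is one simple way to spot candidate marker genes for a user-selected area.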
SOView

In many examples, the SOView map nicely displays spatial patterns, such as symmetric structures, and distinguishes cell types better than annotation maps.

SOView requires no a priori knowledge of the tissue and no manual or computational selection of important molecular features. [24/33] Image
The authors conclude that (at least in the examples discussed) SOView is very suitable as a quick visualization tool: its colors are more meaningful than even the colors of the assigned cell-type map (which also requires parameter tuning and manual labeling).

[25/33]
In a Stereo-seq dataset of mouse embryonic development, SOView not only differentiates organs with discriminative colors, but also finds subcompartments inside individual organs, such as brain, heart, liver, lung and pancreas.

[26/33] Image
On a @10xGenomics Visium spatial transcriptomics dataset of the dorsolateral prefrontal cortex, SOView shows a more intuitive global view than the cell annotation map.

SOView best reveals the expression continuity & gradient nature of the cerebral cortex.

[27/33] Image
On the Allen brain map, the paper further compares different methods for characterizing tissue structure: Louvain clustering, BayesSpace (clustering of expression & spatial location), SpaGCN (clustering of expression, spatial location & histology) and SOView.

[28/33] Image
The authors devise a score to illustrate how well each method characterizes different regions of the brain.

According to this metric, SOView has superior performance over existing methods, regardless of their clustering complexity.

[29/33] Image
Through its versatility and large volume of available data, SODB can support advances in computational spatial methods development, which likely requires & can benefit from well-annotated datasets for benchmarking.

[30/33] Image
Finally, the authors browsed the spatial transcriptomics literature and found 68 relevant spatial transcriptomics methods.

The Visium sample data provided on the 10x Genomics website was the most widely used dataset, followed by earlier and well-organized datasets.

[31/33] Image
TL;DR:

(1) SODB is a web-based platform combining large-scale data deposition & exploration for spatial transcriptomics, proteomics, metabolomics, genomics & multi-omics, all in a convenient unified data format.

(2) SOView is a novel interactive visualization & analysis module.

[32/33]
Finally, here is the link to the @naturemethods paper

nature.com/articles/s4159…

The SODB website is gene.ai.tencent.com/SpatialOmics/

The command-line package is available at pysodb.readthedocs.io/en/latest/

FIN 🧵 [33/33] Image
