Simona Cristea
Feb 27 17 tweets 6 min read
I need to raise awareness about an important point in #scRNAseq data analysis, which, in my opinion, is not acknowledged enough:

‼️In practice, most cell type assignment methods will fail on totally novel cell types. Biological/expert curation is necessary!

Here's one example👇
Last year, together with @LabPolyak @harvardmed, we published a study in which we did something totally awesome: we experimentally showed how a TGFBR1 inhibitor drug 💊 prevents breast tumor initiation in two different rat models!

Here's a detailed thread on this paper:
As you can imagine, this is a big thing. Treating tumors is already hard, preventing them is even harder!

Obviously, the most burning question for us then became: what is the drug actually doing to prevent tumor initiation?

Or, what is different in treated vs. control cells?
Long story short: we identified a group of cells popping up/expanding after treatment in both strains (ACI & SD)

How do we know these cells are unique to treatment?

Because every other subpopulation had a match between treated and control samples, except this one. So these cells are important.

But what are they?
An obvious thing to do is differential expression between these cells and various other relevant groups (such as all other cells).

We did that, and got back an interesting list of many mesenchymal and stem-cell markers, most of which are also characteristic of stroma!
‼️All cell type assignment methods we tried failed to characterize this population accurately. Most of them labeled it "stroma".

But, we weren't easily fooled.

@nellage has *decades* of experience with this experimental model & I spent *years* researching relevant papers.
Our teams spent >2 years investigating these cells.

We spent hundreds of hours discussing them.

We embarked on costly & time-consuming experiments to dig deep.

(Science obsession at its best🤫)

All because we wanted to know for ourselves: is this really a new epithelial type??
So, how did we decide if this population is novel epithelium or stroma?

Two main strategies:

1. We knew the literature had evidence for extremely rare (<0.1%) progenitor mammary populations related to tumor initiation. We found lots of similarities with those populations.
‼️ 2. We actually did experiments to validate the epithelial (and not stromal!) nature of these cells. We found rare cells in the breast that carry epithelial markers as well as markers of this subpopulation.

These experiments were tough because of the rarity of these cells.
All in all, we gathered substantial evidence (both computational & ultimately experimental) that this population is indeed a novel epithelial type, and not just stroma.

Why, then, did *none* of the multiple tools we applied signal the novel nature of this subpopulation?
When thinking about it, it's actually pretty straightforward why.

There's absolutely no magic: cell assignment tools need references to match cells to, and assign based on max similarity with references.

Obviously, with a novel cell type, there's no reference to match it to.
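
To make this concrete, here is a minimal sketch of reference-based assignment by maximum similarity. It is not the method of any specific tool: the marker values, the three reference profiles, and the function names `pearson` and `assign_by_max_similarity` are all made up for illustration, and Pearson correlation is just one assumed similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return numerator / denominator


def assign_by_max_similarity(cell, references):
    """Return the reference label most similar to the cell, plus all scores."""
    scores = {label: pearson(cell, profile) for label, profile in references.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy, made-up expression values over five hypothetical marker genes:
# two epithelial, two mesenchymal/stromal, one immune.
references = {
    "epithelium": [5, 5, 0, 0, 0],
    "stroma": [0, 0, 5, 5, 0],
    "immune": [0, 0, 0, 0, 5],
}

# A hypothetical novel cell: some epithelial signal, strong mesenchymal signal.
novel_cell = [2, 1, 4, 3, 0]

label, scores = assign_by_max_similarity(novel_cell, references)
print(label, scores)
```

In this toy setup the novel cell gets the label "stroma", simply because stroma is the closest profile available. The method has no way to say "none of the above" unless that option is built in.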
Still, in such situations (unmatched to references), most methods claim to at least flag novel subpopulations.

But what about intermediate populations transitioning between cellular types, with context-dependent roles?

What about subpopulations very similar to other cell types?
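
A small sketch of why flagging helps only part of the time, assuming a tool rejects assignments whose best similarity falls below a cutoff. The function name `assign_with_rejection`, the cutoff of 0.5, and the similarity scores are all hypothetical; real tools use their own scoring and rejection rules.

```python
def assign_with_rejection(similarities, labels, min_score=0.5):
    """Assign each cell to its most similar reference, or flag it as
    'unassigned' when even the best similarity is below min_score."""
    calls = []
    for row in similarities:
        best_index = max(range(len(row)), key=row.__getitem__)
        if row[best_index] >= min_score:
            calls.append(labels[best_index])
        else:
            calls.append("unassigned")
    return calls


labels = ["epithelium", "stroma", "immune"]

# Made-up similarity scores (one row per cell, one column per reference).
similarities = [
    [0.10, 0.95, -0.20],   # a genuine stromal cell
    [0.20, 0.30, 0.10],    # a novel cell unlike every reference
    [-0.29, 0.87, -0.71],  # a novel cell that happens to resemble stroma
]

calls = assign_with_rejection(similarities, labels)
print(calls)
```

The second cell is flagged because it resembles nothing in the reference. The third cell is novel too, but it scores high against stroma, so it passes the cutoff and is labeled "stroma" with apparent confidence.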
The truth is that, in such cases, cell type assignment tools will fail, almost by definition. This expected behavior shouldn't surprise us.

This is why expert/biological knowledge is necessary & has authority over any algorithm

Also, biological validation is the ultimate proof!
‼️ I want to make it explicit that I am not claiming cell type assignment algorithms perform poorly.

I think such algorithms are nothing short of extraordinary.

It's just that they can't do all the work for us.

We also need to understand the biology behind our data.
The reason I am bringing this up is that, in my experience, it comes up repeatedly during discussions over concrete #scRNAseq datasets.

I've seen many #Bioinformatics analysts somewhat reluctant to "override"/question the assignments of an algorithm.

That is missing the point.
As scientists, every decision we take needs to be justified.

Once justified and backed up by evidence, it is valid.

Once the point is valid, it doesn't matter if algorithm X or method Y says otherwise.

We only want one thing: our claim to be TRUE, to the best of our knowledge
‼️Finally, actionable:

Understanding how cell type assignment algorithms actually work helps us also understand their limitations.

That's why my advice to #Bioinformatics folks is to read the papers behind *all* the algorithms they are applying.

(Sorry, no exceptions allowed!)
