Florian Wagner, 17 tweets, 4 min read
If you’re interested in clustering single-cell RNA-Seq data, please check out my new paper “Straightforward clustering of single-cell RNA-Seq data with t-SNE and DBSCAN”! biorxiv.org/content/10.110…

Working PBMC example: github.com/flo-compbio/ga…

Quick Twitter summary below: 1/
I propose “Galapagos”, a clustering workflow designed to reliably separate the main cell types in scRNA-Seq data in a straightforward and transparent manner. 2/
The workflow consists of a simple series of steps and does not require gene selection. Essentially, clustering is performed by applying DBSCAN directly to t-SNE results. 3/
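A minimal, dependency-free sketch of the final step follows: DBSCAN run directly on 2-D t-SNE coordinates. It assumes t-SNE has already been computed. The function name and toy coordinates are illustrative and are not taken from the paper or its repository. `eps` and `min_samples` are DBSCAN's two parameters.

```python
import math


def dbscan(points, eps, min_samples):
    """Label 2-D points (e.g. t-SNE coordinates) with DBSCAN.

    Returns one label per point: 0, 1, 2, ... for clusters, -1 for noise.
    A point is a core point if at least `min_samples` points (itself included)
    lie within distance `eps` of it.
    """
    n = len(points)

    def neighbors(i):
        # Brute-force O(n^2) neighborhood query; fine for a small sketch.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue  # already assigned (or just turned into a border point)
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels
```

For example, `dbscan([(0, 0), (0.5, 0), (0, 0.5), (9, 9)], eps=1.0, min_samples=3)` returns `[0, 0, 0, -1]`. For real data, scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` implements the same algorithm with a faster neighbor search. The brute-force version here only shows the logic.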
As there is some concern as to whether t-SNE is sufficiently robust for clustering purposes, I show that when t-SNE is applied to variance-stabilized data (as described in the paper), the results are indeed highly robust to parameter settings and initialization points. 4/
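The thread defers the exact variance-stabilizing procedure to the paper. As an assumption, the sketch below uses one common choice for UMI counts: scaling each cell to the median total count, then applying the Freeman-Tukey transform. The function name and toy matrix are hypothetical, and the paper's actual steps may differ.

```python
import math
import statistics


def stabilize_counts(counts):
    """counts: list of cells, each a list of raw UMI counts per gene.

    1. Scale every cell to the median total count across cells.
    2. Apply the Freeman-Tukey transform sqrt(x) + sqrt(x + 1), which roughly
       stabilizes the variance of Poisson-like counts.
    Cells with zero total counts are left at zero before the transform.
    """
    totals = [sum(cell) for cell in counts]
    target = statistics.median(totals)
    result = []
    for cell, total in zip(counts, totals):
        scale = target / total if total > 0 else 0.0
        result.append([math.sqrt(x * scale) + math.sqrt(x * scale + 1) for x in cell])
    return result
```

In pipelines of this kind, the transformed matrix is commonly reduced with PCA before t-SNE is run.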
I then adopt a simulation approach to test whether Galapagos is able to overcome the high levels of technical noise in scRNA-Seq data and generate accurate clustering results. 5/
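The thread does not describe the simulation model. As a hypothetical illustration only, one simple way to add scRNA-Seq-like technical noise is to give each cell type a fixed expression profile and draw a limited number of transcripts per cell by multinomial sampling. Cells of the same type then differ only by sampling noise. The profiles, function name, and parameters are invented for illustration and are not the paper's simulation.

```python
import random


def simulate_cells(profiles, cells_per_type, transcripts_per_cell, seed=0):
    """Simulate scRNA-Seq-like counts by multinomial sampling.

    profiles: dict mapping cell type -> list of relative gene expression levels.
    Each simulated cell draws `transcripts_per_cell` transcripts from its
    type's profile. Returns (counts, labels), one entry per cell.
    """
    rng = random.Random(seed)
    counts, labels = [], []
    for cell_type, profile in profiles.items():
        n_genes = len(profile)
        for _ in range(cells_per_type):
            # Draw gene indices with probability proportional to expression.
            draws = rng.choices(range(n_genes), weights=profile, k=transcripts_per_cell)
            cell = [0] * n_genes
            for g in draws:
                cell[g] += 1
            counts.append(cell)
            labels.append(cell_type)
    return counts, labels
```

Because the true label of every simulated cell is known, clustering results on such data can be scored against it.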
The results on the simulated data closely match the true cell type labels obtained from the “ground truth”, suggesting that Galapagos results are indeed accurate. 6/
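The thread does not say which agreement measure is used to compare clusters with the simulated labels. As an assumption, the sketch below computes the adjusted Rand index (ARI), a common chance-corrected measure that does not depend on how cluster IDs are numbered. scikit-learn provides the same metric as `sklearn.metrics.adjusted_rand_score`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two labelings of the same cells.

    1.0 means identical partitions; values near 0.0 mean chance-level agreement.
    """
    n = len(labels_true)
    # Pairs of cells placed together in both labelings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both labelings are a single cluster or all
        # singletons, i.e. the partitions are identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because only the grouping matters, `adjusted_rand_index(["a", "a", "b", "b"], [2, 2, 1, 1])` is 1.0 even though the cluster IDs differ from the labels.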
To quantify clustering accuracy against an experimental ground truth, I use CITE-Seq data to compare Galapagos results with cell type identities established from protein expression markers. 7/
The results show that accuracy (precision and recall) exceeds 90% for all three cell types examined. 8/
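Per-cell-type precision and recall can be computed by treating one cluster as the prediction for one protein-defined cell type. The helper and toy labels below are hypothetical, and the thread does not state how clusters were matched to cell types in the paper.

```python
def precision_recall(reference, clusters, cell_type, cluster_id):
    """Precision and recall of one cluster against one reference cell type.

    precision = fraction of cells in the cluster that have the reference type
    recall    = fraction of cells of the reference type that land in the cluster
    """
    in_cluster = [c == cluster_id for c in clusters]
    is_type = [r == cell_type for r in reference]
    true_pos = sum(a and b for a, b in zip(in_cluster, is_type))
    precision = true_pos / sum(in_cluster)
    recall = true_pos / sum(is_type)
    return precision, recall
```

For example, with reference labels `["B", "B", "B", "T", "T"]` and clusters `[0, 0, 1, 1, 1]`, cluster 0 has precision 1.0 and recall 2/3 for "B" cells.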
Finally, I again use the CITE-Seq data to experimentally define different subsets of T cells. A comparison with the Galapagos results suggests that Galapagos is able to distinguish between naive T cells, CD4+ memory T cells, and CD8+ memory T cells. 9/
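The thread does not list the protein markers or cutoffs used. As a purely hypothetical sketch, a simple gating rule might call CD3+ cells T cells, separate naive (CD45RA+ CD45RO-) from memory (CD45RO+) cells, and use CD4 versus CD8 to divide the memory cells. The marker names follow common immunology conventions, but the single `threshold` and the function itself are assumptions.

```python
def gate_t_cell(adt, threshold=1.0):
    """Assign a T-cell subset from the normalized protein levels of one cell.

    adt: dict mapping marker name -> normalized antibody-derived tag (ADT) level.
    `threshold` is a hypothetical positivity cutoff applied to every marker;
    markers missing from `adt` count as negative.
    """
    pos = {marker: level > threshold for marker, level in adt.items()}
    if not pos.get("CD3", False):
        return "not T cell"
    if pos.get("CD45RA", False) and not pos.get("CD45RO", False):
        return "naive T cell"
    if pos.get("CD45RO", False):
        if pos.get("CD4", False) and not pos.get("CD8", False):
            return "CD4+ memory T cell"
        if pos.get("CD8", False) and not pos.get("CD4", False):
            return "CD8+ memory T cell"
    return "unassigned"
```

Cells labeled this way can then serve as the reference when scoring the Galapagos clusters.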
However, distinguishing between CD4+ and CD8+ naive T cells is nearly impossible, as t-SNE fails to clearly separate these populations. 10/
In conclusion, Galapagos represents a straightforward approach for clustering single-cell RNA-Seq data that can be implemented in most programming languages with only a few lines of code. 11/
Since clustering is performed directly on the t-SNE results, fine-tuning the two DBSCAN parameters (the neighborhood radius ε and the minimum number of points required to form a dense region) is very intuitive: “Do the clusters match my interpretation of the t-SNE plot?” 12/
Clustering on the t-SNE results also makes the clustering results very easy to communicate. (“What you see is what you get.”) 13/
In the Discussion, I argue that the problems with t-SNE stem largely from the many different ways in which data are filtered and transformed *prior* to applying t-SNE. In Galapagos, I therefore tried to simplify and standardize these steps. 14/
The main limitation of the method, as I see it, is the difficulty of distinguishing between closely related cell types when t-SNE fails to clearly separate the corresponding cell populations. 15/
In such cases, I would suggest applying denoising first and then clustering heatmaps of highly variable genes. (For more on this point, see our ENHANCE paper: biorxiv.org/content/10.110…) 16/
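The denoising step itself is covered in the ENHANCE paper and is not reproduced here. For the gene-selection step, one simple and commonly used criterion (an assumption here, not necessarily the one the author uses) is to rank genes by their Fano factor, i.e. variance divided by mean across cells. The function name and toy matrix are hypothetical.

```python
import statistics


def top_variable_genes(matrix, n_top):
    """Rank genes by Fano factor (variance / mean) and return the indices of
    the n_top most variable ones.

    matrix: list of cells, each a list of (denoised) expression values per gene.
    Genes with zero mean expression are skipped.
    """
    n_genes = len(matrix[0])
    scores = []
    for g in range(n_genes):
        values = [cell[g] for cell in matrix]
        mean = statistics.fmean(values)
        if mean > 0:
            scores.append((statistics.pvariance(values) / mean, g))
    scores.sort(reverse=True)  # highest Fano factor first
    return [g for _, g in scores[:n_top]]
```

The expression values of the selected genes could then be displayed as a heatmap, with cells ordered by hierarchical clustering.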
Please let me know if you have questions or suggestions for improvements. If you might be interested in serving as a reviewer for this paper at an open-access journal (e.g., F1000Research), please let me know as well. Thank you! (end of thread)