Announcing TopOMetry, a dimensional reduction (DR) tool to learn #SingleCell data phenotypic topology. Finds >100 CD4 #Tcells clusters in public blood data with very specific marker genes. You can use any method for visualization, such as MDE (@akshaykagrawal):

#bioinformatics

🧵1/16
Pre-print just now on BioRxiv: biorxiv.org/content/10.110…

I'm still learning a lot, as I did this mostly on my own during my clinical rounds in pandemic Brazil. If you spot any mistakes or have comments, please be kind enough to reach out. I plan on improving.

2/16
In a nutshell: dimensional reduction happens in 'steps' (e.g. kNN, affinity learning, decomposition, optimizing layouts). TopOMetry assumes a phenotypic topology (i.e. the manifold hypothesis) and approximates the Laplace-Beltrami Operator (LBO) at each of these steps.

3/16
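To make the 'steps' in 3/16 concrete, here is a minimal NumPy sketch of the first three (kNN, affinity learning, decomposition). It is not TopOMetry's code or API: the function names `knn_indices` and `diffusion_basis`, the adaptive Gaussian kernel, and the toy data are all assumptions made for illustration. The decomposition uses the diffusion-maps normalization with alpha = 1, a standard construction whose generator is known to approximate the LBO. The layout step (t-SNE, UMAP, MDE, ...) would take the resulting basis or graph as input and is left out.

```python
import numpy as np


def knn_indices(X, k):
    """Step 1 (kNN): indices and distances of the k nearest neighbours of each row."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]
    return idx, np.sqrt(np.take_along_axis(d2, idx, axis=1))


def diffusion_basis(X, k=10, n_components=2):
    """Steps 2-3: adaptive Gaussian affinities, then a diffusion-maps decomposition."""
    n = X.shape[0]
    idx, dist = knn_indices(X, k)

    # Step 2 (affinity learning): Gaussian kernel with a per-point bandwidth
    # equal to the distance to the k-th neighbour (an illustrative choice).
    sigma = dist[:, -1]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist.ravel() ** 2 / (sigma[rows] * sigma[cols]))
    W = np.maximum(W, W.T)  # symmetrize the kNN graph

    # Step 3 (decomposition): alpha = 1 normalization removes the effect of
    # sampling density before building the Markov operator.
    q = W.sum(axis=1)
    K = W / np.outer(q, q)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))  # symmetric matrix with the same spectrum as D^-1 K
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]  # right eigenvectors of the Markov matrix

    # Drop the first (trivial, constant) component.
    return evals[1:n_components + 1], psi[:, 1:n_components + 1]


# Toy data: a noisy circle embedded in 10 dimensions.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
circle = np.c_[np.cos(theta), np.sin(theta)]
X = np.hstack([circle, np.zeros((200, 8))]) + 0.01 * rng.normal(size=(200, 10))
evals, basis = diffusion_basis(X, k=10, n_components=2)
```

The dense n-by-n matrices keep the sketch short; they would not scale to real single-cell datasets, where sparse graphs and approximate nearest-neighbour search are the usual choice.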
The LBO carries all latent topological information from single-cell data, as the underlying manifold (i.e. an extended Waddington's landscape) maps the latent identity of single-cells to a phenotypic topology (see D, image source: doi.org/10.1002/bies.2…, cited).

4/16
The current implementation wraps 30 possible final visualizations from the same dataset. Instead of choosing between t-SNE or UMAP, test these and other methods with multiple options for learning a first orthogonal basis and a topological graph.

5/16
Because these models tend to converge to the LBO, different combinations often find similar results, which are reasonably well visualized by nearly all built-in layout methods. In all these models, a similar CD4 T cell diversity is found, suggesting success in approximating the LBO.
6/16
These T CD4 clusters have significantly more specific marker genes than those obtained with the current default approach (PCA+fuzzy graph+Leiden). Across supplementary figures, we show this is similar across different TopOMetry models and many datasets.
7/16
In the unsorted human bone marrow (@humancellatlas), TopOMetry simultaneously represents the cell cycle of hematopoietic progenitors and the diversity of jointly sampled circulating T CD4 cells:
8/16
TopOMetry handles very large whole-organism data quite well. We show this with ~1.3M mouse organogenesis cells (@coletrapnell @JShendure), revealing finer-grained lineages that would otherwise be mislabeled.
9/16
Its orthogonal bases also preserve local structure better than PCA, while still preserving most of the global structure.
10/16
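For readers wondering how 'preserving local structure' can be scored, one common proxy is the fraction of each point's k nearest neighbours that survive in the reduced space. The sketch below is an assumption for illustration only: the thread does not say which metric the pre-print uses, and the names `knn_sets`, `knn_overlap` and `pca_scores` are hypothetical helpers, not part of TopOMetry.

```python
import numpy as np


def knn_sets(X, k):
    """Indices of the k nearest neighbours of each row (self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def knn_overlap(X_high, X_low, k=10):
    """Mean fraction of high-dimensional neighbours kept in the low-dimensional space."""
    high = knn_sets(X_high, k)
    low = knn_sets(X_low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(high, low)]
    return float(np.mean(shared)) / k


def pca_scores(X, n_components):
    """Plain PCA via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Toy data: 150 points in 10 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 10))
score_2 = knn_overlap(X, pca_scores(X, 2), k=10)      # lossy projection
score_full = knn_overlap(X, pca_scores(X, 10), k=10)  # pure rotation, neighbours kept
```

The same `knn_overlap` call can be pointed at any other basis or layout of the same cells, which makes it a simple way to compare methods on equal terms.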
Six key take-home messages:

1- Data can be distorted by linear methods used in most of the currently published results, and these should be carefully interpreted. Validation of downstream hypotheses is not a validation of the learned latent structure per se.
11/16
2- We should care more about representing these 'natural shapes' in single-cell data. I don't claim the methods included in TopOMetry are the best. Yet, I suggest harnessing LBO's powerful properties to geometrically describe cellular hierarchies and phenotypic diversity.
12/16
3- Most layout (aka visualization) algorithms perform fine, but this varies depending on their input (i.e. PCA, data, latent orthogonal bases). Instead of relying on a single method, you can check many to grasp different and shared insights on the data structure.
13/16
4- The community should be more careful regarding assumptions of single-cell data structure. I'm not saying 'ditch PCA' or 'ditch t-SNE/UMAP', I'm saying 'compare and combine them with topological models to better understand the data'.
14/16
5- Computational analysis of single-cell data is the key limiting step for using these powerful data. 10X PBMC68k was published in early 2017, nearly 5 years ago, and has been widely used for tests and tutorials ever since.

We have all analyzed PBMC data, have we not?

And yet.

15/16
6- Saying that DR is not reliable ('specious art', 'but the distortion') is not an excuse for not using it properly. All DR results are wrong, but some are useful. We'll never 'see' in 20,000 dimensions, and DR is, for now, the best way to understand high-dimensional data.

16/16
Again, a link to the pre-print: biorxiv.org/content/10.110…

GitHub: github.com/davisidarta/to…

TopOMetry is documented at ReadTheDocs: topometry.readthedocs.io/en/latest/
Note: I acknowledge only a fraction of the theory employed here is novel. This was achieved by standing on the shoulders of the broader community that developed early methods (e.g. @leland_mcinnes). I did not invent the array of methods used in TopOMetry.
Huge kudos
to everyone who provided their insights during this process: @leland_mcinnes @akshaykagrawal @hippopedoid @JShendure,

to those who unconditionally believed in my potential: @EbruErbayLab @helder_nakaya,

and to all the single-cell friends I made here on twitter!
Personal consideration:

I won't feel bad if someone proves me wrong later. The sole fact that I've grown and learned enough to do something like this is the single biggest reward I could get.

I did survive quite an impostor syndrome.
If you are still here, check out some digital generative art I did with these layouts (not science, just for fun):

I call this one 'T CD4 blossom', aka 'flowers for mom'.
