Loïc A. Royer 💻🔬⚗️ 🇺🇦
Group Leader @czbiohub in San Francisco. #zfish #devbio #zebrahub #SelfSupervised #DeepLearning #Lightsheet #Optics co-creator of @napari_imaging

Jul 25, 2022, 18 tweets

📢 Very excited to see our work #cytoself now published in @naturemethods. #cytoself is a deep learning method for fully self-supervised protein localization profiling and clustering. Led by @liilii_tweet with @kchev @LeonettiManuel at @czbiohub

nature.com/articles/s4159…

1/n

Everything started with a conversation with my friend & colleague @LeonettiManuel, who worked on building #OpenCellCZB, a map of the human proteome:

opencell.czbiohub.org

We wondered, can we use #DeepLearning to map the landscape of protein sub-cellular localization?

2/n

The problem with #images, compared to #sequences, is that it is unclear how to compare them. For example, how do we estimate localization similarity from pairs of images of fluorescently labeled cells? For sequences we have algorithms and tools. But for images?

3/n

#cytoself takes in a large collection of images (millions of images) and learns a vectorial representation of localization that is robust against cell shape, size, and state variability. These representations can be used for clustering, comparisons, deorphaning, etc...

4/n
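
A minimal sketch in Python of how such per-protein 'localization vectors' could be assembled from per-image embeddings; `encoder` is an illustrative stand-in for a trained model, not the actual cytoself API:

```python
# Hedged sketch, not the actual cytoself API: assume `encoder` is a trained
# self-supervised model mapping a single-cell image to a fixed-length vector.
import numpy as np

def localization_vector(images, encoder):
    """Embed each image and average, giving one localization vector per protein."""
    embeddings = np.stack([encoder(img) for img in images])  # (n_images, d)
    return embeddings.mean(axis=0)                           # (d,)
```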

This is what we obtain when we apply the UMAP algorithm to these 'localization vectors' for each image in our collection. We can see lots of structure, with different levels of organization: from nuclear versus non-nuclear all the way down to stable protein complexes!

5/n
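
A minimal sketch of this step with the umap-learn package; `embeddings` is a random placeholder standing in for the real localization vectors:

```python
# Project per-image localization vectors into 2D with UMAP for visualization.
import numpy as np
import umap

embeddings = np.random.rand(1000, 64)          # placeholder localization vectors
reducer = umap.UMAP(n_components=2, random_state=0)
coords_2d = reducer.fit_transform(embeddings)  # (1000, 2), ready for plotting
```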

It also looks quite nice in 3D! Notice the big grey area towards the center? Those are images of proteins with mixed localizations.
Rendered by Hirofumi (@liilii_tweet) with @napari_imaging!

6/n
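
A minimal sketch of how such a 3D rendering could be set up with umap-learn and napari; the data here is a random placeholder:

```python
# Compute a 3D UMAP of the localization vectors and render the point cloud in napari.
import numpy as np
import umap
import napari

embeddings = np.random.rand(1000, 64)                            # placeholder vectors
coords_3d = umap.UMAP(n_components=3).fit_transform(embeddings)  # (1000, 3)

viewer = napari.Viewer(ndisplay=3)                               # open a 3D viewer
viewer.add_points(coords_3d, size=0.3, name="localization UMAP")
napari.run()                                                     # start the event loop
```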

We see a gradual transition through different mixtures of localization when traversing the space between cytoplasmic and nucleoplasmic localizations:

8/n

Can we dissect the features that make up these representations and interpret their meaning? To answer this question, we created a feature spectrum, as if each feature were an ingredient present in the images at different concentrations.

9/n
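
A minimal sketch of the idea of a feature spectrum, assuming the model produces discrete (quantized) feature indices per image; names are illustrative, not the cytoself implementation:

```python
# Hedged sketch: count how often each discrete feature index occurs in the
# images of one protein, as if each feature were an ingredient present at some
# concentration. `indices_per_image` is an illustrative name for per-image
# arrays of quantized feature indices.
import numpy as np

def feature_spectrum(indices_per_image, n_features):
    counts = np.zeros(n_features)
    for idx in indices_per_image:
        counts += np.bincount(np.ravel(idx), minlength=n_features)
    return counts / counts.sum()  # normalized "concentration" of each feature
```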

As a demonstration, we used this to 'deorphan' a poorly characterized protein: FAM241A. The strongest correlation is 0.777 for ER; the next is 0.08 for cytoplasm. We experimentally confirmed the ER localization of FAM241A by co-expression of a classical ER marker (SEC61B)!

10/n
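
A minimal sketch of ranking candidate localizations by correlation of feature spectra, as in the FAM241A example; names and inputs are illustrative:

```python
# Rank candidate localizations by Pearson correlation between the spectrum of
# an uncharacterized protein and reference spectra of annotated localizations.
import numpy as np

def rank_localizations(query_spectrum, reference_spectra):
    """reference_spectra: dict mapping localization name -> mean spectrum."""
    scores = {
        name: float(np.corrcoef(query_spectrum, ref)[0, 1])
        for name, ref in reference_spectra.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```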

How well does this hold in general, beyond this particular example? Can we predict the localization of each protein (by cross-validation)? It works quite well: for 96% of proteins the correct annotation is within the top 2 predictions, and for 99% it is within the top 3.

11/n
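
A minimal sketch of the top-k evaluation behind these numbers; inputs are illustrative:

```python
# A protein counts as correctly predicted if its true annotation appears among
# the k highest-ranked localizations.
def top_k_accuracy(ranked_predictions, true_labels, k):
    hits = sum(
        truth in [name for name, _ in ranked[:k]]
        for ranked, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)
```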

Does this generalize beyond #OpenCell data? We tried images from the @AllenInstitute Cell collection. And it works too! The same proteins seem to inhabit similar regions in our map, even for different cell types... (some differences are expected!)

12/n

If we dig deeper and look at protein complexes, it is remarkable that we can resolve many well-known, stable complexes! This seems to suggest that images contain enough information to infer protein interactions!?

13/n

These are pretty pictures, but can we quantify this?
Yes: the 'protein localization spectra' that we derive from our representations are effective at predicting shared complex membership: in 83.3% of cases the protein with the strongest correlation is in a shared complex.

14/n
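
A minimal sketch of this quantification, assuming per-protein spectra and complex annotations; names are illustrative:

```python
# For each protein, find its nearest neighbour by spectrum correlation and
# check whether the pair shares an annotated complex.
import numpy as np

def fraction_sharing_complex(spectra, complexes):
    """spectra: protein -> spectrum array; complexes: protein -> set of complex IDs."""
    proteins = list(spectra)
    shared = 0
    for p in proteins:
        others = [q for q in proteins if q != p]
        best = max(others, key=lambda q: np.corrcoef(spectra[p], spectra[q])[0, 1])
        shared += bool(complexes[p] & complexes[best])
    return shared / len(proteins)
```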

Finally, we show that #cytoself representations capture more detail and localization nuance than existing databases because, importantly, they are derived not from human knowledge or annotations, but from images alone! For example, #cytoself discriminates between lysosomal and endosomal proteins.

15/n

Congrats to Hirofumi Kobayashi @liilii_tweet for a well-deserved success! He worked extremely hard, and it shows! It was a pleasure to work with you on this, Hiro!
And thanks to @kchev @LeonettiManuel for a fantastic collaboration!

16/n

Thanks also to @slschmid_CZB for mentorship and for proofreading and feedback on the manuscript, to my whole team, in particular @_ahmetcansolak for help with coding, and to @finkd and Priscilla Chan for funding the @czbiohub.

17/n

Thanks for your interest and attention, and for reading this far!

18/n
