We wondered: can we use #DeepLearning to map the landscape of protein sub-cellular localization?
2/n
The problem with #images, compared to #sequences, is that it is unclear how to compare them. For example, how do we estimate localization similarity from pairs of images of fluorescently labeled cells? With sequences we have algorithms and tools. But for images?
3/n
#cytoself takes in a large collection of images (millions of them) and learns a vectorial representation of localization that is robust to variability in cell shape, size, and state. These representations can be used for clustering, comparisons, deorphaning, etc...
4/n
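For intuition, here is a minimal sketch of the embedding idea, not the actual cytoself architecture: a small convolutional encoder (all names below are illustrative) that maps each fluorescence image to a fixed-length 'localization vector'.

```python
# Toy sketch only: NOT the cytoself model, just the general idea of
# turning images into fixed-length localization vectors.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # collapse spatial dims
        )
        self.proj = nn.Linear(32, dim)

    def forward(self, x):
        h = self.features(x).flatten(1)        # (batch, 32)
        return self.proj(h)                    # (batch, dim) localization vectors

encoder = ToyEncoder()
images = torch.randn(8, 1, 100, 100)           # batch of single-channel crops
vectors = encoder(images)                      # 8 x 64 embedding matrix
```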
This is what we obtain when we apply the UMAP algorithm to these 'localization vectors' for each image in our collection. We can see lots of structure, with different levels of organization: from nuclear versus non-nuclear all the way down to stable protein complexes!
5/n
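A sketch of this visualization step with the umap-learn package, assuming `vectors` holds one localization vector per image (placeholder data here):

```python
# Project high-dimensional localization vectors down to 2D for plotting.
import numpy as np
import umap  # pip install umap-learn

vectors = np.random.rand(1000, 64)             # placeholder embeddings
reducer = umap.UMAP(n_components=2, random_state=42)
coords = reducer.fit_transform(vectors)        # (1000, 2) points for the map
```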
It looks quite nice in 3D too! See the big grey area towards the center? Those are images of proteins with mixed localizations.
Rendered by Hirofumi ( @liilii_tweet ) with @napari_imaging !
6/n
Traversing the space between cytoplasmic and nucleoplasmic localizations, we see a gradual transition through different mixtures of the two:
8/n
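Purely as an illustration (not how the figure was made), one way to traverse that space is to interpolate between class-average vectors and look up the nearest images at each step:

```python
# Hypothetical traversal of the representation space between two
# localization classes; vectors here are random placeholders.
import numpy as np

z_cyto = np.random.rand(64)    # placeholder: mean vector of cytoplasmic images
z_nucl = np.random.rand(64)    # placeholder: mean vector of nucleoplasmic images

for t in np.linspace(0.0, 1.0, 5):
    z = (1 - t) * z_cyto + t * z_nucl
    # ...retrieve the images whose vectors are closest to z (omitted)...
```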
Can we dissect the features that make up these representations and interpret their meaning? To answer this question, we created a feature spectrum, as if each feature were an ingredient present in the images at different concentrations.
9/n
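A toy version of this idea, assuming each image is summarized by discrete feature indices (cytoself uses a vector-quantized codebook; the data below are placeholders): the spectrum is simply the normalized count of each feature.

```python
# Build a 'feature spectrum' for one image from discrete feature indices.
import numpy as np

n_features = 2048
codes = np.random.randint(0, n_features, size=500)   # placeholder indices
spectrum = np.bincount(codes, minlength=n_features).astype(float)
spectrum /= spectrum.sum()     # each feature's 'concentration' in the image
```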
As a demonstration, we used this to 'deorphan' a poorly characterized protein: FAM241A. The strongest correlation is 0.777 for ER; the next is 0.08 for cytoplasm. We experimentally confirmed the ER localization of FAM241A by co-expression of a classical ER marker (SEC61B)!
10/n
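A sketch of that ranking step, with made-up spectra: correlate the orphan protein's spectrum against the mean spectrum of each annotated localization class and sort.

```python
# Deorphaning sketch: rank localization classes by spectrum correlation.
import numpy as np

rng = np.random.default_rng(0)
orphan = rng.random(2048)                          # placeholder orphan spectrum
class_spectra = {"ER": rng.random(2048),
                 "cytoplasm": rng.random(2048)}    # placeholder class means

scores = {name: np.corrcoef(orphan, spec)[0, 1]
          for name, spec in class_spectra.items()}
for name, r in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: r = {r:.3f}")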
Does this hold in general, beyond this particular example? Can we predict the localization of each protein (cross-validation)? It works quite well: for 96% of proteins the correct annotation is within the top 2 predictions, and for 99% it is within the top 3.
11/n
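A minimal sketch of the top-k evaluation (random placeholder scores; not our actual evaluation code):

```python
# Top-k accuracy: a prediction counts as correct if the true annotation
# is among the k highest-scoring classes.
import numpy as np

def top_k_accuracy(scores, true_labels, k):
    # scores: (n_proteins, n_classes); true_labels: (n_proteins,) indices
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = [true_labels[i] in top_k[i] for i in range(len(true_labels))]
    return np.mean(hits)

scores = np.random.rand(100, 10)
labels = np.random.randint(0, 10, size=100)
print(top_k_accuracy(scores, labels, k=2))   # e.g. the reported 96% is top-2
```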
Does this generalize beyond #OpenCell data? We tried images from the @AllenInstitute Cell collection. And it works too! The same proteins seem to inhabit similar regions in our map, even for different cell types... (some differences are expected!)
12/n
If we dig deeper and look at protein complexes, it is remarkable that we can resolve many well-known, stable complexes! This seems to suggest that images carry enough information to infer protein interactions!?
13/n
These are pretty pictures, but can we quantify this?
Yes: the 'protein localization spectra' that we derive from our representations are effective at predicting shared complex membership: in 83.3% of cases the protein with the strongest correlation is in a shared complex.
14/n
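A sketch of that quantification with placeholder data: for each protein, take the most-correlated other protein and check whether the pair shares an annotated complex.

```python
# Shared-complex check via strongest spectrum correlation.
import numpy as np

rng = np.random.default_rng(1)
spectra = rng.random((50, 2048))           # one spectrum per protein
complex_id = rng.integers(0, 10, size=50)  # placeholder complex annotations

corr = np.corrcoef(spectra)                # protein-by-protein correlations
np.fill_diagonal(corr, -np.inf)            # ignore self-correlation
nearest = corr.argmax(axis=1)              # strongest-correlated partner
shared = (complex_id == complex_id[nearest]).mean()
print(f"fraction sharing a complex with their top hit: {shared:.1%}")
```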
Finally, we show that #cytoself representations capture more detail and localization nuance than existing databases because, importantly, they are derived not from human knowledge or annotations but from images alone! For example, #cytoself discriminates between lysosomal and endosomal proteins.
Congrats to Hirofumi Kobayashi @liilii_tweet for a well-deserved success! He worked extremely hard, and it shows! It was a pleasure to work with you on this, Hiro!
And thanks to @kchev and @LeonettiManuel for a fantastic collaboration!
16/n
Thanks also to @slschmid_CZB for mentorship and for proofreading and feedback on the manuscript, to my whole team, in particular @_ahmetcansolak for help with coding, and to @finkd and Priscilla Chan for funding the @czbiohub .
17/n
Thanks for your interest and attention, and for reading this far!
📣New tutorial on how to use #Aydin — our easy-to-use and performant image #denoiser. We use one of our favorite test images: 'New York'. We go through different algorithms included in #Aydin and show how to use them, and how to set their parameters:
The question is: how well can we denoise this image in the absence of any prior knowledge, ground truth, or other training images? Below is a crop with and without noise. Notice that the original image has lots of detail: regular grids of windows, roof textures, etc...
2/n
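Aydin has its own API and GUI; as a library-agnostic illustration of tuning a denoiser with no ground truth, here is scikit-image's J-invariant calibration, which selects parameters using only the noisy image itself:

```python
# Self-supervised denoiser calibration: no clean reference image needed.
import numpy as np
from skimage import data, img_as_float
from skimage.util import random_noise
from skimage.restoration import calibrate_denoiser, denoise_wavelet

image = img_as_float(data.camera())
noisy = random_noise(image, var=0.01)          # synthetic Gaussian noise

params = {"sigma": np.linspace(0.05, 0.3, 6)}  # candidate parameter grid
best_denoise = calibrate_denoiser(noisy, denoise_wavelet,
                                  denoise_parameters=params)
denoised = best_denoise(noisy)                 # apply the calibrated denoiser
```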
We go through several algorithms, some only accessible through the new 'advanced mode'. If you like to tune parameters, be careful what you wish for... we have a LOT of parameters in advanced mode...
So why should you care? Well, if you already care about adjusting the brightness and contrast of your images, you should care about gamma correction, especially because it partly happens without your knowledge and often without your control. (2/n)
First, I would like to point out that I am not talking about gamma correction in the context of image analysis or quantification. I am talking about gamma correction as it pertains to how your images are reproduced on screen or on paper. (3/n)
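The effect in one toy computation: sRGB encoding is roughly a power law with gamma ≈ 2.2, so pixel values are transformed before display and (ideally) transformed back by the display.

```python
# Gamma encode/decode round trip, using the common ~2.2 approximation
# of the sRGB transfer curve.
import numpy as np

linear = np.linspace(0, 1, 5)          # scene-referred intensities
encoded = linear ** (1 / 2.2)          # gamma-encode for display
decoded = encoded ** 2.2               # display undoes it; back to linear
print(np.allclose(decoded, linear))    # True
```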