I sometimes see #SingleCell papers where the authors treat a dimensionality-reduced plot as ground truth. Here's a quick, simple example showing why that can be problematic. 1/n
Here are three random distributions centered on the vertices of an equilateral triangle. As you can see, the mean distance between any two clusters is the same. 2/n
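A minimal sketch of this setup, assuming NumPy, 200 points per cluster, isotropic Gaussian noise with sd 0.1, and a unit-side triangle (all of these values are hypothetical, not the thread's). It also assumes "mean distance" can be read as the distance between cluster centroids.

```python
import numpy as np
from itertools import combinations


def make_triangle_clusters(n_per_cluster=200, spread=0.1, seed=0):
    """Sample three isotropic Gaussian clusters centred on the vertices of a
    unit-side equilateral triangle. All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
    points = np.vstack(
        [v + spread * rng.standard_normal((n_per_cluster, 2)) for v in vertices]
    )
    labels = np.repeat([1, 2, 3], n_per_cluster)
    return points, labels


def centroid_distances(points, labels):
    """Euclidean distance between the means of every pair of clusters."""
    centroids = {int(k): points[labels == k].mean(axis=0) for k in np.unique(labels)}
    return {
        (a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
        for a, b in combinations(sorted(centroids), 2)
    }


X, labels = make_triangle_clusters()
for pair, d in centroid_distances(X, labels).items():
    print(pair, round(d, 3))  # every pair comes out close to 1.0
```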
Right now we're using two dimensions. What if we want to reduce down to 1-D? One simple solution would be to just drop the y values. The distances between clusters 3 & 1 and clusters 3 & 2 are about equal, but now 1 & 2 are further apart than either of those pairs. Not great. 3/n
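Dropping the y values can be sketched as below. This assumes the same hypothetical setup (NumPy, 200 points per cluster, noise sd 0.1, a unit-side triangle with clusters 1 and 2 on the horizontal edge); the triangle's orientation in the original figure is not known, so which pair gets stretched here is an assumption.

```python
import numpy as np
from itertools import combinations

# Hypothetical setup: 200 points per cluster, Gaussian noise with sd 0.1,
# centred on the vertices of a unit-side equilateral triangle.
rng = np.random.default_rng(0)
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
X = np.vstack([v + 0.1 * rng.standard_normal((200, 2)) for v in vertices])
labels = np.repeat([1, 2, 3], 200)

# "Reduce" to 1-D by simply discarding the y coordinate.
x_only = X[:, 0]

# Distance between cluster means along the one remaining axis.
means = {k: float(x_only[labels == k].mean()) for k in (1, 2, 3)}
dist_1d = {(a, b): abs(means[a] - means[b]) for a, b in combinations((1, 2, 3), 2)}

for pair, d in dist_1d.items():
    print(pair, round(d, 3))
```

With this orientation, clusters 1 and 2 stay about 1 apart while cluster 3 lands roughly midway between them, so the 1-3 and 2-3 distances shrink to about 0.5.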
What about something fancier? Here is the first principal component of a PCA run on our two dimensions. Clusters 1 & 3 are almost on top of each other now, and none of the distances are equal. Also not great. 4/n
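A rough sketch of projecting onto the first principal component, assuming NumPy and the same hypothetical three-cluster setup. PCA is done by hand via SVD of the centred data rather than with any particular single-cell toolkit.

```python
import numpy as np
from itertools import combinations

# Same hypothetical setup: three noisy clusters on a unit equilateral triangle.
rng = np.random.default_rng(0)
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
X = np.vstack([v + 0.1 * rng.standard_normal((200, 2)) for v in vertices])
labels = np.repeat([1, 2, 3], 200)

# PCA via SVD: centre the data; the first right-singular vector is PC1.
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]                    # 1-D score for every point
explained = S**2 / np.sum(S**2)     # fraction of variance per component

means = {k: float(pc1[labels == k].mean()) for k in (1, 2, 3)}
dist_pc1 = {(a, b): abs(means[a] - means[b]) for a, b in combinations((1, 2, 3), 2)}

print("variance explained by PC1:", round(float(explained[0]), 3))
for pair, d in dist_pc1.items():
    print(pair, round(d, 3))
```

Because the three vertices of an equilateral triangle have no preferred direction, PC1 explains only about half the variance here and its direction is set largely by sampling noise, so which clusters end up close together depends on the random seed.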
What about something EVEN FANCIER? UMAP has an... interesting solution. Again, the distances are completely unequal, and now one of our clusters has been split in two. Really not great. 5/n
So why did they all fail? Because I set up an impossible problem. There is no way to preserve the information I encoded in 2-D down to 1-D. That isn't to say that dimensionality reduction is always terrible, but it ALWAYS loses information. This is important to keep in mind! 6/n
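For the three cluster centers, the impossibility can be stated precisely (a sketch, assuming distances in 1-D are ordinary absolute differences). Whatever 1-D positions a method assigns, one center must lie between the other two:

```latex
x_a \le x_b \le x_c
\quad\Longrightarrow\quad
\lvert x_c - x_a \rvert = \lvert x_b - x_a \rvert + \lvert x_c - x_b \rvert
```

If all three pairwise distances equaled some d, this would force d = 2d, i.e. d = 0, collapsing the clusters onto a single point. So no 1-D layout, whether from dropping a coordinate, PCA, UMAP, or anything else, can keep the three clusters equidistant.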
I think dimensionality reduction is great and I use it in all my papers, but it's not ground truth! Your high-D dataset contains more information than your reduced dimensions, so don't throw that away! Analyze in high-D; visualize in 2-D. Thanks for coming to my TED talk. n/n
• • •
Our single cell atlas of human B cells is out in @ImmunityCP! We screened the B cell surface proteome and then followed up with functional analyses. We identify twelve populations across four lymphoid tissues. cell.com/immunity/fullt… 1/16
Using a multiplexed mass cytometry approach, we screened the expression of 351 surface molecules on human B cells, mostly looking at CD markers and other proteins associated with immunological function or signaling. 2/16
We identified 98 surface molecules expressed by human B cells and evaluated their expression on canonical B cell gates. 3/16