TWEETORIAL: PHATE, our dimensionality reduction and #dataviz method featured on the cover of today’s @NatureBiotech! To help spread the word, @scottgigante and @DBBurkhardt put together this #tweetorial for you. nature.com/articles/s4158… (1/10)
PHATE is a dimensionality reduction algorithm designed for visualizing all kinds of data. Here we show 16K differentiating stem cells measured with scRNA-seq. Unlike t-SNE and UMAP, PHATE doesn’t create “blobs” and instead preserves continuous structures in the data (2/10)
Because it works so well on continuous data, people often ask if PHATE assumes trajectory structure. It doesn’t!

Here is PHATE on MNIST. The digit classes are not as cleanly separated as with t-SNE, but neither are the digits in reality -- if you look at UMAP, you can find a 2 in the 8s cluster, etc. (3/10)
How does PHATE work?

PHATE represents each data point by its diffusion probabilities (from diffusion maps) to *all* other points in the data. Next, we calculate an informational distance between these sets of diffusion probabilities and compress these relationships into 2D or 3D via multidimensional scaling (MDS) (4/10)
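A minimal NumPy sketch of the steps described above, under simplifying assumptions: a fixed-bandwidth Gaussian kernel, a hand-picked diffusion time `t`, a log transform as the informational step, and classical MDS for the final compression. The name `phate_sketch` and its parameters (`sigma`, `t`, `eps`) are illustrative and not part of the PHATE package, whose implementation differs in these details.

```python
import numpy as np


def phate_sketch(X, n_components=2, sigma=1.0, t=5, eps=1e-7):
    """Toy PHATE-style embedding (illustrative, not the reference implementation)."""
    # Pairwise squared Euclidean distances between all points.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinities (assumption: one global bandwidth `sigma`).
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    # Row-normalise to get a one-step diffusion (Markov) operator.
    P = K / K.sum(axis=1, keepdims=True)

    # t-step diffusion probabilities from every point to all other points.
    Pt = np.linalg.matrix_power(P, t)

    # Log-transform the probabilities, then take squared distances between rows.
    U = -np.log(Pt + eps)
    usq = np.sum(U ** 2, axis=1)
    D2 = np.maximum(usq[:, None] + usq[None, :] - 2.0 * U @ U.T, 0.0)

    # Classical MDS: double-centre the squared distances and keep the
    # eigenvectors with the largest eigenvalues.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Usage: a noisy 3D curve, i.e. continuous rather than clustered data.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 3.0, 60)
X = np.column_stack([np.cos(theta), np.sin(theta), theta])
X = X + 0.01 * rng.normal(size=X.shape)
Y = phate_sketch(X)
print(Y.shape)  # (60, 2)
```

Classical MDS is swapped in here only because it fits in a few lines of linear algebra; any MDS variant could take its place in the last step.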
PHATE considers diffusion distances between all points, so continuous progressions are preserved.

In these linked clusters from @GoogleAI’s article pair-code.github.io/understanding-… by @_coenen & @adamrpearce, t-SNE and UMAP break up trajectories while PHATE keeps them together (5/10)
Much like t-SNE before it, UMAP tends to break datasets into discrete clusters, even when the dataset is connected. Here you see simulations from @AliciaOshlack and @_lazappi_’s Splatter tool -- PHATE alone retains the sparse connection along the green branch (6/10)
To quantify the visual advantage of PHATE, we developed Denoised Embedding Manifold Preservation (DEMaP), a metric that measures how well embeddings of noisy data retain ground truth structure. PHATE outperforms all methods we tried on both trajectories (paths) and clusters (groups) (7/10)
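A rough sketch of a DEMaP-style score, assuming the comparison is a Spearman rank correlation between ground-truth pairwise distances (taken here as a precomputed matrix from the noiseless data) and Euclidean distances in the embedding. The helper names (`demap_like_score`, `spearman`, `average_ranks`) are hypothetical, and how the ground-truth distances are obtained is left out.

```python
import numpy as np


def average_ranks(v):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def demap_like_score(truth_dist, embedding):
    """Correlate ground-truth distances with embedding distances.

    truth_dist: (n, n) symmetric matrix of ground-truth distances (assumed given).
    embedding:  (n, d) low-dimensional coordinates computed from the noisy data.
    """
    diffs = embedding[:, None, :] - embedding[None, :, :]
    emb_dist = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(truth_dist.shape[0], k=1)  # each pair once
    return spearman(truth_dist[upper], emb_dist[upper])


# Usage: points along a line, embedded faithfully versus in shuffled order.
pos = np.linspace(0.0, 1.0, 20)
truth = np.abs(pos[:, None] - pos[None, :])
faithful = np.column_stack([2.0 * pos, np.zeros_like(pos)])
shuffled = faithful[np.random.default_rng(0).permutation(len(pos))]
good = demap_like_score(truth, faithful)
bad = demap_like_score(truth, shuffled)
print(good > bad)
```

A rank correlation rewards embeddings that keep the ordering of distances, so a uniform stretch of the embedding does not change the score.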
To our surprise, even using Adjusted Rand Index (ARI), a metric designed specifically for clusters, PHATE still performed better than t-SNE and UMAP, whose algorithms are biased towards cluster structure (8/10)
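For readers unfamiliar with ARI, here is a small pure-Python sketch of the standard pair-counting formula. It scores the agreement between two labelings of the same points. How cluster labels are derived from an embedding is not covered here, and the function name is illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count pairs of items grouped together in both, in truth, and in prediction.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected by chance, and the maximum achievable agreement.
    expected = together_true * together_pred / comb(n, 2)
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in its own cluster
    return (together_both - expected) / (maximum - expected)


# Label names do not matter, only which items are grouped together.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# A labeling that splits every true cluster scores below zero.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```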
PHATE is fast and scales to large datasets. It’s also incredibly parallelizable. We ran PHATE on the @10xGenomics 1.3M Megacell Demonstration in just three hours on a 36-core workstation! (9/10)
If you like what you see, check out PHATE at github.com/KrishnaswamyLa…. It’s available in #rstats, #python and #MATLAB. Thanks for reading! (10/10)