Thread by Thom Quinn
Once again @badlytrainedtec and I have teamed up to give you all a new method and R package for compositional data analysis.

How does it work? See below (1/N)

biorxiv.org/content/10.110…
CLR and ILR are both popular for compositional data analysis. They are really helpful, but they have limitations. Both rely on log-ratio transforms built from geometric means, and so they (a) fail when the data contain zeros and (b) are not necessarily as interpretable as you might think (2/N)
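To make the zero problem concrete, here is a minimal sketch of the centered log-ratio (CLR) transform using its standard definition (this is illustrative NumPy, not code from the paper or package):

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform: log of each part over the geometric mean."""
    g = np.exp(np.mean(np.log(x)))  # geometric mean of the parts
    return np.log(x / g)

comp = np.array([0.2, 0.3, 0.5])
print(clr(comp))            # CLR coordinates; they sum to 0 by construction

with_zero = np.array([0.0, 0.4, 0.6])
print(clr(with_zero))       # a zero part yields nan/inf: the transform breaks down
```

Any zero poisons the geometric mean, so every CLR coordinate of that sample becomes undefined, not just the one for the zero part.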
Consider the balance (A & B) vs. C. If B > C, you would expect the mean of (A & B) to always be larger than C. This is true for the arithmetic mean, but it isn't true for the geometric mean! For the geometric mean, (A & B) can be much smaller than C if A is rare! (3/N)
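A quick numeric check of that claim (the part values below are made up for illustration):

```python
import numpy as np

# A is rare; B > C, and the three parts sum to 1
A, B, C = 0.01, 0.66, 0.33

arith = (A + B) / 2       # arithmetic mean of A and B
geom = np.sqrt(A * B)     # geometric mean of A and B

print(arith > C)  # True: the arithmetic mean of (A & B) still exceeds C
print(geom > C)   # False: the geometric mean collapses because A is rare
print(geom)       # roughly 0.08, far below C = 0.33
```

The geometric mean is dragged toward the rarest part, so a balance can report (A & B) << C even though B alone dominates C.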
We think this is a problem. But what can we do instead? We can amalgamate the data -- i.e., add components within the simplex. For example, we can compare (A + B) vs. C. This behaves exactly as you would expect: if B > C, then (A + B) is always larger than C. That's nice (4/N)
Also, (A + B) vs. C is easy to define even if A = 0. So why don't we always just amalgamate? Well, it turns out that ADDITION is actually a NON-LINEAR operator in the simplex. This means that the simple act of amalgamation can distort your data in unexpected ways. Uh oh! (5/N)
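Here is a tiny illustration of that distortion, using the standard Aitchison distance (again illustrative Python, not the package's code): two compositions that are clearly distinct can become indistinguishable after amalgamation.

```python
import numpy as np

def aitchison_dist(x, y):
    """Aitchison distance: Euclidean distance between CLR coordinates."""
    cx = np.log(x) - np.log(x).mean()
    cy = np.log(y) - np.log(y).mean()
    return np.linalg.norm(cx - cy)

def amalgamate(x):
    """Sum the first two parts, keep the third; stay on the simplex."""
    z = np.array([x[0] + x[1], x[2]])
    return z / z.sum()

s1 = np.array([0.10, 0.30, 0.60])
s2 = np.array([0.20, 0.20, 0.60])

print(aitchison_dist(s1, s2))                          # > 0: distinct samples
print(aitchison_dist(amalgamate(s1), amalgamate(s2)))  # 0: amalgamation merged them
```

Both samples collapse to [0.4, 0.6], so the inter-sample distance drops from a clearly positive value to exactly zero. A careless choice of which parts to sum can throw away real structure.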
Well, we can partly get around this. How? We can define an OBJECTIVE FUNCTION to guide *how we amalgamate*. This objective function tries to preserve the structure of the data. Now we can frame amalgamation as a search problem, and solve it with genetic algorithms (6/N)
The amalgam package searches for the amalgamation that best achieves a user-defined objective. This objective can be (a) to preserve the structure of the data [i.e., in terms of inter-sample distances] or (b) to maximize prediction of a dependent variable (7/N)
The software is painless to use, and it distills your high-dimensional data into an N-part amalgamation that is easier to manage *without having to impute zeros*. When you request a 3- or 4-part amalgamation, you can visualize the data directly (8/N)
We benchmark these 3- and 4-part amalgamations on 13 data sets, and we find that they can preserve the structure of the data as well as a PCA of the ILR coordinates! The difference is that each new variable in the amalgamated data is a simple sum. So easy to interpret! (9/N)
We also show that 3-part amalgamations work very well as a dimension reduction method for supervised machine learning, performing as well as the software selbal. selbal is a great tool, but it does rely on geometric means, which may not be what you think they are (10/N)
OK, I think that's enough for now. Check out the paper, check out the code, and DM me if you have questions.

Here's a simple example to get you going (11/11)