We did it, @DaveZeevi! Our paper is now published in @ScienceMagazine! science.sciencemag.org/content/370/65…
We present a data-driven, computational perspective on how selective pressures resulting from nutrient limitation shape microbial coding sequences. Thread below:
We study ‘resource-driven’ selection using metagenomic and single-cell data of marine microbes, adopting concepts common in statistical genetics, such as linear mixed models with variance components.
Using tailored algorithms, we partition the variance in selection metrics calculated for marine microbes and show that a significant portion of this variance is explained by the environment and is associated with nitrogen availability.
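The thread doesn't spell out the tailored algorithms. As a hedged illustration of what "partitioning variance with a variance component" means, here is Haseman-Elston regression, a classic method-of-moments estimator from statistical genetics. It is a swapped-in technique, not the paper's method. It assumes a hypothetical environment kernel (K_ij = 1 if two samples come from the same environment, else 0), and the data are simulated.

```python
import random
import statistics
from collections import defaultdict


def he_environment_share(y, env):
    """Haseman-Elston estimate of the share of variance in y explained by a
    shared-environment random effect, using the (hypothetical) kernel
    K_ij = 1 if samples i and j come from the same environment, else 0."""
    n = len(y)
    mu, sd = statistics.fmean(y), statistics.pstdev(y)
    z = [(v - mu) / sd for v in y]  # standardize: mean 0, variance 1
    sums, squares, counts = defaultdict(float), defaultdict(float), defaultdict(int)
    for zi, e in zip(z, env):
        sums[e] += zi
        squares[e] += zi * zi
        counts[e] += 1
    # Sum of z_i * z_j over pairs i < j that share an environment (K_ij = 1).
    within = sum((sums[e] ** 2 - squares[e]) / 2 for e in sums)
    n_within = sum(c * (c - 1) / 2 for c in counts.values())
    # Same sum over all pairs i < j; subtracting gives the pairs with K_ij = 0.
    total = (sum(z) ** 2 - sum(v * v for v in z)) / 2
    n_between = n * (n - 1) / 2 - n_within
    between = total - within
    # With a binary kernel, the HE regression slope of z_i * z_j on K_ij equals
    # the mean cross-product of same-environment pairs minus that of other pairs.
    return within / n_within - between / n_between


def simulate(n_env=100, per_env=10, env_var=0.5, noise_var=0.5, seed=0):
    """Hypothetical data: one random effect per environment plus independent noise."""
    rng = random.Random(seed)
    y, env = [], []
    for e in range(n_env):
        effect = rng.gauss(0.0, env_var ** 0.5)
        for _ in range(per_env):
            y.append(effect + rng.gauss(0.0, noise_var ** 0.5))
            env.append(e)
    return y, env
```

With the default simulation, `he_environment_share(*simulate())` should land near the true environmental share of 0.5, and near 0 when `env_var=0.0`. A full linear mixed model fit (e.g. REML) would use the same kernel idea but estimate the components by likelihood rather than by moments.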
We further observed that mutations increasing the nitrogen requirements of cells are selected against. However, we hypothesized that not all DNA mutations carry equal costs, since the structure of the genetic code constrains which amino acid changes a single mutation can produce.
We thus defined a function quantifying the cost of a single mutation as the number of atoms (e.g., of nitrogen or carbon) it adds to the encoded amino acid, and calculated the Expected Random Mutation Cost (ERMC) for the standard genetic code.
We found that the standard genetic code minimizes ERMC with respect to nitrogen and carbon compared with 1 million hypothetical codes!! This robustness generalizes to multiple taxa across all domains of life, including the human genome.
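The thread doesn't give the exact ERMC formula or how the 1 million hypothetical codes were generated, so this is a minimal sketch under loudly stated assumptions: the cost of a point mutation is the signed change in atom count between the new and old amino acid, mutations to stop codons are skipped, codons are weighted by a hypothetical GC-content codon-usage model (`gc_codon_freq`), and transitions can be up-weighted by a hypothetical `kappa`. Random codes come from the common block-shuffling null model (permute which amino acid occupies each synonymous codon block, keep stop codons fixed), which may differ from the paper's. All function names are illustrative.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Atoms per free amino acid, from molecular formulas.
NITROGEN = {"A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "Q": 2, "E": 1, "G": 1, "H": 3, "I": 1,
            "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1, "T": 1, "W": 2, "Y": 1, "V": 1}
CARBON = {"A": 3, "R": 6, "N": 4, "D": 4, "C": 3, "Q": 5, "E": 5, "G": 2, "H": 6, "I": 6,
          "L": 6, "K": 6, "M": 5, "F": 9, "P": 5, "S": 3, "T": 4, "W": 11, "Y": 9, "V": 5}
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def gc_codon_freq(gc):
    """Hypothetical codon usage: each position is G/C with probability gc, independently."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    return {c: p[c[0]] * p[c[1]] * p[c[2]] for c in STANDARD_CODE}


def ermc(code, atoms, codon_freq, kappa=1.0):
    """Expected signed change in atom count per random point mutation (assumed definition)."""
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":  # assumption: nonsense mutations are skipped
                    continue
                is_transition = frozenset((base, codon[pos])) in TRANSITIONS
                w = codon_freq[codon] * (kappa if is_transition else 1.0)
                total += w * (atoms[new_aa] - atoms[aa])
                weight_sum += w
    return total / weight_sum


def shuffled_code(rng):
    """Block-shuffling null model: permute amino acids among synonymous blocks, keep stops."""
    aas = sorted(set(AMINO_ACIDS) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    mapping = dict(zip(aas, perm))
    return {c: (aa if aa == "*" else mapping[aa]) for c, aa in STANDARD_CODE.items()}


def fraction_better(atoms, codon_freq, n_codes=1000, seed=0):
    """Fraction of random codes whose ERMC is strictly lower than the standard code's."""
    rng = random.Random(seed)
    observed = ermc(STANDARD_CODE, atoms, codon_freq)
    better = sum(ermc(shuffled_code(rng), atoms, codon_freq) < observed for _ in range(n_codes))
    return better / n_codes
```

For example, `fraction_better(NITROGEN, gc_codon_freq(0.35))` compares the standard code with 1,000 shuffled codes; 1 million is feasible but slow in pure Python. Because the atom counts are for whole amino acids, the shared backbone cancels in every difference, so side-chain counts would give identical costs. Under these assumptions, with uniform codon usage and symmetric mutation weights every A-to-B substitution is matched by a B-to-A one and the signed ERMC is exactly zero for any code, so it is non-uniform (e.g. organism-specific) codon usage that makes the comparison informative.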
We devised a hierarchical model and found that this new optimization principle is of similar magnitude to, and independent of, previously proposed optimization mechanisms of the genetic code.
This implies that the genetic code can be viewed as a buffer between the evolutionary forces of mutation and selection, the former occurring in DNA sequences and the latter predominantly in proteins.
And we were able to study all this using publicly available data on microbes!! How cool!
Our paper on Compositional Tensor Factorization (CTF) of microbial dynamics is now published in @NatureBiotech! nature.com/articles/s4158…
It might change how you analyze longitudinal microbiome data. Thread below:
In a cross-sectional study, you run a PCoA and look at the top PCs. But with temporal data, applying PCoA separately to each time point may mask important information carried across time.
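For reference, classical PCoA double-centers the squared distance matrix and keeps the top eigenvectors. The sketch below (NumPy, with hypothetical function names) also runs it independently per time point. Each eigendecomposition picks its own axes (the sign of every eigenvector, and the order of near-equal eigenvalues, are arbitrary), so PC1 at one time point need not correspond to PC1 at another, which is one concrete way per-time-point PCoA loses the link across time.

```python
import numpy as np


def pcoa(dist, k=2):
    """Classical principal coordinates analysis (metric MDS) of a distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering  # double-centered squared distances
    vals, vecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]          # keep the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def pcoa_per_timepoint(dists_by_time, k=2):
    """The cross-sectional recipe applied to longitudinal data: one independent PCoA per time point."""
    return {t: pcoa(d, k) for t, d in dists_by_time.items()}
```

On Euclidean distances between points in the plane, `pcoa(d, k=2)` recovers the configuration up to rotation and reflection, which is exactly the freedom that makes separate per-time-point ordinations hard to line up.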
For your longitudinal data analysis you should use CTF!
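A tensor method works on all time points jointly instead of one matrix per time point. As a hedged illustration of the input layout only (not the CTF algorithm itself), here is how long-format records might be arranged into a subjects x times x features array with NumPy. The record format, the axis order, and the choice to mark unsampled subject/time combinations as NaN rather than zero are all assumptions, and `build_tensor` is a hypothetical name.

```python
import numpy as np


def build_tensor(records, subjects, times, features):
    """Arrange (subject, time, feature, count) records into a subjects x times x features array.

    Assumptions: subject/time pairs with no records were never sampled and stay NaN;
    within a sampled pair, features without a record are zero counts.
    """
    s_idx = {s: i for i, s in enumerate(subjects)}
    t_idx = {t: j for j, t in enumerate(times)}
    f_idx = {f: k for k, f in enumerate(features)}
    tensor = np.full((len(subjects), len(times), len(features)), np.nan)
    for subject, time, feature, count in records:
        i, j = s_idx[subject], t_idx[time]
        if np.isnan(tensor[i, j]).all():
            tensor[i, j, :] = 0.0  # first record for this sample: initialize to zero counts
        tensor[i, j, f_idx[feature]] += count
    return tensor
```

Keeping missing samples as NaN, rather than filling them with zeros, lets a downstream factorization tell "not sampled" apart from "sampled but absent".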