Once again, it's a lovely Thursday! Time for a #DBIOtweetorial by Kim Reynolds @kimreynolds_lab commissioned by the awesome folks at #engageDBIO! Let's get sciencing!
An organism’s genome encodes the rules for how it looks, grows, and responds to the environment in a series of “A”s, “C”s, “G”s and “T”s:
The genes encode proteins – molecular “parts” that assemble into cellular systems. For example, we often depict the proteins in metabolism (enzymes) as lines connecting the chemical species they interconvert inside the cell. These diagrams contain a lot of information, but can be difficult to understand.
In 2021 we have a LOT of sequenced genomes – over 300K for bacteria alone – giving us the protein-based “parts list” for many organisms. (below is from 2013, Bertelli + Greub). So can we use all of this information to understand how proteins work together to make living things?
This problem is hard for two reasons: 1) while some genes seem to be REALLY critical, deleting other genes has no detectable consequence; 2) the effect of changing (mutating) a gene often depends on genomic context and the environment – a phenomenon called epistasis.
So we need some strategy to quantify the relative “importance” of genes, and to map couplings or interactions between genes (epistasis). STATISTICAL ANALYSIS OF GENOMES to the rescue!
The idea is to gather thousands of genomes from different organisms and start comparing them (quantitatively). We follow two general principles. First, we expect that genes CONSERVED across many organisms are IMPORTANT to core cellular functions.
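To make "conserved" concrete, here is a minimal sketch of one common way to score it at the sequence level: per-position Shannon entropy of an alignment. This is an illustration only, not necessarily the scoring used in the work discussed here. The toy alignment and the function name `column_conservation` are made up, and the sequences are assumed to be DNA, already aligned, and of equal length.

```python
from collections import Counter
import math

def column_conservation(alignment):
    """Score each alignment column as 1 - (Shannon entropy / 2 bits).

    1.0 means every sequence has the same letter at that position;
    0.0 means all four DNA letters are equally frequent.
    """
    n_cols = len(alignment[0])
    scores = []
    for i in range(n_cols):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(1 - entropy / 2)  # 2 bits is the maximum for a 4-letter alphabet
    return scores

# Hypothetical toy data: the same short region from four organisms.
toy_alignment = ["ACGT", "ACGA", "ACTC", "ACAG"]
scores = column_conservation(toy_alignment)
print(scores)
```

In this toy example the first two positions are identical in every organism and get the top score, while the last position varies freely and scores lowest. Real analyses use thousands of sequences and usually reweight for uneven sampling of organisms.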
Second, genes which COEVOLVE (they show correlated changes in sequence and/or chromosomal location) are more likely to INTERACT. This schematic (from Maddison + FitzJohn 2015) shows two genes undergoing correlated changes (from white to black) along an evolutionary tree.
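A rough sketch of how "correlated changes" can be quantified: the mutual information between two alignment positions. This is a deliberately naive stand-in, not the method from the work cited in this thread. In particular it ignores the evolutionary tree, so shared ancestry alone can inflate the score. The toy alignment and the function name `mutual_information` are assumptions for illustration.

```python
from collections import Counter
import math

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)
    freq_j = Counter(seq[j] for seq in alignment)
    freq_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Hypothetical toy data: columns 0 and 1 always change together,
# while column 2 varies independently of column 0.
toy_alignment = ["AAC", "AAG", "TTC", "TTG"]
mi_coupled = mutual_information(toy_alignment, 0, 1)
mi_independent = mutual_information(toy_alignment, 0, 2)
print(mi_coupled, mi_independent)
```

The coupled pair scores higher than the independent pair. Practical coevolution methods add corrections for phylogeny and for indirect correlations before reading a high score as evidence of interaction.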
Using these two principles, we can start to build statistical models for proteins, pathways, or even cells…
Recently, very exciting work has been happening at the level of single proteins: using statistical sequence analysis to predict protein structure and function (see also Martin Weigt, Anne-Florence Bitbol, Olivier Rivoire, @sokrypton, @MorcosLab, @deboramarks, among many others!)
So can we extrapolate these protein-based models up to bigger systems? And what drives co-evolution at the level of pathways and cells? Are statistical models sufficient to design large protein complexes, pathways, and more? Let’s get sciencing! #EngageDBIO #DBIOTweetorial