#SARSCoV2 selection analysis update. We switched to running sliding-window analyses (blocks of 3 months) to deal with data volumes and to capture temporal trends. The current state of the analyses is at observablehq.com/@spond/selecti…
This includes an at-a-glance view of selection profiles on the most recent time window
Profiles of selective "forces"
And evolutionary history of any subset of genomic sites (here the metasignature from cell.com/cell/pdf/S0092…)
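For readers who want to try the windowing idea on their own data, here is a minimal sketch of how sequences could be binned into 3-month windows by collection date. It is an illustration only, not the pipeline behind the notebook: the function names, the 1-month step between windows, and the half-open window boundaries are all assumptions.

```python
from datetime import date


def add_months(d, n):
    """Return the first day of the month that is n months after d's month."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


def sliding_windows(first, last, width=3, step=1):
    """Yield half-open (start, end) windows, each `width` months long.

    Windows start on the first day of a month and advance by `step` months.
    The 1-month step is an assumption made for this sketch.
    """
    start = date(first.year, first.month, 1)
    while start <= last:
        yield start, add_months(start, width)
        start = add_months(start, step)


def bin_by_window(collection_dates, windows):
    """Map each window to the collection dates that fall inside it."""
    return {w: [d for d in collection_dates if w[0] <= d < w[1]] for w in windows}


if __name__ == "__main__":
    # Hypothetical collection dates, for illustration only.
    dates = [date(2020, 1, 15), date(2020, 2, 20), date(2020, 4, 2)]
    windows = list(sliding_windows(min(dates), max(dates)))
    for (start, end), members in bin_by_window(dates, windows).items():
        print(start, end, len(members))
```

Because consecutive windows overlap, a sequence can fall into more than one window, which is what lets the per-window results be read as a trend.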
1/ A recent preprint (papers.ssrn.com/sol3/papers.cf…) reporting detection of sequence and antibody evidence for SARS-CoV-2 in Italy in the fall of 2019 presents results that are at odds with the current early SARS-CoV-2 timeline.
2/ It may be tempting to dismiss these results as false positives or some other data artifact (e.g.
), but should that be done for these “inconvenient” data?
3/ Or rather, should we think carefully about how to examine the “early European spread” hypothesis by seeking early data more systematically (as the preprint calls for) and considering which alternative models might fit the totality of available early data?
The analysis of recovered sequences does not fundamentally change our current understanding of early SARS-CoV-2 evolution, but it does make the hypothesis of a single-source wet market outbreak implausible.
The rooting of the tree (i.e. what the progenitor sequence is) is also more likely in clade A, i.e. the Wu-1 genome is not the ancestral genome; similar to what we find in academic.oup.com/mbe/advance-ar…, and
An update on #SARSCoV2 selection analysis using @GISAID data (observablehq.com/@spond/natural…). I added a simple 5-category classification for each potentially interesting site. One category = one point. The more points, the more interesting a site is.
Category 1. Is the site under selection using statistical comparative methods?
Category 2. Is there a large (>20%, which is incidentally what you can detect with mixed bases) fraction of minority alleles (synonymous or non-synonymous) among viral haplotypes at the site?
Category 3. Is there an upward trend over time in how many sequences carry a variant, i.e. do we see that variant frequency is increasing over time?
Category 4. Do we see multiple evolutionary events on the tree, i.e. more than one internal branch with selection?
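The "one category = one point" idea can be sketched in a few lines. This is a rough illustration, not the code behind the notebook: the field names are hypothetical, the upward trend is approximated by a positive least-squares slope (an assumption), and only the four categories listed above are scored, so the maximum here is 4 rather than 5.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteEvidence:
    """Per-site summary; all field names are hypothetical."""
    under_selection: bool            # Category 1: flagged by comparative methods
    minority_fraction: float         # Category 2: fraction of minority alleles
    freq_over_time: List[float]      # Category 3: variant frequency per time point
    selected_internal_branches: int  # Category 4: internal branches with selection


def has_upward_trend(freqs):
    """Assumed trend test: positive least-squares slope of frequency vs. index."""
    n = len(freqs)
    if n < 2:
        return False
    x_mean = (n - 1) / 2
    y_mean = sum(freqs) / n
    covariance = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(freqs))
    return covariance > 0


def site_score(site, minority_cutoff=0.20):
    """One point per satisfied category (only the four listed above)."""
    checks = [
        site.under_selection,
        site.minority_fraction > minority_cutoff,
        has_upward_trend(site.freq_over_time),
        site.selected_internal_branches > 1,
    ]
    return sum(checks)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = SiteEvidence(True, 0.25, [0.01, 0.05, 0.20], 2)
    print(site_score(example))
```

Ranking sites by this score gives a quick shortlist. A site that scores on several independent lines of evidence is less likely to be an artifact of any single method.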