The analysis of recovered sequences does not fundamentally change our current understanding of early SARS-CoV-2 evolution, but it does make the hypothesis of a single-source wet market outbreak implausible.
The root of the tree (i.e. the progenitor sequence) is also more likely to lie in clade A, i.e. the Wu-1 genome is not the ancestral genome; similar to what we find in academic.oup.com/mbe/advance-ar…, and
We should be prepared, however, to revise these ideas and hypotheses further if and when more early sequence data emerge. I would not be surprised if these revisions are very significant (e.g. the timing of introduction).
• • •
An update on #SARSCoV2 selection analysis using @GISAID data (observablehq.com/@spond/natural…). I added a simple 5-category classification for each potentially interesting site. One category = one point. The more points, the more interesting a site is.
Category 1. Is the site under selection using statistical comparative methods?
Category 2. Is there a large (>20%, which is incidentally what you can detect with mixed bases) fraction of minority alleles (synonymous or non-synonymous) among viral haplotypes at the site?
Category 3. Is there an upward trend over time in how many sequences carry a variant, i.e. do we see that variant frequency is increasing over time?
Category 4. Do we see multiple evolutionary events on the tree, i.e. more than one internal branch with selection?
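The "one category = one point" scoring can be sketched roughly as below. This is a minimal illustration, not the code behind the notebook: the function names, the dictionary keys, and the toy allele list are all hypothetical, and it assumes "fraction of minority alleles" means the share of haplotypes that do not carry the most common allele. Only the four categories listed above are included.

```python
from collections import Counter

# The ">20%" cutoff mentioned in Category 2.
MINORITY_THRESHOLD = 0.20


def minority_fraction(alleles):
    """Share of haplotypes NOT carrying the most common allele at a site.

    Assumption: this is what "fraction of minority alleles" refers to.
    """
    if not alleles:
        return 0.0
    majority_count = Counter(alleles).most_common(1)[0][1]
    return 1.0 - majority_count / len(alleles)


def site_score(criteria):
    """One point per satisfied category; more points = more interesting."""
    return sum(1 for met in criteria.values() if met)


# Toy site (made-up data): 7 haplotypes carry A, 3 carry G.
alleles = ["A"] * 7 + ["G"] * 3

# Hypothetical per-site flags; Categories 1, 3 and 4 would come from
# upstream analyses and are simply hard-coded here.
site = {
    "under_selection": True,                                   # Category 1
    "large_minority_fraction":
        minority_fraction(alleles) > MINORITY_THRESHOLD,       # Category 2
    "increasing_over_time": False,                             # Category 3
    "multiple_selected_internal_branches": True,               # Category 4
}

score = site_score(site)
```

With these toy inputs the site meets Categories 1, 2 and 4, so it collects three points.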