There's definitely a strong signal of selection on Spike in #Omicron compared to reference clades in our preliminary RASCL analysis of ~60 sequences (thanks @aglucaci, more coming): 1. Spike is under positive selection; 2. Spike is under stronger selection than background.
There are 9 spike sites where there's stronger selection in #Omicron compared to other clades according to Contrast-FEL (academic.oup.com/mbe/article/38…). Sorted by q-value here (stronger evidence at the top)
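A minimal sketch of how a "sorted by q-value" ranking can be produced. The thread does not say how the q-values are derived, so this assumes they are Benjamini-Hochberg adjusted p-values; the function name and the example p-values are hypothetical and are not results from the analysis.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed here to be the q-values)."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def rank_sites(site_pvalues):
    """Return (site, q) pairs sorted so the strongest evidence comes first."""
    sites = list(site_pvalues)
    qs = bh_qvalues([site_pvalues[s] for s in sites])
    return sorted(zip(sites, qs), key=lambda pair: pair[1])


# Hypothetical per-site p-values, for illustration only.
example = {"site_A": 0.01, "site_B": 0.04, "site_C": 0.03, "site_D": 0.5}
ranking = rank_sites(example)
```

The site with the smallest q-value ends up first in `ranking`, matching the "stronger evidence at the top" ordering.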
Full details at observablehq.com/@aglucaci/sc2-… Will post further updates (we will be running daily or so updates as more sequences come in).
The selection is all on the basal branch to the clade. Will be very interesting to see how it continues to evolve (assuming the variant spreads). I would expect that some of the mutations (if indeed fixed due to intra-host selection) may revert.
1/11 Can the evolutionary history of sarbecoviruses help predict the effect of mutations in #omicron? Experimental measurement of phenotypic effects is the gold standard (e.g. see the magnificent DMS-based predictions by @jbloom_lab). What about evolutionary predictions?
2/11 Obviously, if a mutation has been observed at appreciable frequencies in SARS-CoV-2 circulation, this provides evidence that it is not particularly deleterious or may be adaptive (at the time it was circulating, anyway).
3/11 How about mutations that have not been seen at "above noise" levels? We can look at evolution in related "species" (viral isolates in this case) to impute the effect of a mutation; this idea has found extensive use in general G2P (e.g. SIFT, PolyPhen, EP).
#SARSCoV2 selection analyses updates. We switched to running sliding windows analyses (blocks of 3 months) to deal with data volumes and get temporal trends. The current state of analyses is at observablehq.com/@spond/selecti…
This includes an at-a-glance view of selection profiles on the most recent time window
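A rough sketch of how sequences could be grouped into 3-month windows by collection date. The thread only states the 3-month block size; the 1-month step, the function names, and the record layout are assumptions made for illustration, not the pipeline actually used.

```python
from datetime import date


def month_index(d):
    """Months since year 0, so consecutive months differ by exactly 1."""
    return d.year * 12 + (d.month - 1)


def month_label(idx):
    """Inverse of month_index, formatted as YYYY-MM."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def sliding_windows(records, width=3, step=1):
    """Group (sequence_id, collection_date) records into overlapping windows.

    width: window size in months (3, as in the thread).
    step:  how far each window start moves, in months (assumed to be 1).
    """
    if not records:
        return {}
    first = min(month_index(d) for _, d in records)
    last = max(month_index(d) for _, d in records)
    windows = {}
    start = first
    while start <= last:
        end = start + width - 1  # last month included in this window
        ids = [sid for sid, d in records if start <= month_index(d) <= end]
        if ids:
            windows[f"{month_label(start)}..{month_label(end)}"] = ids
        start += step
    return windows


# Hypothetical records, for illustration only.
records = [
    ("seq_a", date(2021, 1, 15)),
    ("seq_b", date(2021, 2, 1)),
    ("seq_c", date(2021, 4, 30)),
]
windows = sliding_windows(records)
```

Each window can then be analyzed separately, which keeps per-run data volumes bounded and gives one selection profile per time slice.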
1/ A recent preprint (papers.ssrn.com/sol3/papers.cf…) reporting detection of sequence and antibody evidence for SARS-CoV-2 in Italy in the fall of 2019 presents results that are at odds with the current early SARS-CoV-2 timeline.
2/ It may be tempting to dismiss these results as false positives or some other data artifact (e.g.
), but should it be done for these “inconvenient” data?
3/ Or rather, should we think carefully how to examine the “early European spread” hypothesis by seeking early data more systematically (as the preprint calls for) and considering which alternative models might fit the totality of available early data?
The analysis of recovered sequences does not fundamentally change our current understanding of early SARS-CoV-2 evolution, but it does make the hypothesis of a single-source wet market outbreak implausible.
The rooting of the tree (i.e. what the progenitor sequence is) is also more likely in clade A, i.e. the Wu-1 genome is not the ancestral genome; similar to what we find in academic.oup.com/mbe/advance-ar…, and
An update on #SARSCoV2 selection analysis using @GISAID data (observablehq.com/@spond/natural…). I added a simple 5-category classification for each potentially interesting site. One category = one point. The more points, the more interesting a site is.
Category 1. Is the site under selection using statistical comparative methods?
Category 2. Is there a large (>20%, which is incidentally what you can detect with mixed bases) fraction of minority alleles (synonymous or non-synonymous) among viral haplotypes at the site?
Category 3. Is there an upward trend over time in how many sequences carry a variant, i.e. do we see that variant frequency is increasing over time?
Category 4. Do we see multiple evolutionary events on the tree, i.e. more than one internal branch with selection?
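The "one category = one point" scheme can be sketched as below. Only the four categories listed above are covered, since the fifth is not described here; the class, field names, and example values are all hypothetical, and the 20% threshold is the one quoted in Category 2.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Per-site inputs to the score; all field names are hypothetical."""
    site: int
    under_selection: bool            # Category 1: flagged by comparative methods
    minority_allele_fraction: float  # Category 2: fraction of minority alleles
    frequency_increasing: bool       # Category 3: variant frequency trending up
    selected_internal_branches: int  # Category 4: internal branches with selection


def score_site(ev, minority_threshold=0.20):
    """Award one point per satisfied category (four of the five are modeled)."""
    points = 0
    points += int(ev.under_selection)
    points += int(ev.minority_allele_fraction > minority_threshold)
    points += int(ev.frequency_increasing)
    points += int(ev.selected_internal_branches > 1)
    return points


# Hypothetical sites, for illustration only.
sites = [
    SiteEvidence(1, True, 0.25, True, 2),
    SiteEvidence(2, True, 0.05, False, 1),
    SiteEvidence(3, False, 0.01, False, 0),
]
# Most interesting sites (highest score) first.
ranked = sorted(sites, key=score_site, reverse=True)
```

An additive score like this is easy to read at a glance, at the cost of treating every category as equally informative.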