Ryan Hisner
Nov 24, 2023 · 25 tweets · 9 min read
I spend a lot of time analyzing and documenting outlier SARS-CoV-2 sequences. Recently, I’ve noticed a fascinating pattern: the repeated appearance of paired mutations in two narrow NSP12 regions more than 2,400 nucleotides apart. 1/45
NSP12 is the RNA-dependent RNA polymerase (RdRp), part of the SARS-CoV-2 replication complex, which also includes NSP7-10 and NSP13-14. I discussed some basics of the RdRp & coronavirus genome replication previously. 2/45
The first two-thirds of the SARS-CoV-2 genome consists of ORF1a & ORF1b. When the full genome enters a ribosome, the cell’s protein-making machine, only ORF1a is translated ~75% of the time. The ribosome hits a stop codon & cuts loose. No ORF1b. 3/45
But the other ~25% of the time, at the end of ORF1a, the ribosome gets tangled in a complex RNA structure called a pseudoknot. They say the ribosome “slips,” but to me it seems more like it gets stuck & momentarily stumbles backward. 4/45
The ribosome reads the same nucleotide twice, shifting its reading frame back by one nucleotide (a -1 frameshift). The stop codon is bypassed & ORF1b, which includes NSP12-16, is translated. Oddly, the first 1% or so of NSP12 is in ORF1a, with the rest in ORF1b. 5/45
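A toy illustration of how a -1 shift lets the ribosome read past a stop codon. The mRNA below is made up: it only borrows the UUUAAAC slippery motif, ignores the pseudoknot entirely, and `slip_at` is a hypothetical parameter marking the nucleotide index where the ribosome steps back by one. Real frameshifting happens through tRNA re-pairing at the slippery site, which this codon-counting sketch doesn't model.

```python
from itertools import product
from typing import Optional

BASES = "UCAG"
# Standard genetic code; codons enumerated UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str, slip_at: Optional[int] = None) -> str:
    """Read codons from position 0 until a stop codon (or the end of the RNA).

    If slip_at is given (hypothetical toy parameter), the ribosome steps back
    one nucleotide when it reaches that index, so the nucleotide at
    slip_at - 1 is read twice (a -1 shift).
    """
    protein = []
    pos = 0
    while pos + 3 <= len(rna):
        if slip_at is not None and pos == slip_at:
            pos -= 1        # -1 frameshift: re-read the previous nucleotide
            slip_at = None  # slip only once
        aa = CODON_TABLE[rna[pos:pos + 3]]
        if aa == "*":       # stop codon: translation ends
            break
        protein.append(aa)
        pos += 3
    return "".join(protein)


# Hypothetical toy mRNA: AUG GCU UUA AAC | UAG ..., with the UUUAAAC heptamer at
# positions 5-11 and a 0-frame stop (UAG) immediately after it.
toy = "AUGGCUUUAAACUAGCAUGGAUAUAA"
print(translate(toy))              # MALN (stops at UAG)
print(translate(toy, slip_at=12))  # MALNLAWI (UAG bypassed; ends at UAA in the -1 frame)
```

Without the slip, translation ends at the UAG right after the heptamer, the analog of ORF1a ending. With it, the same RNA is read in the -1 frame and the protein continues, the analog of ORF1b being translated.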
I mention the ORF1a/ORF1b split because the first of the two NSP12 regions in which these paired mutations occur lies in ORF1a. The region is ORF1a:4396-4399 (NSP12_4-7), with nearly all mutations at either ORF1a:4396 or ORF1a:4398. 6/45
The other region is ORF1b:820-824 (NSP12_829-833). These two regions are not particularly close to each other in the NSP12/RdRp complex as depicted in structural studies. But there has to be some sort of connection between them. 7/45
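Because the thread switches between ORF1a/ORF1b and NSP12 numbering, a small converter can help. This is a sketch; the offsets are inferred only from the thread’s own coordinate pairs (ORF1a:4396 = NSP12_4 and ORF1b:820 = NSP12_829), and the function names are hypothetical. The upper end of NSP12 is not checked.

```python
# Offsets inferred from the thread itself:
#   ORF1a:4396-4399 = NSP12_4-7     -> NSP12 residue 1 is ORF1a:4393
#   ORF1b:820-824   = NSP12_829-833 -> NSP12 residue n is ORF1b:(n - 9)
NSP12_START_IN_ORF1A = 4393
NSP12_RESIDUES_IN_ORF1A = 9  # NSP12_1-9 are encoded before the frameshift


def orf1a_to_nsp12(pos: int) -> int:
    """Convert an ORF1a amino-acid position to an NSP12 position."""
    n = pos - NSP12_START_IN_ORF1A + 1
    if not 1 <= n <= NSP12_RESIDUES_IN_ORF1A:
        raise ValueError(f"ORF1a:{pos} is outside the NSP12 part of ORF1a")
    return n


def orf1b_to_nsp12(pos: int) -> int:
    """Convert an ORF1b amino-acid position to an NSP12 position."""
    if pos < 1:
        raise ValueError("ORF1b positions start at 1")
    return pos + NSP12_RESIDUES_IN_ORF1A


def nsp12_to_orf(n: int) -> tuple[str, int]:
    """Convert an NSP12 position back to (ORF name, position)."""
    if n < 1:
        raise ValueError("NSP12 positions start at 1")
    if n <= NSP12_RESIDUES_IN_ORF1A:
        return "ORF1a", n + NSP12_START_IN_ORF1A - 1
    return "ORF1b", n - NSP12_RESIDUES_IN_ORF1A


print(orf1a_to_nsp12(4398))  # 6
print(orf1b_to_nsp12(824))   # 833
print(orf1b_to_nsp12(314))   # 323, i.e. ORF1b:P314L is NSP12:P323L
print(nsp12_to_orf(829))     # ('ORF1b', 820)
```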
Mutations in both regions are uncommon. In fact, using the legendary @ChaoranChen_’s CovSpectrum, you can calculate how often we’d expect mutations in these regions to occur together if they had no connection to one another and occurred randomly. 8/45
For simplicity, I’m restricting the analysis to ORF1a:4396 and ORF1a:4398, where the vast majority of these mutations are, though some mutations at 4394, 4395, & 4399 have also been paired with ORF1b:820-824 mutations. 9/45
I’ve analyzed 2020, 2021, 2022, & 2023 separately. I spent a long, long time excluding bad sequences from these lists, which is why the queries are complicated. Below are results & queries used for each year for sequences with ORF1a:4396 or ORF1a:4398 mutations. 10/45
And here are the results & queries used for each year for sequences with a mutation somewhere in ORF1b:820-824. I also filtered these results to exclude any sequence with an SNP clusters score of >99. 11/45
To accurately calculate the number of sequences one would expect to have mutations at both ORF1a:4396/4398 & ORF1b:820-824 (assuming that they occur randomly), you need to know the total number of sequences that have coverage of NSP12. 12/45
To exclude spike-only sequences & those lacking coverage in the NSP12 region, I searched for ORF1b:P314L for each year. This mutation has been universal since mid-2020 (with a handful of extremely interesting exceptions). 13/45
Now it’s a simple matter to calculate (perhaps naively) how many sequences we would expect to have mutations in both regions each year:
(fraction w/ORF1a:4396/4398) × (fraction w/ORF1b:820-824) × (total seq) = expected number of sequences = not very many. 14/45
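A minimal sketch of the expected-count formula, using counts rather than percentages. The function name and the example counts are hypothetical (not real data); in the thread, the inputs come from CovSpectrum queries, and the total is the number of sequences carrying ORF1b:P314L, used as a proxy for NSP12 coverage. Note that fraction × fraction × total simplifies to n_A × n_B / total.

```python
def expected_co_occurrence(n_region_a: int, n_region_b: int, n_total: int) -> float:
    """Expected number of sequences with mutations in both regions, assuming
    the two regions mutate independently of each other.

    n_region_a: sequences with an ORF1a:4396/4398 mutation
    n_region_b: sequences with a mutation in ORF1b:820-824
    n_total:    sequences covering NSP12 (e.g. those carrying ORF1b:P314L)
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    frac_a = n_region_a / n_total
    frac_b = n_region_b / n_total
    return frac_a * frac_b * n_total


# Hypothetical counts for one year, for illustration only.
expected = expected_co_occurrence(n_region_a=2_000, n_region_b=1_000, n_total=1_000_000)
print(round(expected, 2))  # 2.0
```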
How does the actual number compare? Nothing unusual in 2020 (0 actual vs 1.54 expected) or 2021 (32 actual vs 43.2 expected), but there are far too many sequences with both in 2022 (95 actual vs 6.63 expected) & 2023 (66 actual vs 1.45 expected), i.e., the Omicron era. So this is an Omicron specialty. 15/45
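One way to put a rough number on "too many" (not part of the thread’s analysis) is to treat each year’s co-occurrence count as Poisson-distributed with mean equal to the expected count and ask how likely at least the actual count would be. This sketch assumes that Poisson model; because sequences are not independent draws (repeat samples from one patient, shared ancestry), the resulting tail probabilities overstate the evidence, which is why counting independent acquisitions matters more.

```python
import math


def poisson_upper_tail(observed: int, expected: float, extra_terms: int = 10_000) -> float:
    """P(X >= observed) for X ~ Poisson(expected).

    Sums the upper tail term by term (each term computed in log space) instead
    of using 1 - CDF, which would round tiny tail probabilities to 0. Adequate
    when expected is much smaller than extra_terms.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    log_lam = math.log(expected)
    total = 0.0
    for k in range(observed, observed + extra_terms):
        total += math.exp(k * log_lam - expected - math.lgamma(k + 1))
    return total


# (actual, expected) pairs from the comparison above.
by_year = {2020: (0, 1.54), 2021: (32, 43.2), 2022: (95, 6.63), 2023: (66, 1.45)}
for year, (actual, expected) in by_year.items():
    print(year, f"{poisson_upper_tail(actual, expected):.3g}")
```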
Now you could find thousands of pairs of mutations that are “overrepresented” using the same logic described above. Every decent-sized lineage would have several. It’s really the number of times the mutations were independently acquired that we’re interested in. 16/45
But it turns out that the majority of sequences with ORF1a:4396/4398 + ORF1b:820-824 mutations acquired them independently. I’ve looked at all these sequences individually, but you can tell quickly by looking at the designated lineages of these sequences. 17/45
The 95 sequences w/these paired mutations in 2022 came from 24 different lineages, which sets a minimum on the number of times these mutation pairs evolved independently. The true number is much larger than 24, since independent acquisition also occurred within lineages (e.g., BA.2). 18/45
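The lineage argument can be sketched in a few lines: each distinct designated lineage needs at least one acquisition, and lineages holding several paired-mutation sequences are where the tree has to be checked for additional acquisitions. The data and function names below are hypothetical, and the bound assumes the pair did not arise in an ancestor shared by several of the listed lineages.

```python
from collections import Counter


def min_independent_acquisitions(lineage_by_seq: dict[str, str]) -> int:
    """Lower bound on independent acquisitions of the mutation pair.

    Assumes no single acquisition is shared across designated lineages, so
    each distinct lineage contributes at least one.
    """
    return len(set(lineage_by_seq.values()))


def lineages_needing_tree_check(lineage_by_seq: dict[str, str]) -> list[str]:
    """Lineages with more than one paired-mutation sequence. These may hide
    extra independent acquisitions (or repeat samples from one patient) that
    only the phylogeny can resolve."""
    counts = Counter(lineage_by_seq.values())
    return sorted(lin for lin, c in counts.items() if c > 1)


# Hypothetical sequence IDs and lineages, for illustration only.
example = {
    "seq1": "BA.2",
    "seq2": "BA.2",
    "seq3": "BA.1.1",
    "seq4": "BA.5.2",
}
print(min_independent_acquisitions(example))  # 3
print(lineages_needing_tree_check(example))   # ['BA.2']
```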
Examining the UShER trees, I count 65 independent acquisitions in 2022 (out of the 95 total sequences). Metadata show that 15/95 seq are 2nd, 3rd, or nth sequences from the same patient, though at least 3 branches involved different patients. 19/45
For the 2023 sequences, 37 of 66 sequences represent independent acquisitions of an ORF1a:4396/4398 + ORF1b:820-824 mutational pair. Furthermore, 11 sequences have mutations at *both* ORF1a:4396 & 4398. Some examples are depicted below. 20/45
One case for which there are 8 sequences has ORF1a:P4389S, S4398L + ORF1b:D824N/N824S, ORF1b:S826A. Another has ORF1a:S4398L, F4399L + ORF1b:P821S, D824N. Combinations like this cannot be coincidental. 21/45
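Screening a sequence’s mutation list for the pair can be sketched with a small parser. The notation, function names, and regex are assumptions for illustration; the regions follow the analysis above (ORF1a:4396/4398 and ORF1b:820-824), and tokens that aren’t simple substitutions are skipped. The two positive examples are the cases listed in this tweet; the negative control is hypothetical.

```python
import re

# Matches amino-acid substitutions such as "ORF1a:S4398L" or "ORF1b:D824N".
SUBSTITUTION = re.compile(r"^(\w+):([A-Z*])(\d+)([A-Z*])$")

REGION_A_POSITIONS = {4396, 4398}  # ORF1a positions used in the analysis
REGION_B_RANGE = range(820, 825)   # ORF1b:820-824 inclusive


def parse_substitution(mut: str):
    """Return (gene, ref, position, alt), or None if mut isn't a substitution."""
    m = SUBSTITUTION.match(mut)
    if m is None:
        return None
    gene, ref, pos, alt = m.groups()
    return gene, ref, int(pos), alt


def has_paired_mutations(mutations: list[str]) -> bool:
    """True if the list has a mutation in both the ORF1a and ORF1b regions."""
    in_a = in_b = False
    for mut in mutations:
        parsed = parse_substitution(mut)
        if parsed is None:
            continue  # deletions, insertions, or other notation
        gene, _ref, pos, _alt = parsed
        if gene == "ORF1a" and pos in REGION_A_POSITIONS:
            in_a = True
        elif gene == "ORF1b" and pos in REGION_B_RANGE:
            in_b = True
    return in_a and in_b


case_1 = ["ORF1a:P4389S", "ORF1a:S4398L", "ORF1b:D824N", "ORF1b:S826A"]
case_2 = ["ORF1a:S4398L", "ORF1a:F4399L", "ORF1b:P821S", "ORF1b:D824N"]
control = ["ORF1a:A4396V", "ORF1b:P314L"]  # hypothetical: no ORF1b:820-824 hit
print(has_paired_mutations(case_1), has_paired_mutations(case_2), has_paired_mutations(control))
# True True False
```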
Most sequences with these mutations are highly divergent, almost certainly originating in chronic infections. (The major exception is a BA.5.1.30 Hong Kong cluster of ~140 sequences.) But why should this combo occur in chronic infections? 22/45
Chronic infections can result in rapid accumulation of mutations due to the lack of a transmission bottleneck (& other reasons). I think they also increase the chance of unlikely combinations of mutations. But that doesn’t explain why these mutations should be linked. 23/45
Is there any pattern to the types of mutations? ORF1a:S4398L and ORF1a:A4396V are by far the most common in that region. 24/45
Both ORF1a:A4396V and ORF1a:S4398L result in a somewhat larger and more hydrophobic amino acid. 25/45
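The thread doesn’t say which scale it uses, so as one way to check the claim, this sketch swaps in the Kyte-Doolittle hydropathy index (higher = more hydrophobic) and average residue masses as a rough proxy for size. Only the four residues involved are tabulated; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index and average residue mass (daltons),
# for the residues involved in ORF1a:A4396V and ORF1a:S4398L only.
HYDROPATHY = {"A": 1.8, "S": -0.8, "V": 4.2, "L": 3.8}
RESIDUE_MASS = {"A": 71.08, "S": 87.08, "V": 99.13, "L": 113.16}


def substitution_shift(ref: str, alt: str) -> tuple[float, float]:
    """Return (change in hydropathy, change in residue mass) for ref -> alt."""
    d_hydro = HYDROPATHY[alt] - HYDROPATHY[ref]
    d_mass = RESIDUE_MASS[alt] - RESIDUE_MASS[ref]
    return round(d_hydro, 2), round(d_mass, 2)


for name, ref, alt in [("ORF1a:A4396V", "A", "V"), ("ORF1a:S4398L", "S", "L")]:
    d_hydro, d_mass = substitution_shift(ref, alt)
    print(f"{name}: hydropathy {d_hydro:+}, mass {d_mass:+} Da")
# ORF1a:A4396V: hydropathy +2.4, mass +28.05 Da
# ORF1a:S4398L: hydropathy +4.6, mass +26.08 Da
```

Both shifts are positive on both measures, which is consistent with "somewhat larger and more hydrophobic."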
• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Ryan Hisner

Ryan Hisner Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @LongDesertTrain

Jan 2
Two quick notes on the state of chronic-infection SARS-CoV-2 seqs

#1) ~3 years after its peak, BA.1 is still showing up in nasal swab seqs—despite reduced surveillance—most recently a mid-late Dec BA.1 from Nebraska.

#2) Chronic JN.1 seqs now more common, w/1 peculiarity

1/12
While BA.1 seqs still show up semi-regularly, pre-Omicron seqs are much rarer. Why? I think there are four major reasons, two obvious & two less obvious.

A) Time.
This one’s obvious: Over time, some chronic infections are cleared, while in other cases, the host dies.

2/12
B) Number of infections.

BA.1 infected more people, more quickly than any previous variant. More infections = more chances to establish long-term infection.
3/12
Dec 23, 2024
Fantastic review on chronic SARS-CoV-2 infections by virological superstars Richard Neher & Alex Sigal in Nature Microbiology. I’ll do a short overview, outline a couple minor quibbles, & defend the honor of ORF9b w/some stats & 3 striking sequences from the past week.
1/64
First, let me say that this is well-written, extremely readable, and accessible to non-experts, so you should go read the full paper yourself, if you can find a way to access it. (Just realized it’s paywalled, ugh.) 2/64 nature.com/articles/s4157…
Neher & Sigal focus on the 2 most important aspects of SARS-CoV-2 persistence: its relationship to Long Covid (including increased risk of adverse health events) & its vital importance to the evolution of SARS-CoV-2 variants. I’ll focus on the evolutionary aspects.
3/64
Dec 6, 2024
In SARS-2 evolution, amino acid (AA) mutations get the lion’s share of attention, & rightfully so, as noncoding & synonymous nucleotide muts (which cause no AA change) are mostly inconsequential. But there are many exceptions, including a possible new one I find intriguing. 1/30
I’ll discuss four categories of such “silent” mutations, two of which might be involved in the recent growth of one synonymous mutation.

#1. Kozak sequence changes
#2. Secondary RNA structure
#3. TRS destruction/improvement
#4. TRS creation 2/30
Maybe the single most remarkable example of convergent evolution in SARS-CoV-2 involves noncoding mutations: the multitude of muts in major variants that have pulverized the nucleocapsid (N) Kozak sequence.
I wrote about this below & a few other 🧵s 3/
Nov 24, 2024
@SolidEvidence There was yet another paper this week describing someone chronically infected, with serious symptoms, but who repeatedly tested negative for everything with nasopharyngeal swabs. On bronchoalveolar lavage (BAL), they were Covid-positive. 1/ ijidonline.com/article/S1201-…
@SolidEvidence BAL is very rarely performed, yet there must be dozens of documented cases now where very ill patients who were PCR-negative on NP swabs tested positive by BAL. This has to be way more common than we realize.

If we had a similar GI test, I imagine we'd find something similar. 2/
@SolidEvidence Importantly, the patient was treated and improved, likely clearing the virus for good. Many, maybe most, chronic infections could be treated and cleared. But they have to know they're infected for that to happen. 3/
Nov 22, 2024
Superb thread here by @jbloom_lab that meshes well with what we've seen over the last few months in SARS-CoV-2 spike evolution: not much.

IMO, nothing significant has happened since the NTD-glycan-adding muts (T22N, ∆S31) & Q493E appeared. This 🧵 explains why. 1/6
Read full 🧵for explanation, but the short story is that the best apparent escape mutations all interact w/something else—like a nearby spike protomer or other important AA—making mutations there prohibitively costly.

In short, the virus has mutated itself into a corner. 2/6
It's very hard to effectively mutate out of such a local fitness peak via stepwise mutation in circulation, since multiple simultaneous muts might be required to reach a higher fitness peak. 3/6

Nov 10, 2024
It's an interesting thought. I think the evidence is strong that all new, divergent variants have derived from chronic infections. The first wave of such variants—Alpha, Beta, Gamma—IMO involved chronic infections lasting probably ~5-7 months. It's controversial to say.... 1/15
…that Delta originated in a chronic infection, but I think the evidence that it did is strong. One characteristic of chronic-infection branches is a high rate of non-synonymous nucleotide (nuc) substitutions (subs), i.e., ones that result in an amino acid (AA) change. 2/15
For example, if 80% of nuc subs in coding regions cause an AA change, that’s a very high nonsynonymous rate. The branch leading to Delta has 17 AA changes—from just *15* nuc subs! That’s over 100%. How is this possible? 3/15