Ryan Hisner
Nov 24, 2023
I spend a lot of time analyzing and documenting outlier SARS-CoV-2 sequences. Recently, I’ve noticed a fascinating pattern: the repeated appearance of paired mutations in two narrow NSP12 regions >2400 nucleotides distant from each other. 1/45
NSP12 is the RNA-dependent RNA polymerase (RdRp), part of the SARS-CoV-2 replication complex, which also includes NSP7-10 and NSP13-14. I discussed some basics of the RdRp & coronavirus genome replication previously. 2/45
The first two-thirds of the SARS-CoV-2 genome consists of ORF1a & ORF1b. When the full genome enters a ribosome, the cell protein-making machine, only ORF1a is translated ~75% of the time. The ribosome hits a stop codon & cuts loose. No ORF1b. 3/45 Image
But the other ~25% of the time, at the end of ORF1a, the ribosome gets tangled in a complex RNA structure called a pseudoknot. They say the ribosome “slips,” but to me it seems more like it gets stuck & momentarily stumbles backward. 4/45 Image
The ribosome reads the same nucleotide twice, shifting its reading frame. The stop codon is bypassed & ORF1b, which includes NSP12-16, is translated. Oddly, the first 1% or so of NSP12 is in ORF1a, with the rest in ORF1b. 5/45
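A toy sketch of that -1 slip, just to make the reading-frame shift concrete. The RNA string, the function name `read_codons`, and the `slip_after` parameter are all made up for illustration; this is not the real SARS-CoV-2 sequence and not a model of the pseudoknot.

```python
def read_codons(rna, slip_after=None):
    """Split an RNA string into codons, reading left to right.

    If slip_after is given, the reader steps back one nucleotide once it
    has consumed exactly slip_after nucleotides, so the nucleotide at
    index slip_after - 1 is read twice (a -1 frameshift).
    """
    codons = []
    pos = 0
    while pos + 3 <= len(rna):
        codons.append(rna[pos:pos + 3])
        pos += 3
        if slip_after is not None and pos == slip_after:
            pos -= 1  # stumble back one nucleotide
    return codons


toy_rna = "AUGUUUAAACGGGCC"  # hypothetical 15-nt example

in_frame = read_codons(toy_rna)
shifted = read_codons(toy_rna, slip_after=9)

print(in_frame)
print(shifted)
```

Everything up to the slip point is read identically in both cases; only the codons after it change, which is why a stop codon sitting in the original frame can be skipped.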
I mention this because the first of the two NSP12 regions in which these paired mutations occur is in ORF1a. The region is ORF1a:4396-4399 (NSP12_4-7), with nearly all mutations at either ORF1a:4396 or ORF1a:4398. 6/45 Image
The other region is ORF1b:820-824 (NSP12_829-833). These two regions are not particularly close to each other in the NSP12/RdRp complex as depicted in structural studies. But there has to be some sort of connection between them. 7/45 Image
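The two coordinate systems can be reconciled with a small sketch. The offsets below are inferred only from the pairs quoted in the thread (ORF1a:4396 = NSP12_4 and ORF1b:820 = NSP12_829); the constant and function names are hypothetical.

```python
ORF1A_OFFSET = 4392  # inferred from ORF1a:4396 = NSP12_4
ORF1B_OFFSET = 9     # inferred from ORF1b:820 = NSP12_829


def nsp12_position(orf, pos):
    """Convert an ORF1a or ORF1b residue number to NSP12 numbering."""
    if orf == "ORF1a":
        return pos - ORF1A_OFFSET
    if orf == "ORF1b":
        return pos + ORF1B_OFFSET
    raise ValueError(f"unsupported ORF: {orf}")


# Closest pair of residues across the two regions, in NSP12 numbering
near_end = nsp12_position("ORF1a", 4399)
far_start = nsp12_position("ORF1b", 820)

# Rough nucleotide gap: 3 nt per residue (ignores the 1-nt frameshift)
approx_nt_gap = (far_start - near_end) * 3
print(near_end, far_start, approx_nt_gap)
```

Even the closest residues of the two regions come out more than 2400 nucleotides apart, consistent with the distance quoted at the top of the thread.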
Mutations in both regions are uncommon. In fact, using the legendary @ChaoranChen_‘s CovSpectrum, you can calculate how often we’d expect mutations in these regions to occur together if they had no connection to one another and occurred randomly. 8/45 Image
For simplicity, I’m restricting the analysis to ORF1a:4396 and ORF1a:4398, where the vast majority of these mutations are, though some mutations at 4394, 4395, & 4399 have also been paired with ORF1b:820-824 mutations. 9/45
I’ve analyzed 2020, 2021, 2022, & 2023 separately. I spent a long, long time excluding bad sequences from these lists, which is why the queries are complicated. Below are results & queries used for each year for sequences with ORF1a:4396 or ORF1a:4398 mutations. 10/45


[Images: results & queries for each year, ORF1a:4396/4398]
And here are the results & queries used for each year for sequences with a mutation somewhere in ORF1b:820-824. I also filtered these results to exclude any sequence with an SNP clusters score of >99. 11/45


[Images: results & queries for each year, ORF1b:820-824]
To accurately calculate the number of sequences one would expect to have mutations at both ORF1a:4396/4398 & ORF1b:820-824 (assuming that they occur randomly), you need to know the total number of sequences that have coverage of NSP12. 12/45
To exclude spike-only sequences & those lacking coverage in the NSP12 region, I searched for ORF1b:P314L for each year. This mutation has been universal since mid-2020 (with a handful of extremely interesting exceptions). 13/45 Image
Now it’s a simple matter to calculate (perhaps naively) how many sequences we would expect to have mutations in both regions each year. (% seq w/ORF1a:4396/4398) × (% w/ORF1b:820-824) × (total seq) = expected number of sequences = not very many. 14/45 Image
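The naive calculation can be sketched in a few lines. The counts below are invented placeholders, not the CovSpectrum results from the thread, and the function and parameter names are hypothetical.

```python
def expected_both(n_region_a, n_region_b, n_total):
    """Expected number of sequences with mutations in both regions,
    assuming the two regions mutate independently of each other."""
    frac_a = n_region_a / n_total  # share with an ORF1a:4396/4398 mutation
    frac_b = n_region_b / n_total  # share with an ORF1b:820-824 mutation
    return frac_a * frac_b * n_total


# Placeholder counts for one year (made up for illustration only)
example = expected_both(n_region_a=1000, n_region_b=2000, n_total=4_000_000)
print(example)
```

The total should be the number of sequences with NSP12 coverage for that year, which is why the denominator matters as much as the two mutation counts.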
How does the actual number compare? Nothing unusual in 2020 (1.54 expected vs 0 actual) or 2021 (43.2 vs 32), but there seem to be too many sequences with both in 2022 (6.63 vs 95) & 2023 (1.45 vs 66), i.e. the Omicron era. So this is an Omicron specialty. 15/45 Image
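One rough way to put a number on "too many" is a Poisson upper-tail probability for each year's expected and actual counts. The Poisson model is an assumption added here, not a method used in the thread, and it treats every sequence as independent, which sequences from the same lineage or patient are not.

```python
import math


def poisson_upper_tail(expected, observed):
    """P(X >= observed) for X ~ Poisson(expected).

    Terms are computed in log space and summed upward from `observed`
    until they no longer contribute to the total.
    """
    total = 0.0
    i = observed
    while True:
        log_term = -expected + i * math.log(expected) - math.lgamma(i + 1)
        term = math.exp(log_term)
        total += term
        if i > expected and term < 1e-18 * max(total, 1e-300):
            break
        i += 1
    return total


# (expected, actual) pairs as quoted in the thread
counts = {2020: (1.54, 0), 2021: (43.2, 32), 2022: (6.63, 95), 2023: (1.45, 66)}

for year, (exp_n, obs_n) in counts.items():
    print(year, poisson_upper_tail(exp_n, obs_n))
```

Under this simplification the 2020 and 2021 counts are unremarkable while the 2022 and 2023 counts are vanishingly unlikely, but the result only says the pairing is overrepresented, not how many times it arose independently.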
Now you could find thousands of pairs of mutations that are “overrepresented” using the same logic described above. Every decent-sized lineage would have several. It’s really the number of times the mutations were independently acquired that we’re interested in. 16/45
But it turns out that the majority of sequences with ORF1a:4396/4398 + ORF1b:820-824 mutations acquired them independently. I’ve looked at all these sequences individually, but you can tell quickly by looking at the designated lineages of these sequences. 17/45
The 95 sequences w/these paired mutations in 2022 came from 24 different lineages—setting a minimum number of times these mutation-pairs evolved independently. The true number is much larger than 24 since independent acquisition occurred within lineages (e.g. BA.2). 18/45 Image
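The lineage-count argument amounts to counting distinct lineage labels among the sequences carrying the pair. A minimal sketch, using an invented list of (sequence ID, lineage) records; the IDs and most lineage assignments are hypothetical and do not come from the thread's data.

```python
from collections import Counter

# Made-up records for illustration: (sequence id, designated lineage)
records = [
    ("seq1", "BA.2"),
    ("seq2", "BA.2"),
    ("seq3", "BA.5.1.30"),
    ("seq4", "BA.1.1"),
    ("seq5", "BA.2"),
]

per_lineage = Counter(lineage for _, lineage in records)

# Each distinct lineage needs at least one acquisition of its own,
# so the number of lineages is a floor on independent acquisitions.
min_independent = len(per_lineage)

print(per_lineage)
print(min_independent)
```

This only gives a floor: several sequences inside one lineage may each have acquired the pair separately, which the count cannot see without inspecting the tree.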
Examining the UShER trees, I count 65 instances in 2022 (out of the 95 total) that evolved independently. Metadata show that 15/95 seq are 2nd, 3rd, or nth sequences from the same patient, though at least 3 branches involved different patients. 19/45
For the 2023 sequences, 37 of 66 sequences represent independent acquisitions of an ORF1a:4396/4398 + ORF1b:820-824 mutational pair. Furthermore, 11 sequences have mutations at *both* ORF1a:4396 & 4398. Some sample specimens depicted below. 20/45 Image
One case for which there are 8 sequences has ORF1a:P4389S, S4398L + ORF1b:D824N/N824S, ORF1b:S826A. Another has ORF1a:S4398L, F4399L + ORF1b:P821S, D824N. Combinations like this cannot be coincidental. 21/45 Image
Most sequences with these mutations are highly divergent, almost certainly originating in chronic infections. (The major exception is a BA.5.1.30 Hong Kong cluster of ~140 sequences.) But why should this combo occur in chronic infections? 22/45
Chronic infections can result in rapid accumulation of mutations due to the lack of a transmission bottleneck (& other reasons). I think they also increase the chance of unlikely combinations of mutations. But that doesn’t explain why these mutations should be linked. 23/45
Is there any pattern to the types of mutations? ORF1a:S4398L and ORF1a:A4396V are by far the most common in that region. 24/45 Image
Both ORF1a:A4396V and ORF1a:S4398L result in a somewhat larger and more hydrophobic amino acid. 25/45
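The size and hydrophobicity claim can be checked against standard reference values. The sketch below uses free amino acid molar masses as a crude proxy for size and the Kyte-Doolittle hydropathy scale for hydrophobicity; choosing these two measures is an assumption made here, and the dictionary and function names are hypothetical.

```python
# Free amino acid molar mass (g/mol) and Kyte-Doolittle hydropathy index
AMINO_ACIDS = {
    "A": {"name": "alanine", "mass": 89.09, "hydropathy": 1.8},
    "V": {"name": "valine", "mass": 117.15, "hydropathy": 4.2},
    "S": {"name": "serine", "mass": 105.09, "hydropathy": -0.8},
    "L": {"name": "leucine", "mass": 131.17, "hydropathy": 3.8},
}


def substitution_change(ref, alt):
    """Return (mass change, hydropathy change) for a ref -> alt substitution."""
    d_mass = AMINO_ACIDS[alt]["mass"] - AMINO_ACIDS[ref]["mass"]
    d_hydro = AMINO_ACIDS[alt]["hydropathy"] - AMINO_ACIDS[ref]["hydropathy"]
    return d_mass, d_hydro


for label, ref, alt in [("ORF1a:A4396V", "A", "V"), ("ORF1a:S4398L", "S", "L")]:
    d_mass, d_hydro = substitution_change(ref, alt)
    print(label, round(d_mass, 2), round(d_hydro, 1))
```

Both substitutions move in the same direction on both measures, with S to L being the larger jump in hydropathy because serine starts out polar.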
