Ryan Hisner
Nov 24, 2023 · 25 tweets · 9 min read
I spend a lot of time analyzing and documenting outlier SARS-CoV-2 sequences. Recently, I’ve noticed a fascinating pattern: the repeated appearance of paired mutations in two narrow NSP12 regions >2400 nucleotides distant from each other. 1/45
NSP12 is the RNA-dependent RNA polymerase (RdRp), part of the SARS-CoV-2 replication complex, which also includes NSP7-10 and NSP13-14. I discussed some basics of the RdRp & coronavirus genome replication previously. 2/45
The first two-thirds of the SARS-CoV-2 genome consists of ORF1a & ORF1b. When the full genome enters a ribosome, the cell's protein-making machine, only ORF1a is translated ~75% of the time. The ribosome hits a stop codon & cuts loose. No ORF1b. 3/45 Image
But the other ~25% of the time, at the end of ORF1a, the ribosome gets tangled in a complex RNA structure called a pseudoknot. They say the ribosome “slips,” but to me it seems more like it gets stuck & momentarily stumbles backward. 4/45 Image
The ribosome reads the same nucleotide twice, shifting its reading frame. The stop codon is bypassed & ORF1b, which includes NSP12-16, is translated. Oddly, the first 1% or so of NSP12 is in ORF1a, with the rest in ORF1b. 5/45
Image
Image
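A minimal sketch of the -1 frameshift described above. The RNA string is a short toy sequence modeled on the slippery site at the ORF1a/ORF1b junction, so treat it as illustrative rather than reference-accurate. The codon table covers only the codons this toy sequence uses, and the names `translate` and `slip_after` are mine.

```python
# Partial codon table: only the codons that occur in the toy sequence below.
CODONS = {
    "UUA": "L", "AAC": "N", "GGG": "G", "UUU": "F", "GCG": "A",
    "GUG": "V", "UAA": "*", "CGG": "R", "GUU": "V", "UGC": "C",
    "GGU": "G", "GUA": "V", "AGU": "S", "GCA": "A",
}

def translate(rna, slip_after=None):
    """Translate rna from index 0 until a stop codon or the end.

    If slip_after is given, the ribosome steps back one nucleotide when it
    reaches that index, so one nucleotide is read twice (a -1 frameshift).
    """
    protein = []
    i = 0
    while i + 3 <= len(rna):
        aa = CODONS[rna[i:i + 3]]
        if aa == "*":          # stop codon: the ribosome releases
            break
        protein.append(aa)
        i += 3
        if slip_after is not None and i == slip_after:
            i -= 1             # slip back one nucleotide
    return "".join(protein)

# Toy sequence (assumption): slippery-site region followed by the ORF1a stop.
RNA = "UUAAACGGGUUUGCGGUGUAAGUGCA"

no_slip = translate(RNA)                 # in-frame read, ends at the stop codon
with_slip = translate(RNA, slip_after=6) # slip after the second codon
```

Without the slip, translation ends at the stop codon a few residues later. With the slip, the same stretch is read in a new frame, the stop codon is no longer in frame, and translation continues.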
I mention this because the first of the two NSP12 regions in which these paired mutations occur is in ORF1a. The region is ORF1a:4396-4399 (NSP12_4-7), with nearly all being at either ORF1a:4396 or ORF1a:4398. 6/45 Image
The other region is ORF1b:820-824 (NSP12_829-833). These two regions are not particularly close to each other in the NSP12/RdRp complex as depicted in structural studies. But there has to be some sort of connection between them. 7/45 Image
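The two numbering schemes can be converted with simple offsets. This is a sketch: the offsets are inferred from the pairs given above (ORF1a:4396 = NSP12_4 and ORF1b:820 = NSP12_829), and the function names are hypothetical.

```python
# Offsets inferred from the thread's own coordinate pairs (assumption):
#   ORF1a:4396 = NSP12_4   -> NSP12 position = ORF1a position - 4392
#   ORF1b:820  = NSP12_829 -> NSP12 position = ORF1b position + 9
ORF1A_OFFSET = 4392
ORF1B_OFFSET = 9

def orf1a_to_nsp12(pos):
    """Convert an ORF1a residue number to NSP12 numbering."""
    nsp12_pos = pos - ORF1A_OFFSET
    if not 1 <= nsp12_pos <= ORF1B_OFFSET:
        raise ValueError("position is not in the ORF1a-encoded part of NSP12")
    return nsp12_pos

def orf1b_to_nsp12(pos):
    """Convert an ORF1b residue number to NSP12 numbering."""
    if pos < 1:
        raise ValueError("ORF1b positions start at 1")
    return pos + ORF1B_OFFSET

region_1 = [orf1a_to_nsp12(p) for p in range(4396, 4400)]  # ORF1a:4396-4399
region_2 = [orf1b_to_nsp12(p) for p in range(820, 825)]    # ORF1b:820-824
```

The same +9 offset maps ORF1b:P314L to NSP12_P323L, which is a handy sanity check.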
Mutations in both regions are uncommon. In fact, using the legendary @ChaoranChen_’s CovSpectrum, you can calculate how often we’d expect mutations in these regions to occur together if they had no connection to one another and occurred randomly. 8/45 Image
For simplicity, I’m restricting the analysis to ORF1a:4396 and ORF1a:4398, where the vast majority of these mutations are, though some mutations at 4394, 4395, & 4399 have also been paired with ORF1b:820-824 mutations. 9/45
I’ve analyzed 2020, 2021, 2022, & 2023 separately. I spent a long, long time excluding bad sequences from these lists, which is why the queries are complicated. Below are results & queries used for each year for sequences with ORF1a:4396 or ORF1a:4398 mutations. 10/45


Image
Image
Image
Image
And here are the results & queries used for each year for sequences with a mutation somewhere in ORF1b:820-824. I also filtered these results to exclude any sequence with an SNP clusters score of >99. 11/45


Image
Image
Image
Image
To accurately calculate the number of sequences one would expect to have mutations at both ORF1a:4396/4398 & ORF1b:820-824 (assuming that they occur randomly), you need to know the total number of sequences that have coverage of NSP12. 12/45
To exclude spike-only sequences & those lacking coverage in the NSP12 region, I searched for ORF1b:P314L for each year. This mutation has been universal since mid-2020 (with a handful of extremely interesting exceptions). 13/45 Image
Now it’s a simple matter to calculate (perhaps naively) how many sequences we would expect to have mutations in both regions each year. (fraction of seq w/ORF1a:4396/4398) × (fraction w/ORF1b:820-824) × (total seq) = expected number of sequences = not very many. 14/45 Image
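The naive expectation above can be sketched as follows. The counts in the example call are made-up placeholders, not the real per-year numbers (those are in the images), and `expected_cooccurrence` is a hypothetical name.

```python
def expected_cooccurrence(n_region_a, n_region_b, n_total):
    """Expected number of sequences carrying both mutations if they are independent.

    (n_region_a / n_total) * (n_region_b / n_total) * n_total
    simplifies to n_region_a * n_region_b / n_total.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_region_a * n_region_b / n_total

# Placeholder counts for illustration only (NOT the thread's data):
# 1,000 sequences with an ORF1a:4396/4398 mutation, 2,000 with an
# ORF1b:820-824 mutation, 1,000,000 sequences with NSP12 coverage.
example = expected_cooccurrence(1_000, 2_000, 1_000_000)
```

The denominator should be the number of sequences with NSP12 coverage (the ORF1b:P314L count used as a proxy above), not the raw total of uploaded sequences.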
How does the actual number compare? Nothing unusual in 2020 (0 actual vs 1.54 expected) or 2021 (32 vs 43.2), but there seem to be too many sequences with both in 2022 (95 vs 6.63) & 2023 (66 vs 1.45), i.e., the Omicron era. So this is an Omicron specialty. 15/45 Image
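One rough way to gauge how surprising these counts are is a Poisson upper-tail probability. This is my own illustrative assumption, not the method used in the thread, and it ignores shared ancestry (sequences from one lineage or one patient are not independent), so it overstates the significance. It only shows the scale of the gap between actual and expected counts.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Terms are summed directly in the upper tail (in log space) so that very
    small probabilities are not lost to rounding, as they would be with 1 - CDF.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    total = 0.0
    k = observed
    while True:
        log_term = -expected + k * math.log(expected) - math.lgamma(k + 1)
        term = math.exp(log_term)
        total += term
        # Past the mode the terms shrink; stop once they no longer matter.
        if k > expected and term <= total * 1e-17:
            break
        k += 1
    return total

# Actual vs expected counts quoted in the thread.
p_2021 = poisson_upper_tail(32, 43.2)
p_2022 = poisson_upper_tail(95, 6.63)
p_2023 = poisson_upper_tail(66, 1.45)
```

Under this simple model the 2021 count is unremarkable, while the 2022 and 2023 counts are far beyond anything chance co-occurrence would produce.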
Now you could find thousands of pairs of mutations that are “overrepresented” using the same logic described above. Every decent-sized lineage would have several. It’s really the number of times the mutations were independently acquired that we’re interested in. 16/45
But it turns out that the majority of sequences with ORF1a:4396/4398 + ORF1b:820-824 mutations acquired them independently. I’ve looked at all these sequences individually, but you can tell quickly by looking at the designated lineages of these sequences. 17/45
The 95 sequences w/these paired mutations in 2022 came from 24 different lineages—setting a minimum number of times these mutation-pairs evolved independently. The true number is much larger than 24 since independent acquisition occurred within lineages (e.g. BA.2). 18/45 Image
Examining the Usher trees, I count 65 instances in 2022 (out of the 95 total) that evolved independently. Metadata show that 15/95 seq are 2nd, 3rd, or nth sequences from the same patient, though at least 3 branches involved different patients. 19/45
For the 2023 sequences, 37 of 66 sequences represent independent acquisitions of an ORF1a:4396/4398 + ORF1b:820-824 mutational pair. Furthermore, 11 sequences have mutations at *both* ORF1a:4396 & 4398. Some sample specimens depicted below. 20/45 Image
One case for which there are 8 sequences has ORF1a:P4389S, S4398L + ORF1b:D824N/N824S, ORF1b:S826A. Another has ORF1a:S4398L, F4399L + ORF1b:P821S, D824N. Combinations like this cannot be coincidental. 21/45 Image
Most sequences with these mutations are highly divergent, almost certainly originating in chronic infections. (The major exception is a BA.5.1.30 Hong Kong cluster of ~140 sequences.) But why should this combo occur in chronic infections? 22/45
Chronic infections can result in rapid accumulation of mutations due to the lack of a transmission bottleneck (& other reasons). I think they also increase the chance of unlikely combinations of mutations. But that doesn’t explain why these mutations should be linked. 23/45
Is there any pattern to the types of mutations? ORF1a:S4398L and ORF1a:A4396V are by far the most common in that region. 24/45 Image
Both ORF1a:A4396V and ORF1a:S4398L result in a somewhat larger and more hydrophobic amino acid. 25/45
Image
Image
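A quick check of the "larger and more hydrophobic" claim, using the Kyte-Doolittle hydropathy index and average residue masses for the four amino acids involved. The choice of scale is mine. Other hydrophobicity scales give different numbers but the same direction for these two substitutions.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
HYDROPATHY = {"A": 1.8, "V": 4.2, "S": -0.8, "L": 3.8}
# Average residue mass in daltons, used here as a crude proxy for size.
RESIDUE_MASS = {"A": 71.08, "V": 99.13, "S": 87.08, "L": 113.16}

def substitution_shift(mutation):
    """Return (hydropathy change, mass change) for a mutation like 'A4396V'."""
    ref, alt = mutation[0], mutation[-1]
    d_hydropathy = round(HYDROPATHY[alt] - HYDROPATHY[ref], 2)
    d_mass = round(RESIDUE_MASS[alt] - RESIDUE_MASS[ref], 2)
    return d_hydropathy, d_mass

shifts = {m: substitution_shift(m) for m in ("A4396V", "S4398L")}
```

Both substitutions come out positive on both measures, consistent with the description above.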
