Dog's Breakfast
Thoughtcrimes and wrongthink 寻衅滋事罪 (the crime of "picking quarrels and provoking trouble")

Feb 18, 27 tweets

How did the coronavirus get its RBM?

Some time ago SARS-1 and SARS-CoV-2 had a common ancestor. Parts of their genomes are similar, but many functionally important regions are very different. These appear to have evolved by a very unusual - or unnatural - cut-and-paste process 🧵

Molecular evolution has well established mechanisms:

•substitution (frequent) a single base changes to another
•deletion (uncommon) usually only a few, and a multiple of 3 is preferred
•insertion (rare) as above
•recombination (very rare)

There's another type of mutation unusually common between SARS-1 and SARS-CoV-2: where a new sequence has been grafted in place of another. There's no simple explanation. Perhaps they result from 2 or more separate insert/deletion events?

Or they may indicate engineering.
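A minimal sketch of how these event types could be tallied from a gapped pairwise alignment. The function name `summarize_alignment`, the `min_block` threshold and the toy sequences are all hypothetical; none of them come from Akaishi's paper or from a real genome. A run of consecutive mismatches with no gaps is counted as one "replacement block", used here as shorthand for the grafted-in-place pattern described above.

```python
def column_kind(r, q):
    """Classify one alignment column ('-' marks a gap)."""
    if r == q:
        return "match"
    if r == "-":
        return "insertion"   # present in query, absent in reference
    if q == "-":
        return "deletion"    # present in reference, absent in query
    return "mismatch"


def summarize_alignment(ref, qry, min_block=3):
    """Tally event types along a gapped pairwise alignment.

    min_block is an arbitrary, assumed cutoff: a gap-free run of at least
    this many consecutive mismatches counts as one replacement block,
    shorter runs count as individual substitutions.
    """
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    events = {"substitution": 0, "insertion": 0,
              "deletion": 0, "replacement_block": 0}
    i, n = 0, len(ref)
    while i < n:
        kind = column_kind(ref[i], qry[i])
        if kind == "match":
            i += 1
            continue
        # Extend the run while the column type stays the same.
        j = i
        while j < n and column_kind(ref[j], qry[j]) == kind:
            j += 1
        run = j - i
        if kind == "mismatch":
            if run >= min_block:
                events["replacement_block"] += 1
            else:
                events["substitution"] += run
        else:
            events[kind] += 1   # one event per contiguous gap run
        i = j
    return events


# Toy alignment (hypothetical, for illustration only).
TOY_REF = "ATGCCGTA---GGCTTAACGTCC"
TOY_QRY = "ATGACGTACCCGG---ATGCACC"
print(summarize_alignment(TOY_REF, TOY_QRY))
```

On the toy input this reports one event of each kind. A real analysis would start from an actual alignment and would need a defensible choice of `min_block`.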

A paper by Tetsuya Akaishi identifies mutations of this type between SARS-1 and SARS-CoV-2. Many of the highlighted sites are immediately recognizable. They include the FCS, and the NTD loops that some suggest have HIV homology (I think MERS).

journals.asm.org/doi/10.1128/sp…

In the 3D structure, all these sites are in disordered loops at the surface of the protein. Here, insertions and deletions are unlikely to result in a non-viable structure. And being at the surface, they interact with host cell receptors.

So they're of interest to malicious engineers.

I previously compared the NTD loops to ZC45, and found they're the result of multiple insertions and deletions. There's only one residue that evolved by regular substitution. Everything else is evolution by "cut and paste".

I've recently been looking at the SARS-CoV-2 RBM. For much of it, it's plausible that it evolved by a regular process of substitution. But one region stands out for having extended low homology with SARS-1 and other (reliable) coronaviruses.

Akaishi also identified this site.

In one RNA fragment (blue box) 11 of 12 bases have changed. Extending beyond that, 9 consecutive amino acids are different. It seems unlikely this resulted from substitution. Perhaps an insertion and a deletion of the same number of bases?

But from what source?

Bizarrely, when I BLASTed the peptide, the top matches of interest were for the WIV virus isolates WIV17 and WIV18.

Why "bizarrely"? Because WIV17/18 are bat adenoviruses, not coronaviruses. Completely unrelated - DNA viruses, not RNA.

There are 1.3 billion (20^7) combinations of 7 amino acids. There's roughly a 1 in 10 million chance of randomly matching 6 of 7 amino acids at any given position. Although it's not hard to find such matches among the trillions of bases in GenBank, it is *uncanny* to find one in an unrelated viral genome - in the same cave.
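The arithmetic behind those two numbers can be checked with a short sketch. It assumes all 20 amino acids are equally likely and independent at each position, which real proteins are not, so treat the result as an order-of-magnitude figure. The function names and the database size used in the example are hypothetical.

```python
from math import comb


def p_at_least(k, n, alphabet=20):
    """Chance that at least k of n positions match a fixed peptide,
    assuming a uniform, independent alphabet (a simplifying assumption)."""
    p = 1 / alphabet
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


def expected_chance_hits(n_windows, k=6, n=7):
    """Expected number of chance matches across n_windows independent
    peptide windows (n_windows is a caller-supplied assumption)."""
    return n_windows * p_at_least(k, n)


combinations = 20 ** 7          # 1,280,000,000 possible 7-mers
p_hit = p_at_least(6, 7)        # 134 / 20**7, about 1.05e-7

print(f"7-mer combinations: {combinations:,}")
print(f"P(at least 6 of 7 match): 1 in {1 / p_hit:,.0f}")
# Hypothetical database of one billion windows, for illustration only.
print(f"Expected chance hits in 1e9 windows: {expected_chance_hits(1e9):.0f}")
```

Under these assumptions the per-position chance comes out near 1 in 9.6 million, consistent with the "1 in 10 million" figure. The expected-hits line shows why a match somewhere in a very large database is not surprising by itself.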

Although there's one amino acid difference, V and I have such similar properties they're functionally equivalent.

Otherwise, the nucleotide sequence is well conserved between WIV17 and SARS-CoV-2, with 2 wobble mutations. This is normal evolution - except it's the wrong virus!
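A rough sketch of how codon differences can be sorted into silent ("wobble") and amino-acid-changing ones, using the standard genetic code. The nucleotide strings below are placeholder codons chosen only so that they translate to TEVYQAG and TEIYQAG; they are NOT the real WIV17 or SARS-CoV-2 sequences. The function names are hypothetical. Note that the sketch labels any synonymous change as silent, whichever codon position it falls in.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string; trailing partial codons are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_codon_changes(dna_a, dna_b):
    """Return (synonymous, nonsynonymous) lists of 0-based codon indices
    at which two equal-length, in-frame sequences differ."""
    if len(dna_a) != len(dna_b) or len(dna_a) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous, nonsynonymous = [], []
    for i in range(0, len(dna_a), 3):
        ca, cb = dna_a[i:i + 3], dna_b[i:i + 3]
        if ca == cb:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            synonymous.append(i // 3)
        else:
            nonsynonymous.append(i // 3)
    return synonymous, nonsynonymous


# Placeholder codons (hypothetical, for illustration only).
TOY_A = "ACTGAAGTTTATCAAGCTGGT"   # translates to TEVYQAG
TOY_B = "ACCGAAATTTACCAAGCTGGT"   # translates to TEIYQAG

print(translate(TOY_A), translate(TOY_B))
print(classify_codon_changes(TOY_A, TOY_B))
```

With the placeholder input, two codons differ silently and one changes V to I, mirroring the pattern described above. Running it on the actual aligned nucleotide sequences would be the real check.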

What could this be about?

This is my *speculation*...

Human adenoviruses bind the receptor CAR with a spike-like protein (fiber). At the base of this is the penton protein. This typically has a 3aa motif, RGD, which binds integrins on the cell surface; these then internalize the virus.

WIV had been isolating bat adenoviruses as far back as 2010 and found that RGD and other known integrin binding motifs were absent. They were interested in the possibility that undiscovered binding motifs might exist.

In WIV17/18 the TEVYQAG peptide is part of a protein, pIIIa. Multiple copies of pIIIa act as "molecular glue", binding other proteins to form the adenovirus capsid. As it sits near the capsid surface, it's possible that in bat AdVs this protein evolved a similar integrin binding function.

WIV17 and WIV18 were uploaded to GenBank in May 2017, but the samples were collected in 2012 or 2013, at the same time and in the same district where RaTG13 was purportedly collected.

RaTG13 has similar sequence to SARS-CoV-2 at this site, though with different wobble codon mutations.

SARS-CoV-2 (unlike SARS-1) spike contains an RGD motif, and two other known integrin binding motifs (LDI/ECD). It's plausible it evolved naturally, and it's not unique. But somehow it picks up all the bells and whistles, while RaTG13 doesn't.

RhGB01 is a SARS-1r cov from the UK.

Akaishi shows there are few large inserts and "insert-and-deletion" mutations between RaTG13 and SARS-CoV-2. This leg of the evolutionary journey seems more as expected.

One of these is the much discussed FCS loop. The other is near the 3' end of the RBM.

This is the 3' end of the RBM, the last region of high variability. Interestingly the SARS-CoV-2 sequence - but not RaTG13 - has good BLAST matches to bat adenoviruses, thanks to the improbably high concentration of mutations.

The matching adenovirus fragments were sampled in Spain and published in October 2018. Those with the homology are all from Rhinolophus species.

Importantly, these samples are *from the respiratory tract, not fecal*. WIV didn't say how WIV17/18 were sampled.

Further upstream is another region that is poorly conserved with RaTG13 (although not an insertion-and-deletion). Here, changing 5 of 8 amino acids creates adjacent 4 aa peptides from WIV17.

Although a 4aa peptide isn't very significant, this seems unusual: 2 pair vs 2 of a kind.

So the most divergent regions of the RBM between RaTG13, SARS-1 and SARS-CoV-2 have homology to recently discovered bat adenoviruses that are likely respiratory (unlike sarbecovs, which are enteric in bats).

And SARS-CoV-2 has far greater affinity for the respiratory tract than SARS-1.🤔

If we exclude the RBM and its recently acquired adenovirus homology, and ignore the disproportionate silent mutations through the spike, then RaTG13 is >99% aa identical to SARS-CoV-2 in the spike, closest of all covs - real or fake.

Just 10 amino acid differences (+ the FCS).

Postscript: This isn't the first time I unexpectedly found a significant peptide from an adenovirus. The first 7 of 8 amino acids of a protein from a virulent human adenovirus occur in one of the NTD loops of SARS-1.

When I first BLASTed the sequence, it was the top match.

But when I BLASTed it a year later, another sequence was ranked higher (now there are many).

Bangoran virus is an obscure mosquito-borne virus collected in central Africa in 1969. But it was sequenced only in 2021 by a group including Zhengli Shi and Marc Grandadam.

Marc who?

Marc Grandadam was head of a virology lab at Institut Pasteur Laos when they first ventured into the caves where Banal-20-52 was later found.

Why were WIV and Pasteur collaborating on obscure African viruses, at the same time the Banal sequences were announced?

Lead author Dong-sheng Luo gave both Institut Pasteur and WIV affiliations, though he appears to have been in Wuhan when the sequence was submitted, hours before Banal-20-52.

Postscript 2: A recent comprehensive survey by central government lab CAMS found no trace of any SARS-CoV-2 related bat virus anywhere in China.

Even RaTG13 can no longer be found in the mineshaft.
