Marc Johnson
Dec 28 · 25 tweets · 6 min read
I’m pleased to share that we FINALLY submitted our latest manuscript on SARS-CoV-2 cryptic lineages and what they tell us about the origins of COVID-19.

This was a ton of work.


1/ medrxiv.org/cgi/content/sh…
First, standard background. Cryptic lineages are unique, evolutionarily advanced SARS-CoV-2 lineages detected from wastewater.

We are fairly certain that these lineages come from individuals with very long infections (not animals).


2/
We’ve had several previous papers on cryptic lineages, but this new manuscript is about lineages we found in public sequence databases.

We screened the raw data from 135,672 wastewater samples from over 2,000 sites across 45 countries, all collected prior to November 2023.
3/
The first part of the manuscript is just about the methods we used for finding and characterizing these lineages.

I won’t get into the details, but it’s straightforward. We look for things that appear reproducibly but do not match known lineages.
4/ Image
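As a rough illustration of the screening idea in tweet 4 (this is not the pipeline from the manuscript), here is a minimal Python sketch. It assumes each sample has already been reduced to a set of mutation labels, and it works at the sample level rather than on linked mutations within reads. `KNOWN_LINEAGES`, `find_candidates`, the thresholds, and the toy data are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict

# Hypothetical, heavily abbreviated reference of mutations in known lineages.
KNOWN_LINEAGES = {
    "BA.2": {"S:N501Y", "S:Q498R", "S:T478K"},
    "XBB.1.5": {"S:N501Y", "S:F486P", "S:T478K"},
}

def unexplained(mutations):
    """Return the mutations not found in any known lineage."""
    known = set().union(*KNOWN_LINEAGES.values())
    return frozenset(mutations) - known

def find_candidates(samples, min_samples=2, min_novel=2):
    """Flag sites where the same unexplained mutation set recurs.

    samples: iterable of (site, collection_date, mutations) tuples.
    A combination counts as reproducible if it shows up on at least
    `min_samples` distinct dates at the same site.
    """
    seen = defaultdict(set)  # (site, unexplained mutation set) -> dates
    for site, date, mutations in samples:
        novel = unexplained(mutations)
        if len(novel) >= min_novel:
            seen[(site, novel)].add(date)
    return {key: sorted(dates) for key, dates in seen.items()
            if len(dates) >= min_samples}

# Toy data: site_A shows the same unexplained pair twice, site_B matches BA.2.
samples = [
    ("site_A", "2023-01-05", {"S:N501Y", "S:K444T", "S:Y453F"}),
    ("site_A", "2023-01-19", {"S:N501Y", "S:K444T", "S:Y453F"}),
    ("site_B", "2023-01-05", {"S:N501Y", "S:T478K"}),
]
candidates = find_candidates(samples)
```

Requiring the same unexplained combination on more than one date is one simple way to encode "appears reproducibly"; a real screen would presumably also need to guard against sequencing artifacts and contamination.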
In total we found 18 cryptic lineages. Six had been described before (3 by us, 3 by others); the rest were novel.

The lineages were all anachronistic (out of time), meaning they were all derived from lineages that had stopped circulating long before the cryptic lineage was detected.
5/ Image
The divergence among the lineages was striking.
6/ Image
One lineage worth particular note was the Ohio (OH-1) lineage, which we detected regularly from two different sewersheds that were 40 miles apart.
7/
In addition to following the sequences submitted to SRA, we also got wastewater samples from both sewersheds and tracked the lineage independently.
8/
The lineage persisted until summer 2023 when it spiked and then abruptly disappeared.

I doubt it ended well, but we will probably never know.
9/
As an aside, the Ohio lineage was a big deal for us. It was the first time we identified a cryptic lineage from a public database, tracked down the sewershed, got them to send us wastewater, and found the exact same lineage.

This was a good sanity check.
10/
An interesting point about the OH-1 lineage is that it picked up new mutations over time, and when a new mutation appeared, it appeared in BOTH sewersheds.

The best explanation for this is that the lineage came from a single source that contributed to both sewersheds.
11/ Image
Another interesting point. We identified 5 different insertions in the cryptic lineages. 4 were derived from SARS-CoV-2 sequence, 1 was not.

This is just a reminder that intra- and inter-sequence recombination leading to insertions is common in coronaviruses.
12/ Image
There was an impressive degree of convergent evolution among the 18 cryptic lineages.
There were 83 mutations that were observed in at least 3 of the lineages and 79 of these changed a protein sequence.
13/ Image
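To make the convergence count in tweet 13 concrete, here is a small Python sketch of how one might tally mutations shared by at least 3 lineages. The lineage names, mutation labels, and the convention that a label containing ":" is an amino acid change are assumptions for illustration only; the real analysis may classify mutations quite differently.

```python
from collections import Counter

def convergent_mutations(lineages, min_lineages=3):
    """Return {mutation: count} for mutations seen in >= min_lineages lineages."""
    counts = Counter(m for muts in lineages.values() for m in set(muts))
    return {m: n for m, n in counts.items() if n >= min_lineages}

# Toy data: "GENE:xNy" labels stand for amino acid changes,
# bare nucleotide labels (e.g. "C9000T") for changes that are silent here.
lineages = {
    "L1": {"S:K444T", "S:Y453F", "C9000T"},
    "L2": {"S:K444T", "S:Y453F", "C9000T"},
    "L3": {"S:K444T", "C9000T", "S:N501Y"},
    "L4": {"S:Y453F"},
}

shared = convergent_mutations(lineages)
coding = {m for m in shared if ":" in m}

# The fraction reported in the thread: 79 of 83 convergent mutations
# changed a protein sequence.
thread_fraction = 79 / 83
```

In the thread's numbers, `thread_fraction` works out to roughly 95%, which is why the convergence looks like selection rather than chance.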
One thing we noticed is that many of the most common convergent mutations changed the sequence to match that of SARS-CoV-2’s closest relative from bats, RaTG13.
14/
To explore this further, we tabulated all of the amino acid positions that were conserved among closely related Sarbecoviruses but changed in SARS-CoV-2.

There were a total of 26 SARS-CoV-2-specific changes.
15/
Amazingly, 12/26 of these positions had reverted to the Sarbeco consensus sequence in at least 1 cryptic lineage, and 7 of them had reverted in at least 3 cryptic lineages.

Two of the reversions appeared in HALF of the cryptic lineages. This is astounding.
16/ Image
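The tally in tweets 15 and 16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the position labels, residues, and lineage names below are invented toy data, and `count_reversions` is a hypothetical helper, not code from the manuscript.

```python
def count_reversions(sc2_specific, lineages):
    """Count, per position, how many lineages carry the consensus residue.

    sc2_specific: {position: (consensus_aa, sars2_aa)} for positions that are
        conserved among related Sarbecoviruses but changed in SARS-CoV-2.
    lineages: {name: {position: aa}} listing only positions where the
        lineage differs from the SARS-CoV-2 reference.
    """
    tally = {}
    for pos, (consensus_aa, _sars2_aa) in sc2_specific.items():
        tally[pos] = sum(1 for changes in lineages.values()
                         if changes.get(pos) == consensus_aa)
    return tally

# Toy data with made-up positions.
sc2_specific = {
    "geneX:100": ("A", "V"),
    "geneY:250": ("K", "R"),
    "geneZ:42": ("T", "I"),
}
lineages = {
    "L1": {"geneX:100": "A", "geneY:250": "K"},
    "L2": {"geneX:100": "A"},
    "L3": {"geneX:100": "A", "geneZ:42": "S"},  # changed, but not a reversion
}

tally = count_reversions(sc2_specific, lineages)
reverted_once = [p for p, n in tally.items() if n >= 1]
reverted_three = [p for p, n in tally.items() if n >= 3]

# Proportions from the thread: 12 of 26 positions reverted at least once,
# 7 of 26 in at least 3 lineages, and half of 18 lineages is 9.
share_once = 12 / 26
share_three = 7 / 26
```

With the thread's numbers, `share_once` is about 46% and `share_three` about 27% of the SARS-CoV-2-specific positions.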
Why is this important?
This suggests that there is strong selective pressure on cryptic lineages to revert to the consensus Sarbecovirus sequence.
This is one reason we think cryptic lineages replicate in an environment similar to their bat ancestors (the GI tract).
17/
What does this say about SARS-CoV-2 origins? It suggests that SARS-CoV-2 was replicating in a non-GI (likely respiratory) environment for a considerable period of time (probably years) prior to the start of the COVID pandemic.
18/
The data presented DOES NOT say whether SARS-CoV-2 was zoonotic or from a lab. However, it does add some nuance to either scenario.
19/
If SARS-CoV-2 was indeed a zoonosis, it probably did not come directly from a bat. It is much more likely that it had been circulating in the respiratory tract of an animal, such as one of the animals that we now know were present at the seafood market.
20/
Alternatively, if SARS-CoV-2 came from a lab, it almost certainly was passaged for a long time prior to leaking and was not simply engineered ‘from scratch’.
21/
The combination of mutations acquired by SARS-CoV-2 would not have been tolerated in its normal environment, but they are also not mutations that one would deliberately engineer.
22/
One could argue that the virus was engineered, such as by adding a furin cleavage site (FCS), and then passaged. However, the FCS could just as well have been introduced by a random insertion, as this manuscript demonstrates.
23/
If the FCS were added and the virus then passaged, it is hard to explain why the virus did not acquire D614G or enhance the FCS, both of which quickly occurred upon spread in humans.
24/ Image
Either way, this data demonstrates that SARS-CoV-2 wasn’t a typical bat Sarbecovirus that jumped straight to humans; the mutations would not have been tolerated in that environment.

SARS-CoV-2 had been practicing for a while.
25/25
