A new animal coronavirus, possibly from bamboo rats, is present in abundance in sample Q61 from the Huanan Seafood Market
This finding, however, stands in stark contrast to the single SARS2 read present in the same sample, which has a high proportion of raccoon dog sequences 🧵
2/ Sample Q61 was generated by Liu et al in their survey of the Huanan Seafood Market
It has generated discussion because of its high level of raccoon dog reads; raccoon dogs have been postulated by some workers as an intermediate host for SARS2 nature.com/articles/s4158…
3/ Previously, I had identified reads related to human OC43/HKU1 coronaviruses using the coronascan procedure
4/ Subsequently, several other investigators and I used BLAST to determine that these reads were more closely related to rodent coronaviruses (BLAST allowed matching to partial CoV sequences and to additional whole-genome sequences)
10/ While it is difficult to be certain of the host animal of the new CoV detected in sample Q61, the type of correlational analysis performed by @jbloom_lab across all NGS samples might reveal whether the novel CoV is correlated with bamboo rat reads biorxiv.org/content/10.110…
11/ Liu et al also did a qualitative spatial correlation analysis which could likewise be informative:
12/ Previously, I had noted that the coronascan procedure could be improved by adding more complete and partial coronavirus genomes
However, partial genomes would confound a ranking based on # reads and % coverage, so some form of normalization would be required
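One possible normalization, offered as a sketch rather than as the coronascan method: divide each reference's mapped-read count by its length in kilobases (reads per kilobase, RPK), so that short partial genomes are not ranked low simply for being short. The `RefHit` type and the reference names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RefHit:
    name: str          # reference sequence name (hypothetical examples below)
    length_bp: int     # length of the reference, partial or complete
    mapped_reads: int  # reads that mapped to this reference


def reads_per_kb(hit: RefHit) -> float:
    """Mapped reads per kilobase of reference sequence."""
    return hit.mapped_reads / (hit.length_bp / 1000)


def rank_by_rpk(hits: list[RefHit]) -> list[RefHit]:
    """Rank references by length-normalized read count, highest first."""
    return sorted(hits, key=reads_per_kb, reverse=True)


hits = [
    RefHit("complete_genome", length_bp=30_000, mapped_reads=300),
    RefHit("partial_spike", length_bp=3_000, mapped_reads=60),
]
for h in rank_by_rpk(hits):
    print(h.name, reads_per_kb(h))
```

By raw read count the complete genome ranks first (300 vs 60); per kilobase the partial sequence ranks first (20.0 vs 10.0).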
13/ Ranking whole coronavirus genomes on the basis of % coverage is useful as % coverage is related to phylogenetic distance (as well as viral titer in the sample)
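A minimal sketch of ranking by % coverage, assuming each mapped read has been reduced to a 0-based, half-open (start, end) interval on its reference. Overlapping intervals are merged so that stacked reads are not double-counted. Function names and the genome labels are illustrative.

```python
def percent_coverage(intervals, genome_length):
    """Percentage of reference positions covered by at least one read.

    intervals: iterable of (start, end), 0-based and half-open.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue  # interval lies entirely outside the reference
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100.0 * covered / genome_length


def rank_by_coverage(mapped):
    """mapped: dict of genome name -> (list of read intervals, genome length)."""
    scores = {name: percent_coverage(iv, length) for name, (iv, length) in mapped.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


mapped = {
    "genome_1": ([(0, 100), (50, 150), (300, 400)], 1000),
    "genome_2": ([(0, 100)], 1000),
}
print(rank_by_coverage(mapped))
```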
In addition, masking low-complexity regions with bbmask.sh should improve accuracy (h/t henjin)
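bbmask.sh masks low-entropy sequence; the sketch below is a from-scratch illustration of that idea (sliding-window Shannon entropy), not bbmask.sh's implementation. The window of 20 bases and the 1.5-bit threshold are assumed values chosen for the example, not bbmask.sh defaults.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy in bits per base (maximum 2.0 for DNA)."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def mask_low_complexity(seq: str, window: int = 20, min_entropy: float = 1.5) -> str:
    """Replace every base lying in a low-entropy window with 'N'.

    window and min_entropy are illustrative assumptions, not bbmask.sh defaults.
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        # Entropy is always computed on the original sequence, not the masked copy.
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            masked[start:start + window] = "N" * window
    return "".join(masked)


print(mask_low_complexity("ACGT" * 10 + "A" * 30))
```

A poly-A stretch has zero entropy and is masked; the mixed ACGT repeat has 2 bits per base and is kept, apart from the boundary windows that overlap the poly-A run.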
14/ When I mapped the Q61 reads to the GX/GX19-89/2019 genome only, the level of mapping was substantially higher: 32.9% coverage
This is due to the absence of other genomes to which reads might otherwise cross-map
Cross-mapping dynamics need to be better understood and controlled for
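A toy illustration of cross-mapping, under simplifying assumptions: ungapped alignment, a fixed mismatch limit, and reads that tie between references left unassigned. Real mappers such as bowtie2 score gapped alignments and handle multimapping reads differently, so this only shows the direction of the effect. The sequences and reference names are made up.

```python
def best_hamming(read, ref):
    """Smallest ungapped mismatch count of read against any offset in ref."""
    best = None
    for off in range(len(ref) - len(read) + 1):
        d = sum(a != b for a, b in zip(read, ref[off:off + len(read)]))
        if best is None or d < best:
            best = d
    return best


def assign_reads(reads, refs, max_mismatches=2):
    """Competitive assignment: each read is counted only for its uniquely best reference."""
    counts = {name: 0 for name in refs}
    for read in reads:
        dists = {name: best_hamming(read, seq) for name, seq in refs.items()}
        dists = {n: d for n, d in dists.items() if d is not None and d <= max_mismatches}
        if not dists:
            continue  # read maps nowhere
        best = min(dists.values())
        winners = [n for n, d in dists.items() if d == best]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts


REF_A = "AAAACCCCGGGGTTTT"
REF_B = "AAAACCCAGGGGTTTT"  # one substitution relative to REF_A
reads = ["AAAACCCC", "CCCCGGGG", "CCCAGGGG"]
refs = {"refA": REF_A, "refB": REF_B}

print(assign_reads(reads, refs))             # {'refA': 2, 'refB': 1}
print(assign_reads(reads, {"refB": REF_B}))  # {'refB': 3}
```

With both references present, refB receives one read; on its own it absorbs all three, which is the same kind of inflation seen when Q61 reads were mapped to GX/GX19-89/2019 alone.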
15/ I took the 331 reads that mapped with GX/GX19-89/2019 as the sole reference and tried to assemble them using Megahit
Unfortunately, Megahit did not generate any contigs
(contigs would give precise information regarding SNVs when compared with the GX/GX19-89/2019 genome)
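A minimal sketch of how SNVs could be read off a contig once it is placed on the reference. It assumes the contig has already been aligned at a known 0-based offset with no indels (a real workflow would align it first); positions are reported 1-based, and N bases in the contig are skipped.

```python
def call_snvs(contig, reference, offset):
    """Return (1-based reference position, reference base, contig base) for each mismatch."""
    if offset < 0 or offset + len(contig) > len(reference):
        raise ValueError("contig does not fit on the reference at this offset")
    snvs = []
    for i, base in enumerate(contig.upper()):
        ref_base = reference[offset + i].upper()
        if base != "N" and base != ref_base:
            snvs.append((offset + i + 1, ref_base, base))
    return snvs


print(call_snvs("ACGA", "TTACGTTT", offset=2))  # [(6, 'T', 'A')]
```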
16/ I then displayed the mapped reads on the GX/GX19-89/2019 genome using the Integrative Genomics Viewer (IGV): they are fairly evenly distributed across the genome, although somewhat denser in its second half
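The visual impression from IGV could be quantified by binning read positions. This sketch assumes the 0-based leftmost mapped position of each read is available (for example, the SAM POS field minus 1); the number of bins is arbitrary.

```python
def binned_read_counts(read_starts, genome_length, n_bins=10):
    """Count reads whose start falls in each of n_bins equal-width genome bins."""
    bin_size = genome_length / n_bins
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < genome_length:
            counts[min(int(pos // bin_size), n_bins - 1)] += 1
    return counts


def half_split(read_starts, genome_length):
    """Return (reads starting in the first half, reads starting in the second half)."""
    in_range = [p for p in read_starts if 0 <= p < genome_length]
    first = sum(1 for p in in_range if p < genome_length / 2)
    return first, len(in_range) - first


starts = [0, 150, 250, 999]
print(binned_read_counts(starts, 1000, n_bins=4))  # [2, 1, 0, 1]
print(half_split(starts, 1000))                    # (3, 1)
```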
17/ henjin (from the Discord metagenomics chat) did a more comprehensive analysis than I did, using a larger number of complete CoV genomes, and built a distance-based clustering tree using ggtree
henjin found that the 3 bamboo rat CoVs group between the OC43 and HKU1 CoVs
18/ "In the plot ... the x-axis shows the distance to the bamboo rat virus which had the most aligned reads from Env_0576, and the y-axis shows the distance to the HKU1 reference genome"
19/ "Each point is connected with a line to its two neighbors. "Mouse coronavirus PREDICT/PDF-2560" is connected to two of the bamboo rat coronaviruses even though it has over 7,000 nucleotide changes from "Bamboo rat coronavirus isolate GX/GX19-89/2019""
h/t henjin
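henjin's analysis used ggtree in R. The sketch below is not that code, only a Python illustration of the two ingredients the quote describes: pairwise nucleotide differences used as plot coordinates relative to two chosen genomes, and each genome's two nearest neighbours. It assumes the genomes are already in an equal-length multiple alignment, counts every differing column (gaps included) as one change, and reads "neighbors" as nearest by pairwise distance, which is an interpretation. The toy sequences are invented.

```python
from itertools import combinations


def hamming(a, b):
    """Number of differing alignment columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(aligned):
    """aligned: dict of name -> aligned sequence. Returns nested dict of distances."""
    d = {name: {name: 0} for name in aligned}
    for a, b in combinations(aligned, 2):
        d[a][b] = d[b][a] = hamming(aligned[a], aligned[b])
    return d


def two_nearest(d, name):
    """The two genomes closest to `name` by pairwise distance."""
    others = sorted((dist, other) for other, dist in d[name].items() if other != name)
    return [other for _, other in others[:2]]


def plot_coords(d, x_ref, y_ref):
    """(distance to x_ref, distance to y_ref) for every genome."""
    return {name: (d[name][x_ref], d[name][y_ref]) for name in d}


aligned = {
    "rat_1": "AAAAAAAA",
    "rat_2": "AAAAAAAT",
    "mouse": "AAAAATTT",
    "hku1": "TTTTTTTT",
}
d = distance_matrix(aligned)
print(two_nearest(d, "mouse"))                   # ['rat_2', 'rat_1']
print(plot_coords(d, "rat_1", "hku1")["mouse"])  # (3, 5)
```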
20/ Because some of the novel CoV reads diverge from the GX/GX19-89/2019 genome (they carry SNVs), there are likely additional reads present that fail to map to GX/GX19-89/2019 because of greater sequence divergence
21/ Consequently, the number of reads derived from the new coronavirus is likely higher than 331, strongly outnumbering the single SARS2 read
22/ The value of this novel CoV is that it shows the number of reads that can be expected from an animal-associated CoV in the market NGS datasets is on the order of hundreds
It also shows that a market animal CoV had not degraded prior to sampling, as it is present at significant levels
23/ In the light of these considerations, the presence of only a single SARS2 read, in a sample with a large proportion of raccoon dog reads, appears increasingly inconsistent with the presence of infected raccoon dogs
When individual Q61 reads that matched the human OC43 and HKU1 genomes are searched with BLAST, the closest matches are to bamboo rat and rodent CoV sequences 🧵
2/ Previously I had found reads that mapped to human OC43/HKU1 coronaviruses, interpreting this to mean that human coronaviruses were present in the raccoon dog sample Q61
What is the true trace level of SARS2 in the raccoon dog sample Q61 from the Huanan Seafood Market?
Here, in a refined analysis, I show that human common cold coronaviruses are present in quantities greater than SARS2 in Q61 🧵
2/ Q61 is a key sample described by Liu et al in their survey of the Huanan Seafood Market, as it contained large quantities of raccoon dog sequences, leading some to claim that it was evidence that raccoon dogs were the source of the SARS2 pandemic nature.com/articles/s4158…
3/ Previously, using coronascan, I found 137 reads in Q61 that matched SARS2, but @humblesci and @emmecola found only 1
I also found human coronaviruses OC43, HKU1, 229E and an alphacoronavirus
In their survey of the Huanan Seafood Market, Liu et al conclude that there is no evidence for a zoonotic transmission of SARS2 at the market
Out of curiosity, I did a deep dive into sample Q61 (Env_0576), which they report as having a high proportion of raccoon dog reads 🧵
2/ I ran a mitoscan analysis, a procedure developed with @humblesci that was first described by Csabai and Solymosi in their Antarctic soil preprint, which identified early SARS2 sequences associated with a variety of potential mammalian cell lines 👇 researchsquare.com/article/rs-133…
3/ Mitoscan systematically maps all reads in an NGS dataset against all mito(chondrial) genomes in the Genbank db
It uses bowtie2 default settings, which allow for a limited number of mito polymorphisms/sequencing errors when mapping github.com/semassey/Scann…
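After mapping, a mitoscan-style tally reduces to counting mapped reads per mitochondrial reference. This sketch is not the code in the linked repository. It assumes bowtie2 SAM output produced with default settings, for example `bowtie2-build mito_genomes.fa mito_index` followed by `bowtie2 -x mito_index -U reads.fastq -S out.sam` (file names hypothetical). It counts each read's primary alignment only, using the standard SAM FLAG bits.

```python
from collections import Counter

UNMAPPED = 0x4         # SAM FLAG: segment unmapped
SECONDARY = 0x100      # SAM FLAG: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG: supplementary alignment


def count_mapped_per_reference(sam_lines):
    """Count primary mapped reads per reference name (SAM column 3, RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[fields[2]] += 1
    return counts


sam = [
    "@HD\tVN:1.0",
    "r1\t0\tNC_A\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tNC_B\t5\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tNC_B\t7\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped_per_reference(sam).most_common())
```

Reference names such as `NC_A` are placeholders; in practice they would be GenBank mitochondrial accessions, and the ranked counts indicate which species' mitochondrial DNA dominates the sample.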
Our new paper on intermediate lineage A-B SARS2 genomes is out!
Within, we criticize the exclusion of 20 potential A-B intermediate genomes from Pekar et al (2022), finding that the majority were improperly excluded 🧵 mdpi.com/2036-7481/14/1…
2/ Lin(eage) A and lin B are the two earliest established lineages from Wuhan. A likely arose first, with B arising from A via two SNVs at the lineage-defining positions 8782 and 28144
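A minimal sketch of classifying a genome at the two lineage-defining sites, using the commonly cited definition: lineage A carries T8782/C28144, lineage B carries C8782/T28144, and the two mixed combinations are A-B intermediates. It assumes the genome is already aligned to Wuhan-Hu-1 (NC_045512.2) coordinates, so that 1-based positions line up. It does not address the sequence-quality questions on which the exclusion debate turns. The helper `with_sites` is hypothetical and exists only to build test inputs.

```python
LINEAGE_SITES = (8782, 28144)  # 1-based positions in Wuhan-Hu-1 coordinates

_CALLS = {
    ("T", "C"): "lineage A",
    ("C", "T"): "lineage B",
    ("C", "C"): "intermediate (C/C)",
    ("T", "T"): "intermediate (T/T)",
}


def classify_lineage(aligned_genome: str) -> str:
    """Call lineage A, B, or an A-B intermediate from the bases at 8782 and 28144."""
    if len(aligned_genome) < max(LINEAGE_SITES):
        raise ValueError("genome is shorter than the lineage-defining positions")
    bases = tuple(aligned_genome[pos - 1].upper() for pos in LINEAGE_SITES)
    return _CALLS.get(bases, "undetermined")  # e.g. N at either site


def with_sites(base_8782, base_28144, length=29_903):
    """Hypothetical helper: a dummy genome carrying only the two lineage-defining bases."""
    g = ["N"] * length
    g[8782 - 1] = base_8782
    g[28144 - 1] = base_28144
    return "".join(g)


print(classify_lineage(with_sites("T", "C")))  # lineage A
print(classify_lineage(with_sites("C", "C")))  # intermediate (C/C)
```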
3/ The presence of SARS2 genomes intermediate between lin A and lin B from humans would invalidate Pekar et al’s hypothesis that SARS2 spilled over twice from an unknown animal into humans, at the Huanan seafood market
9/ This saga is a case study in the perils of making grandiose claims before completing the analysis, and in the hubris of embarking on a new subject area without (metagenomic) specialists to scrutinize it and suggest robust analyses
So, Flo Debarre et al’s raccoon dog analysis, which caused such a media frenzy, has been released. What does it show?
Not much 🧵
2/ Essentially, it confirms Gao et al’s preprint analysis that the samples contained nucleic acid from animals in addition to humans 👇 (no surprise there)
It adds some species-specific info
3/ The analysis is crude, and the taxonomic attribution method naïve
They rely on sequence assembly, which misses a lot of information
They use the number of assembled contigs as a metric for the quantity of species-specific material
This is semi-quantitative at best (due to the vagaries of assembly)
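A toy illustration of why contig counts are only semi-quantitative. A species whose reads fragment into many short contigs can outrank, by contig count, a species with far more reads that assemble into a single long contig. This is not Debarre et al's pipeline; the species labels and numbers are hypothetical.

```python
from collections import defaultdict


def tally(contigs):
    """contigs: iterable of (species, reads_assigned_to_contig).

    Returns (contigs per species, reads per species).
    """
    n_contigs = defaultdict(int)
    n_reads = defaultdict(int)
    for species, reads in contigs:
        n_contigs[species] += 1
        n_reads[species] += reads
    return dict(n_contigs), dict(n_reads)


contigs = [("species_X", 10)] * 5 + [("species_Y", 500)]
by_contigs, by_reads = tally(contigs)
print(by_contigs)  # {'species_X': 5, 'species_Y': 1}
print(by_reads)    # {'species_X': 50, 'species_Y': 500}
```

The two metrics rank the species in opposite orders.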