Steve Massey
Aug 18 · 18 tweets · 10 min read
@humblesci, Daoyu, @ydeigin, @quay_dr and I have published a study showing that the exclusion of 20 A/B intermediate SARS2 genomes from the analysis of Pekar et al. is unwarranted 🧵
zenodo.org/record/7005332…
2/ The existence of such intermediate genomes in humans is incompatible with the two-spillover model for the origin of C19

To recap: lineage A has T8782/C28144 (T/C) while lineage B has C8782/T28144 (C/T)

An A/B intermediate will have C/C or T/T
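The genotype scheme above can be written as a small lookup. This is only an illustrative sketch: the function name `classify_genotype` and the "other" fallback are assumptions made for the example, not something taken from the study.

```python
# Lineage-defining alleles as stated in the thread (positions relative to Hu-1).
LINEAGE_A = ("T", "C")  # T8782 / C28144
LINEAGE_B = ("C", "T")  # C8782 / T28144


def classify_genotype(base_8782: str, base_28144: str) -> str:
    """Classify a genome from its bases at positions 8782 and 28144."""
    genotype = (base_8782.upper(), base_28144.upper())
    if genotype == LINEAGE_A:
        return "A"
    if genotype == LINEAGE_B:
        return "B"
    if genotype in {("C", "C"), ("T", "T")}:
        return "A/B intermediate"
    return "other"  # hypothetical fallback, e.g. an ambiguous N call


for pair in [("T", "C"), ("C", "T"), ("C", "C"), ("T", "T")]:
    print("/".join(pair), "->", classify_genotype(*pair))
```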
3/ Firstly, as previously pointed out, Pekar et al.'s exclusion criterion of 'low read depth' is inconsistent with data from GISAID showing high read depth for the majority of the datasets
4/ Many of the intermediate genomes were sequenced using ONT MinION/GridION/PromethION; a sequencing depth of > 60X is recommended for this platform

nature.com/articles/s4146…

Only 1 genome falls below this criterion (Table 1, in yellow) [images]
5/ Curiously, 'contamination' is used as an exclusion criterion. However, nowhere is any evidence of contamination presented. One way to demonstrate contamination is haplogroup analysis of the human mitochondrial reads, showing that more than one haplogroup is present; the authors do not do this [image]
6/ 'Personal communications' are used to exclude 11 C/C and 3 T/T genomes. An 'L.Chen' is credited for the information that 11 intermediate C/C genomes from Sichuan and Wuhan are sequencing errors
7/ However, the identity of L.Chen is unclear. In addition, the C/C genomes (actually 12, not 11) from Sichuan and Wuhan were sequenced in different sequencing centers, so were likely sequenced by two different people [images]
8/ Given that L.Chen is explicitly linked by the authors to Sichuan 👇, the Wuhan genome was sequenced by an unidentified person, whom we term 'person X'. It is concerning that a C/C intermediate genome was excluded on the basis of a personal communication with an unidentified person [image]
9/ One of the Sichuan C/C genomes (EPI_ISL_451320) excluded by Pekar et al. is actually used by Nextstrain to root their phylogenetic tree (as an A/B root). This genome has a 1335X sequencing depth. Clearly @nextstrain do not concur that it is erroneous [image]
10/ Pekar et al. exclude genomes from Singapore (EPI_ISL_462306) and South Korea (EPI_ISL_413017) that had raw data available, on the basis of low sequencing depth at positions 8782/28144 and 28144, respectively

However, we show that the Singapore genome is clearly a T/T genome
11/ We do this by mapping the raw reads to the Hu-1 reference genome (which is C/T, lineage B)

Position 8782 has 12/12 T, while 28144 has 6/6 T
h/t @humblesci

This is therefore clearly a T/T genome [image]
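The per-position read counts quoted above (12/12 T, 6/6 T) are the kind of tally a pileup produces. Below is a minimal sketch of that counting step, not the pipeline used in the study: it assumes reads are already aligned to the reference without gaps (real alignments contain indels and clipping and would normally be handled by an aligner plus a pileup tool), and the toy reads are invented for illustration.

```python
from collections import Counter


def base_counts_at(reads, position):
    """Count read bases covering a 1-based reference position.

    `reads` is an iterable of (start, sequence) pairs, where `start` is the
    1-based reference coordinate of the read's first base. Alignments are
    assumed to be ungapped, which is a simplification.
    """
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):  # the read covers this position
            counts[seq[offset].upper()] += 1
    return counts


# Invented toy reads, NOT data from the Singapore genome.
toy_reads = [
    (8780, "ACTGA"),     # covers 8782 with 'T'
    (8782, "TGA"),       # covers 8782 with 'T'
    (8775, "AAAAAAAT"),  # last base lands on 8782
    (8783, "GGG"),       # starts after 8782, so it is not counted
]
print(dict(base_counts_at(toy_reads, 8782)))
```

A count of this kind at both 8782 and 28144 is what allows a genome to be assigned to T/C, C/T, C/C or T/T.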
12/ We also identify an additional intermediate T/T genome from Guangzhou (GZMU0025.capture, SRR13616010), which has the SNVs shown in the image when compared to Hu-1

The T8782 genotype is clear

h/t @humblesci, Daoyu [image]
13/ Remarkably, two C/C intermediate genomes from Beijing (2500X and 1850X sequencing depth) were excluded because 'no underlying data was available'.

This is hard to understand, and the criterion was applied selectively (it was not applied to the 787 remaining genomes used in their analysis) [image]
14/ Puzzlingly, one T/T intermediate genome, EPI_ISL_493182, was discarded even though it met their (contentious) 10X read depth cutoff. Position 8782 is a consensus T nucleotide, with 19/29 reads T [image]
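The check described above (a depth cutoff followed by a majority call) can be sketched as follows. This is a hedged illustration only: `call_consensus` is a hypothetical helper, simple majority voting is an assumption about how a consensus base is chosen, and because the thread does not say which bases the 10 non-T reads carry, they are lumped under an "other" key.

```python
def call_consensus(counts, min_depth=10):
    """Return (base, depth, supporting fraction) for one position.

    `counts` maps base -> number of reads. If the total depth is below
    `min_depth`, no call is made and the base is None.
    """
    depth = sum(counts.values())
    if depth < min_depth:
        return None, depth, 0.0
    # Majority vote: the base carried by the most reads wins.
    base, support = max(counts.items(), key=lambda item: item[1])
    return base, depth, support / depth


# Position 8782 of EPI_ISL_493182 as quoted in the thread: 19 of 29 reads are T.
base, depth, fraction = call_consensus({"T": 19, "other": 10})
print(base, depth, round(fraction, 3))
```

With 29 reads the position clears a 10X cutoff, and T is the majority base.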
15/ Pekar et al. fail to explain how repeated sequencing errors can occur at the same positions, 8782 and 28144, in multiple genomes

If sequencing depth were a problem in causing miscalls, there should be a significant number of unique SNVs in these genomes, which is not observed
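One way to look for the unique SNVs mentioned above is to count, for each genome, the variants that appear in no other genome in the set. The sketch below assumes SNVs have already been called against Hu-1 and are given as strings; the genome names and all SNVs other than C8782T are invented placeholders, not data from the study.

```python
from collections import Counter


def private_snvs(snvs_by_genome):
    """Return, per genome, the SNVs found in no other genome in the set."""
    seen = Counter(
        snv for snvs in snvs_by_genome.values() for snv in set(snvs)
    )
    return {
        genome: {snv for snv in snvs if seen[snv] == 1}
        for genome, snvs in snvs_by_genome.items()
    }


# Hypothetical example: three genomes share C8782T, two carry one extra SNV.
toy = {
    "genome_1": {"C8782T", "A100G"},
    "genome_2": {"C8782T"},
    "genome_3": {"C8782T", "T200C"},
}
for genome, unique in private_snvs(toy).items():
    print(genome, sorted(unique))
```

If low depth were driving miscalls, one would expect these private sets to be large and scattered across the genome rather than concentrated at two shared positions.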
16/ 'Convergence' was used to exclude 7 intermediates that possessed A-, B- or A/B-specific SNVs. However, 5 of these possessed only 1 A- or B-specific SNV. These could be true intermediates that picked up an A- or B-specific SNV by convergence

This caveat was not discussed [image]
17/ To conclude, the exclusion of most of the 20 intermediate genomes from the analysis of Pekar et al. is untenable, and represents an insurmountable problem for the conclusion of two zoonoses leading to the establishment of lineages A and B
