Some further insights into the SARS2 spike sequences found in Pseudomonas aeruginosa datasets, recorded as being sampled in 2019 🧵
2/ Complete SARS2 spike gene sequences were found in contigs generated from Pseudomonas aeruginosa cultures sampled in 2019, by @iximeno
The spike sequence displayed codon optimization and lacked the furin cleavage site
3/ The spike sequences are found in four contigs 👇, inserted into the pcDNA3.1 plasmid h/t @raqueltobes , with a t-PA (tissue plasminogen activator) leader h/t @Daoyu15
4/ I re-examined the following 3 features of the sequences:
i) the C-terminus extension of the spike sequence
ii) the N-terminus t-PA (tissue plasminogen activator) extension
iii) Reasons for variation between the contigs that contain the spike and plasmid sequences
5/ i) The C-terminus of the spike sequences has a peptide extension not found in SARS2
@VBruttel proposed that this is a trimerization domain, but it does not match anything in the database
6/ The C-terminus extension can be traced to a publication from Hong Ling's group, Harbin Medical University
The sequence is termed an 'MTQ' domain and is designed to promote trimerization of overexpressed virus glycoproteins link.springer.com/article/10.100…
7/ The construct described by Ling's paper also uses the pcDNA3.1 expression vector, with a t-PA leader, as well as the MTQ domain, a close match therefore to the contig sequences
8/ The MTQ domain has had limited published use for the overexpression and trimerization of HIV, RSV and SARS2 surface glycoproteins (spike protein is classed as a glycoprotein)
These are all immunogenic as they are exposed to the host immune system, so of medical interest
9/ The MTQ extension starts with a flexible linker (GGGSGGS), followed by a trimerization domain designed de novo using PSIPRED and MARCOIL1.0, incorporating elements that produce coiled-coil alpha helices (including heptad repeats)
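A minimal sketch of how a C-terminal extension of this kind could be pulled out of a translated contig and inspected for heptad periodicity. The sequence below is a made-up toy, not the MTQ domain or any contig sequence, and the heptad register is simply assumed to start at position 'a'; in practice a tool such as MARCOIL infers the register.

```python
LINKER = "GGGSGGS"           # flexible linker named in the thread
HYDROPHOBIC = set("AILMFVWY")  # simple hydrophobic alphabet (an assumption)

def split_c_terminal_extension(protein, linker=LINKER):
    """Split a protein at the last occurrence of the linker.

    Returns (core, extension) where extension begins with the linker,
    or None if the linker is absent.
    """
    idx = protein.rfind(linker)
    if idx == -1:
        return None
    return protein[:idx], protein[idx:]

def a_d_hydrophobic_fraction(segment, register_start="a"):
    """Fraction of heptad 'a' and 'd' positions occupied by hydrophobic residues.

    Coiled coils typically place hydrophobic residues at 'a' and 'd'.
    The register is assumed, not inferred.
    """
    letters = "abcdefg"
    offset = letters.index(register_start)
    core_residues = [
        res for i, res in enumerate(segment)
        if letters[(offset + i) % 7] in ("a", "d")
    ]
    if not core_residues:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core_residues) / len(core_residues)

# Toy example only: invented core + linker + two idealized heptads.
toy_protein = "MKTAYIAK" + LINKER + "LEELLKKLEELLKK"
core, extension = split_c_terminal_extension(toy_protein)
coil = extension[len(LINKER):]
fraction = a_d_hydrophobic_fraction(coil)
```

A high fraction at the 'a'/'d' positions is only suggestive of a coiled coil; it does not replace a proper prediction.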
10/ Trimerization of the spike protein means that it is in its native state, reflecting how it is oriented on the SARS2 membrane surface (as a trimer).
This native state conformation can be important for boosting the immunogenicity of subunit vaccines
11/ The MTQ domain has been patented by Harbin Medical University, which means that if used commercially there should be royalty payments
13/ The identification of the MTQ domain indicates that the spike proteins in the 4 contigs were likely intended for subunit vaccines, either for humans or for testing in mouse/bat (as commonly conducted by the WIV, and described in DEFUSE h/t @VBruttel )
14/ Using 'MTQ' as a search term on Addgene (addgene.org) yields no results (C-terminal tags are sometimes provided in plasmid descriptions), so this indicates it is only rarely used (hence no matches on Genbank) and constitutes a useful identifier
15/ There are 14 Chinese spike protein subunit vaccines (2 approved, 12 undergoing trials) listed at COVID19 vaccine tracker (however note that site is not updated after Dec 2022)
16/ Not all of these use the entire spike protein: several use the NTD/RBD instead
Among those that use the full spike protein and report their trimerization domain, none reports using MTQ
17/ They report using alternative trimerization domains such as:
collagen (SCB-2019)
foldon (SCTV01C)
fibritin (202-CoV9)
18/ Consequently, the spike constructs in the 4 contigs do not appear to have been used in official vaccine trials
This may be because the project was abandoned, was never openly subjected to trials, or entered a trial after Dec 2022
19/ ii) The N-terminus of the spike sequences in the contigs possesses a t-PA leader sequence
The t-PA leader is immunogenic, but is also a secretion signal in mammalian protein expression systems (this was Ling's purpose for adding the t-PA leader)
20/ It is notable that P22 of t-PA has been replaced by asparagine (N) 👇
In the publication by Hong Ling's group, they test two modifications of the t-PA leader:
P22A and P22G
21/ The substitution of small, neutral amino acids at P22 was used to enhance cleavage of the t-PA signal sequence from the overexpressed protein after secretion
SignalP was used to predict enhanced cleavage of the t-PA signal peptide due to the P22A and P22G substitutions
22/ Oddly, however, using SignalP-6.0, the P22N mutation reduces cleavage efficiency of t-PA, compared to P22A and P22G (probability 0.30 vs 0.92 and 0.81, respectively)
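For anyone wanting to repeat the comparison, a minimal sketch of generating the three P22 variants as input for SignalP. The 23-residue leader below is the commonly cited human t-PA signal peptide and is an assumption here; it should be checked against the contigs and Ling's paper before use. The helper name `substitute` is hypothetical, and the sketch does not run SignalP or reproduce the probabilities quoted above.

```python
def substitute(seq, position, new_residue, expected_wt=None):
    """Apply a single point substitution at a 1-based position.

    If expected_wt is given, raise ValueError when the residue at that
    position does not match, which guards against off-by-one numbering.
    """
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} outside sequence of length {len(seq)}")
    wild_type = seq[position - 1]
    if expected_wt is not None and wild_type != expected_wt:
        raise ValueError(f"expected {expected_wt} at {position}, found {wild_type}")
    return seq[:position - 1] + new_residue + seq[position:]

# ASSUMPTION: commonly cited human t-PA signal peptide (23 residues).
# Verify against the actual construct before relying on it.
TPA_LEADER = "MDAMKRGLCCVLLLCGAVFVSPS"

# Names follow the usual <wild type><position><new residue> convention.
variants = {
    name: substitute(TPA_LEADER, 22, name[-1], expected_wt="P")
    for name in ("P22A", "P22G", "P22N")
}

# Print FASTA records that could be pasted into a signal peptide predictor,
# ideally with the downstream spike residues appended to each leader.
for name, seq in variants.items():
    print(f">tPA_{name}\n{seq}")
```

Cleavage predictions depend on the residues downstream of the join site, so the leader alone is not enough to reproduce the numbers in the thread.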
23/ In addition, if they had joined the t-PA signal sequence 1 amino acid upstream of the current join site, so including an additional valine, this would have resulted in much better cleavage (probability 0.97)
24/ This indicates that P22N was suboptimal for cleavage; whether this was deliberate or not is unclear
Given suboptimal cleavage, these exact constructs are likely to have been unsuccessful as subunit vaccines or in experiments
25/ This emphasizes that designers are perfectly capable of designing suboptimal cleavage sites, with relevance to the supposedly suboptimal FCS of the SARS2 spike protein
26/ Of note, the P22A substitution has been used before Ling's 2011 paper, in 2004 👇 This indicates that the effect of mutating P22 on cleavage was known before Ling's paper, and so it is not a unique marker
27/ iii) Lastly, I looked for differences between the contig sequences
When aligned, I found that they are identical in sequence but differ in length. I attribute this to the fact that assembling only a few reads can produce contigs of variable sizes (their sequencing depth is low h/t @Kevin_McKernan )
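A minimal sketch of one way to check "identical but different lengths" without a full aligner: test whether each shorter contig is contained in the longest one, on either strand. The sequences and names below are invented toy data, not the real contigs, and exact containment is a stricter test than an alignment (a single mismatch reports "differs").

```python
from typing import Dict, Tuple

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def containment_report(contigs: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Compare every contig against the longest one.

    Returns the name of the longest contig and, for each other contig,
    'forward' (exact substring), 'reverse' (reverse complement is an
    exact substring) or 'differs'.
    """
    longest_name = max(contigs, key=lambda name: len(contigs[name]))
    reference = contigs[longest_name].upper()
    report = {}
    for name, seq in contigs.items():
        if name == longest_name:
            continue
        query = seq.upper()
        if query in reference:
            report[name] = "forward"
        elif reverse_complement(query) in reference:
            report[name] = "reverse"
        else:
            report[name] = "differs"
    return longest_name, report

# Toy data only (hypothetical names and sequences).
toy_contigs = {
    "a": "ATGGCCTTAGGCTAACGT",
    "b": "GCCTTAGGC",    # exact sub-sequence of "a"
    "c": "GTTAGCCTAA",   # reverse complement of a sub-sequence of "a"
    "d": "GCCTTCGGC",    # one mismatch relative to "a"
}
longest, report = containment_report(toy_contigs)
```

If all shorter contigs come back as 'forward' or 'reverse', the length differences are consistent with partial assembly of the same underlying construct.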
28/ This substack by Kevin contains a lot of interesting ideas and background information on the plasmids
29/ A 2023 paper by Dongsheng Zhou and others describes the P.aeruginosa genome sequences, but unfortunately does not provide the sequencing date or location, which could help trace the provenance of the plasmid sequences
30/ Can the observations overall tell us more about the date the spike sequences were generated ?
Not directly, but the identity of the MTQ domain points to the purpose of the sequences (subunit vaccine), and indicates that there should be documentation if this was a post-2019 project
31/ (documentation is likely deleted if it is pre-2020)
32/ The observations are consistent with contamination of the P.aeruginosa sequence datasets either during sample / library prep or sequencing
This is because the construct is almost identical to Ling's, which was used for mammalian expression
33/ Lastly, in a striking coincidence the P.aeruginosa contigs match experiments described in DEFUSE
34/ Of interest also, pcDNA3.1 was used extensively by Zhengli Shi for a variety of purposes, including cloning codon optimized SARS1 spike h/t @VBruttel
36/ The pcDNA3.1 expression system is quite widely used; entering the search terms "pcDNA3.1" "spike" "SARS-CoV-2" "S" into Addgene brings back 55 entries from over 7 groups
Differential gene expression analysis of the controversial RaTG13 dataset reveals strong similarity to the RaTG15 dataset, also described as generated from a Rhinolophus affinis 'rectal swab' from the Mojiang Mine
This indicates they have a common, undefined source 🧵
2/ The source of the RaTG13 dataset has been a key puzzle of the C19 Origin debate
RaTG13 was sequenced by the Wuhan Institute of Virology prepandemic in 2017/2018 and remains the closest related CoV backbone to SARS2
3/ While the sample is described as being generated from a Rhinolophus affinis 'fecal swab', numerous investigators have noted this is inconsistent with the low % of bacteria present in the NGS dataset
The Zhang group of Fudan University have identified and validated two A-B intermediate SARS2 genomes from the early pandemic
This provides a key to understanding the origin of COVID19 🧵
2/ In their new paper, the Zhang group sequence 343 new SARS2 genomes from the early pandemic (sampled up to Oct 2020). The genomes were obtained from COVID19 patients in the Shanghai Public Health Center academic.oup.com/ve/advance-art…
3/ Importantly, they identify two SARS2 genomes intermediate between lineage A and lineage B
These were validated using two methods, RT-PCR (Sanger sequencing), and Next Generation Sequencing (NGS). @jbloom_lab verified the sequencing depth on one (high)
Was Baric aware of the work on the human α-ENaC furin cleavage site (FCS) at the University of North Carolina (UNC) ? 🧵
In a striking coincidence, the human α-ENaC FCS is exactly the same as that of SARS2, as first noted by Anand et al in 2020 elifesciences.org/articles/58603
2/ Furin cleavage of human α-ENaC has been studied by M.Jackson Stutts, who is based at the UNC School of Medicine
Ralph Baric is also based at UNC School of Medicine, and also has an interest in lung disease. Was he aware of this work ?
A potential explanation for the alanine (A684) in the SARS2 FCS ?
Alanine mutational scanning is used to systematically mutate residues in functional sites, to determine if they ablate function, so indicating their importance 🧵
2/ In the recent FOIA-ed DEFUSE documents released by @emilyakopp and @USRightToKnow there is a section that proposes to 'ablate' 'human-specific cleavage sites' introduced into SARSr-CoV spike proteins
3/ This would present a method of tweaking the efficiency of the FCS, and identifying key residues. There is a possibility that overly-efficient cleavage might result in excessive shedding of the S1 domain (reducing ACE2 binding rates)
Baric was aware of the lax safety standards at the WIV, in the following draft note:
" In china, might be growin these virus under bsl2. US reseachers will likely freak out"
But put his name to the proposal anyway
Daszak wrote:
"Ralph, Zhengli. If we win this contract, I do not propose that all of this work will necessarily be conducted by Ralph, but I do want to stress the US side of this proposal so that DARPA are comfortable with our team"