Do NOT trust "bat" samples collected after the outbreak -- we caught them prepping the virus library twice.
This could also be oligo assembly errors during HTGS -- HTGS uses multiple overlapping oligos, and when they are improperly annealed in the PCA (Polymerase Cycling Assembly) process, positive-strand oligos with complementary sequences at the 3' end can also create hairpins, or reads that are part positive strand and part negative strand.
High-throughput gene synthesis for political gain is very cheap and very lucrative.
Fake datasets beaten by impurities. Notice that neither RaTG13 nor HKU5 has any such sequences in any abundance. These “bat” “samples” have on average 1 in 20 viral reads being a positive-negative strand chimera. These are caused by annealing to the wrong oligos at the 3’ end.
Impurities generated during gene synthesis.
The S1 is missing from the datasets. Assembly of the reads revealed that the S homology ends at the beginning of S2'. It looks like they don't know the RBD, so they left the S1 blank!
Galaxy/Shovill: assembly revealed that there wasn't any piece of the S1 there. All S homology ends at the second half of the S2, post-S2'.
This software uses de novo assembly, and the entire SRR13380249 dataset was used for de novo assembly and alignment, set to paired-end reads mode. The complete Coronavirus sequence cannot be de novo assembled and numerous foldbacks exist in the reads, with only two N sgRNA-like reads, which were also folded back on themselves. This is not what you see in Coronavirus transcription. It is the result of HTGS when they don't know what the S1 should look like.
Nor did a query with HKU3 or RmYN02 reveal any fragment of S1-S2 or S1. They seem to have hesitated!
Use tBLASTn for querying protein against nucleotides. If it is a Sarbecovirus S, it should be seen.
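To see where the S homology starts in a tBLASTn result, something like the following could be used. It is only a sketch: it assumes the hits were saved in the standard 12-column tabular format (`-outfmt 6`) with the spike protein as the query, and the function name, the e-value cutoff and the example rows are made up for illustration.

```python
def query_coverage_start(tabular_text, max_evalue=1e-5):
    """Return the lowest query (protein) residue covered by any significant hit.

    Expects BLAST tabular output with the standard 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Returns None when no hit passes the e-value cutoff.
    """
    starts = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        evalue = float(fields[10])
        if evalue <= max_evalue:
            starts.append(min(qstart, qend))
    return min(starts) if starts else None


# Hypothetical rows, NOT taken from any real dataset.
example = "\n".join([
    "spike\tcontig_1\t91.2\t300\t26\t0\t900\t1199\t10\t909\t1e-150\t520",
    "spike\tcontig_7\t88.0\t120\t14\t0\t815\t934\t400\t41\t2e-60\t210",
    "spike\tcontig_9\t35.0\t40\t26\t0\t20\t59\t5\t124\t3.2\t25",
])
print(query_coverage_start(example))
```

If the lowest covered residue sits around the S2' site, no significant hit reaches into the S1 or RBD; the weak third row is ignored because its e-value is above the cutoff.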
Reads that fold back on themselves (and are not hairpinned or transposome-generated reads) are only possible with High Throughput Gene Synthesis (HTGS). The concentration near RdRp and N represents the fact that the majority of the reads map there. When two oligos of the same genome direction used in HTGS have a 3’ complementary segment, they anneal together and form a read that is folded back on itself. These reads are not randomly spaced, yet they are not on Coronavirus recombination hotspots. (In fact, there were >25x more contigs that folded back on themselves than there were sgRNA-like contigs. Note that the N of SARS-CoV-2 is susceptible to mispriming by a 5’ primer due to extended homology between the TRS-L and the TRS-N. Also, the foldback sites are all non-random sites that have some (+/-) 3’ homology, with an average of 1 in 20 raw reads folding back on the same sites.) This is very clearly a symptom of assembly errors in the high-throughput gene synthesis process, especially when corners are cut and the oligos are made shorter.
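For anyone who wants to count these reads themselves, here is a rough sketch of the classification. It assumes each split read has already been reduced to two alignment segments on the viral genome (start, end, strand); the `Segment` type, the function name and the 500 nt arm-spacing threshold are illustrative assumptions, not values from the analysis above.

```python
from collections import namedtuple

# One aligned part of a read: reference coordinates plus strand ("+" or "-").
Segment = namedtuple("Segment", "ref_start ref_end strand")


def classify_split_read(first, second, max_arm_gap=500):
    """Classify a read whose two parts align separately to the viral genome.

    max_arm_gap is a hypothetical cutoff for "arms close to each other".
    """
    # Distance on the reference between the two parts (0 if they overlap).
    gap = max(first.ref_start, second.ref_start) - min(first.ref_end, second.ref_end)
    gap = max(gap, 0)
    if first.strand != second.strand:
        # Part positive strand, part negative strand.
        return "fold-back" if gap <= max_arm_gap else "distant inversion chimera"
    # Both parts point the same way (deletion- or duplication-like junction).
    return "same-direction chimera"


print(classify_split_read(Segment(100, 160, "+"), Segment(180, 240, "-")))
```

Dividing the number of "fold-back" calls by the total number of viral reads gives the kind of 1-in-N figure quoted in this thread.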
The fact that there were far more foldbacks than sgRNA-like reads, and the fact that the only sgRNA-like reads occurred at the TRS-N, which contains extended homology, indicate that these “Coronavirus” reads originated as the remains of impurities within a dsDNA HTGS product. They weren’t real reads.
In addition, foldbacks like these were not parts of circRNAs, as assembly did not reveal any contigs that are circular in structure. Assembly errors are the only explanation for this.
They just ordered a batch of overlapping “gene fragment” oligos for the regions they are comfortable with, and then threw them into a PCR reaction with some bat cDNA.…
And the fact that this process relies on overlapping 3’ ends is exactly how it can go wrong in a real reaction, and how you get all those folded-back (positive and negative strand chimera) reads.
Note that fold-backs are not found in any significant quantities in other legitimate datasets of either SARS-CoV-2 or bat Coronaviruses. The reason is that Illumina second-strand cDNA synthesis introduces uracils into the strand, which quenches subsequent amplification of any potential mis-annealing products by leaving uracil in both strands of such DNA fragments, if the input was indeed honestly prepared RNA/cDNA. (Both TruSeq Stranded mRNA and the Respiratory Virus Oligo Panel use the same reagents in the amplification step, which will quench mis-annealed first-strand cDNAs that had uracils incorporated in the process, preventing adapter ligation and stopping spot formation.) For this kind of annealing product to be tagmented, the input needs to be pre-made dsDNA that already had these fold-backs within the sequence, and such DNA must then be prepared with something that doesn’t mark the second strand with uracils.

In addition, modern cDNA synthesis uses a large excess of random hexamer primers for dscDNA, with the result that none of the legitimately prepared libraries, like the cotton HKU5 or other kinds of MiSeq library, including some of the earlier sequences of SARS-CoV-2, or the infamous “bat viromes” and PREDICT Saudi Arabia bat datasets, have any significant amount of such fold-back reads. Even RaTG13 lacked such fold-back viral reads. The larger cotton HKU5 dataset also lacked a significant presence of such fold-back reads, since the HKU5 can be completely assembled using MEGAHIT into a single contig, and no contigs with fold-backs were found.

The only way you get this kind of messy data is for the viral reads to exist as ssDNA oligos at fairly large concentrations within the sample, and in addition, the sample must undergo regular PCR instead of double-stranded cDNA synthesis before it is input into the library preparation step. The oligo concentration must then exceed the primer concentration substantially (as in a PCA assembly process, but never in a diagnostic cDNA synthesis process, where the random hexamer primers exceed the cDNA concentration by a factor of thousands; this is enforced as a part of all cDNA synthesis/PCR protocols). The fact that one out of 20 viral reads (40 total spots) was a fold-back read also excludes the possibility that these may have been fringe mistakes at a very low background level; the levels are clearly much too high for this to be a “normal occurrence”. Even the bat viromes' ACE2 only showed evidence of alternative splicing, with no fold-back whatsoever. Nor did the “sequence capture:XGD01” dataset have any kind of fold-back, only splicing isoforms. Nor did the RaTG13 dataset contain this kind of fold-back on either viral or non-viral reads.

To reach this abundance, the first amplification steps must be performed with multiple thermal cycles in the absence of any cDNA synthesis-related primers, and the input must be sufficiently fragmented ssDNA. The only place you find ssDNA at such high concentrations without an orders-of-magnitude higher concentration of cDNA synthesis-related primers is within a gene synthesis reaction. There are also contigs with similar mis-annealed sequences (duplication of partial sequence) in the same direction, which rules out first-strand cDNA as a potential origin, even with a protocol-defying low random hexamer primer concentration, no dUTP marking and an extremely high concentration of vRNA (which is not possible with bat samples that are exceedingly dilute, especially samples that had very few "bat" sequences in general). (First-strand cDNAs all point in the same direction, which means that even without dUTP marking and with no added random hexamer primers, no fusions with both parts pointing in the same direction should have been possible.) This is, however, expected for oligonucleotide-based assembly approaches, when the negative-strand oligo from another fragment is mis-annealed to the positive strand (a deletion in the center with direct repeat-like similar sequence on both ends).
This is especially true since the Respiratory Virus Oligo Panel conducts enrichment AFTER the preparation of the fully tagmented dsDNA library -- in an unenriched sample there simply won't be a high enough vRNA-to-mRNA ratio to allow cDNA to anneal within the same category at frequencies as high as 1 in 20 vRNA reads having a fold-back. Even with first-strand cDNA annealing, it will likely anneal to bat mRNA instead of the same vRNA -- the same thing seen in human SARS-CoV-2 samples and a few early "bat" samples. There is a limited amount of NTP within each cell, and therefore any kind of cDNA synthesis route will preferentially anneal to cellular RNA, not other viral RNA.…
Even in circRNA characterization, they have only identified reads of the same strandedness throughout a huge panel of RNA-seq datasets, with exceedingly low abundance in terms of… Even fusion of virus to human is exceedingly low in abundance. Not an abundance as high as 1/20.
Even the RaTG13 dataset lacked any form of fold-back reads -- such a high abundance of fold-back reads is simply not possible with genuine vRNA and cDNA, or even altered vRNA and cDNA, or even plasmid in vitro transcribed RNA and cDNA.
In fact, fold-backs are only observed in virus and rRNA, not in any mRNA. As rRNA is a common contaminant and is non-specific, the fact that they never cross into each other and the fact that fold-back reads are far more abundant than any chimeric reads (partial virus or virus+mRNA fusions, or virus+rRNA fusions -- both unobserved) imply that the input fragments for the virus, the mRNA (bat RNA) and the rRNA (likely from some kind of rRNA removal kit) are independent from each other, so that mis-annealed reads don't cross into each other and preferentially anneal with themselves. Several rRNAs having fold-backs also rules out RdRp involvement, as the ribosomal RNP complex is not replicable by the Coronavirus RdRp. One potential source of rRNA fold-backs is commercial rRNA control oligos and primers, which are often used as internal controls during PCR reactions. Another source is commercial rRNA removal kits, which are used in the library prep process and contain antisense oligos of rRNA.

As these reads are separated and don't fold into each other (fold-backs are far more abundant than partial virus, partial rRNA or virus+rRNA), and considering that the library prep enriches only AFTER cDNA synthesis, and that these fold-back reads all have their component sequences spaced very close to each other on the viral genome (which wouldn't be the case if the fold-backs were the result of random cDNA annealing, as a mix of non-locus-specific cDNA is random and mis-annealing should happen with equal spacing across the genome), these anomalies strongly suggest that the samples have been manipulated through the addition of double-stranded DNA created through the use of High-Throughput Gene Synthesis and subsequent Polymerase Cycling Assembly (PCA) in segments. The rRNA fold-backs are likely the result of mis-priming of a forward rRNA primer (control) on certain rRNA amplicons/cDNA. As this is only used during a PCR assay, the ribosomal RNAs themselves likely originated from another (blank) sample that was put through RT-PCR. This is also how they get a blank sample.
Conclusion: Fold-back reads that were virus-specific and only virus-specific, with the arms of the fold-back always being close to each other, could neither be created through aberrant cDNA synthesis (which will not space the arms close to each other) nor be generated from a legit sample (which would preferentially stick to host mRNA and rRNA instead of themselves). Nor could they be the result of cDNA fold-back, as such a process will not generate intact dscDNA for tagmentation. As they are often non-homologous and have lop-sided arms, they are also not the result of the transposase process, since that would always generate base-paired read ends. Coupled with an exceedingly high (~1 in 20) abundance of such fold-back reads, they could not be the result of any legitimate library preparation process and can only be the result of High-Throughput Gene Synthesis and artifacts associated with oligo-based gene assembly.
Mispriming errors at their finest. These "samples" were synthetic.
They can of course drop in the TRS-L oligo separately to give some fake sgmRNA reads. Too bad that only the N is amenable to this kind of mis-priming.
Also, these reads are far too high in abundance to be potential cDNA foldbacks -- no other Coronavirus dataset has this level of abundance, including HZAU cotton HKU5 and RaTG13.
Foldbacks are detected on both the (+) strand and the (-) strand, which indicates that they are not the result of second-strand cDNA synthesis self-priming (which would have left behind uracils that stop spot formation anyway), as the overlaps are very weak at these junctions.
Even with a panhandle-like structure, the large mismatches associated with self-priming would have been prevented by random hexamers even at extremely low concentrations.
In fact, most of these folded-back reads had only very poor overlapping homology, which can only anneal during Polymerase Cycling Assembly, not in normal cDNA preparation.
Several other datasets from this "respiratory virus oligo panel" exist, but the rRNA within these datasets was not folded back on itself, and no forespliced reads were found in these datasets.
Considering that only Read1 was used for the analysis, the real frequency of foldbacks is likely not 1 in 20, but more likely 1 in 10. Several of these "bat" datasets had a foldback rate approaching 1 in 12 in the forward reads alone -- this is far too high even for a transposome-based system, which won't even form reads that are forespliced at non-canonical positions (these are absent in any other datasets that used the same library prep). In fact, the fact that one of the major contigs in the "bat" was forespliced (which requires a (+)ssDNA misannealing) is solid proof that non-RNA has been used as the source material for the library prep process.
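The jump from 1 in 20 to roughly 1 in 10 can be checked with a quick calculation. This sketch assumes (it is not measured here) that Read2 folds back at the same rate as Read1 and independently of it, and that a spot counts as a fold-back if either of its reads is one; the function name is illustrative.

```python
def per_spot_rate(read1_rate, read2_rate=None):
    """Probability that at least one read of a paired-end spot is a fold-back.

    Assumes the two reads fold back independently; if read2_rate is not
    given, it is assumed to equal read1_rate.
    """
    if read2_rate is None:
        read2_rate = read1_rate
    return 1 - (1 - read1_rate) * (1 - read2_rate)


rate = per_spot_rate(1 / 20)
print(f"{rate:.4f} (about 1 in {1 / rate:.1f})")
```

Under those assumptions the per-spot rate is just under double the Read1-only rate, because the small chance of both reads folding back is counted once.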
The lowest number of fold-back reads in these sequences was around 1 in 20 (1/40 in forward reads, 1/30 in reverse reads). The highest amount of fold-back reads is about 1/10 in forward reads. The median of the fold-backs was about 1/10 (1/20 in forward reads).
This is far too concentrated to be any kind of cDNA error. Coupled with non-canonical forespliced contigs being major, this could only mean one thing: the "coronavirus reads" were synthetic DNA.
Another S1 went missing: SRR13380248
All the S proteins in the assemblies end at S2'. There were no reads that resemble an S1 or an RBD. They ARE really afraid of screwing things up!
Software: SPAdes/MEGAHIT for assembly. tBLASTn for alignment to protein sequences.
Another RBD went missing: SRR13380247. Closest to S1-S2: AA689. No part of the RBD was found.
Software: SPAdes/MEGAHIT for assembly. tBLASTn for alignment to protein sequences.
Another S1 went missing! Accession: SRR13380250
Why do all the minor datasets have a missing S1?
An S1-S2 sequence: CASYNSPVARVG. Only a single spot. There is NO FCS in these datasets.
As these samples were taken too late and were likely censored, they were likely bumped up through contamination (with oligos on the panel), with Shiver and synthesis (why be afraid of displaying the RBM?) or recombination with circulating SARS-CoV-2. The smaller dataset appeared to be substantially different from the larger dataset, both in the amount of fold-back reads and in the size of the dataset itself. What they likely did is alter the RmYN02 dataset, merge reads and synthesized DNA, then use the large dataset as validation and the smaller datasets as “further proof”.
As expected -- these datasets were half-truth, half-fake. The biggest, unusually large dataset contained an SL-CoV with an overall similarity in the ORF1ab of 94%, with an inverted 5'-UTR (not viable!) and an S protein that can't bind ACE2 (they know it!) and has a CASYNSPAVRVG S1-S2.
The similarity of the sequence to SARS-CoV-2 is only 94% coverage at 94.4% identity, or an overall similarity of 88.736%. This is far too dissimilar to be relevant to the origin, and it lacks all the crucial features (RBM, S1 HIV inserts and FCS) of SARS-CoV-2.
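The overall figure is simply coverage multiplied by identity. A small sketch of that arithmetic (the function name is illustrative, and treating the uncovered part of the query as contributing zero identity is the assumption behind the product):

```python
def overall_similarity(coverage, identity):
    """Overall similarity when only `coverage` of the query aligns at `identity`.

    Treats the unaligned part of the query as having no similarity at all.
    """
    return coverage * identity


print(f"{overall_similarity(0.94, 0.944):.3%}")
```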
This sequence did not return any significant similarity to the other "strains" and is only about 96.18% similar to RmYN02. This confirms that the SRA has indeed been tampered with -- and that they either left the useless Spike as-is, or intentionally chose a fragment from RmYN02 to get "legitimacy". This was likely used as a backbone for shivering and synthesizing everything else, judging by the lack of S1/RBD and S1-S2 in the other datasets. (The only S1-S2 found in another dataset was a single isolated read with the same CASYNSPAVRVG sequence.)
What they did: sequencing from capture and censorship -- SL-CoV. Alter the S, add in several reads, synthesize and add to blank samples -- all the 4 other datasets. There is no evidence for an S1-S2 FCS in any of these datasets.
What they may have actually done: "new samples" (risk of contamination by already circulating SARS-CoV-2) -- SL-CoV -- censor part of the S, and then add in fake reads -- "other 4 datasets".
With this kind of half-truth, half-fake, the already-prepped backbone library would be run through the library prep process twice -- as a result, all the smaller datasets have a much higher level of foldback than this useless and uninteresting "large" dataset.
Isolating the few S1-S2 reads also gives them wiggle room for more advanced fakes -- you just need to add a few cooked-up S1-S2 reads into the uncovered portions.
SL-CoV S doesn’t bind ACE2, so they don’t risk making up a non-viable RBM/RBD, since no experiment can be used to prove whether it works or not.
(On the ORF1ab short fragment)
Also, the 5’-UTR seems to be inverted in these datasets. MEGAHIT returned the same kind of inverted 5’-UTR. It is most likely an oversight during oligo synthesis that resulted in a recurring error in the Polymerase Cycling Assembly process.
Neither the PREDICT serotine bat SISPA (which doesn’t have much virus in it -- the biggest fragment was a small fragment of a Picornavirus 1A protein gene) nor the broken “Rousettus leschenaultii” (actually bat tissue and PCR products) has fold-backs, even though they operate in non-uracil-marked sscDNA synthesis mode.
Another Contig with two separate homology parts that had the same direction. This is the result of PCA errors.…
Here is the assembled data for the datasets. BLAST2 if interested.
All of the assemblies have an inverted 5'-UTR, except for the one that was assembled with SPAdes. The SPAdes assembly has a lot of contigs that correspond to a foldback on the 5'-UTR. This is most likely due to a recurring error in the oligo assembly synthesis process. They all had only…
The SPAdes assembly has multiple short contigs at the 5'-UTR indicating inverted reads, which matches a recurring error leaving behind only trace amounts of correctly assembled sequence.
~92% to 94% homology to SARS-CoV-2.
The homology to SARS-CoV-2 for all sequences was only 92%~94%, with the S1 missing in all of the datasets except for SRR13380251 (an SL-CoV with a CASYNSPVARVG S1-S2 and a non-ACE2-using RBM. They have made sure that they will not have to do the ACE2 binding experiment by using an SL-CoV S.)