Golden Gate assembly requires that the DNA sequence have special “cutting” sites (type IIS restriction sites).
Cutting at these sites creates 3-4 nt “sticky ends”
Sticky ends help you ‘paste’ DNA segments together, ensuring faithful assembly of your 30kb DNA copy of a viral genome.
5/
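To make the cutting step concrete, here is a minimal sketch of locating type IIS sites and reading off the 4-nt sticky ends they would leave. It assumes the standard BsaI (GGTCTC) and BsmBI (CGTCTC) geometry, where the cut falls 1 nt past the site on one strand and 5 nt past on the other. The function names and the toy sequences are hypothetical, not taken from our paper's code.

```python
# Recognition sequences for two common type IIS enzymes. Both cut outside
# the site (1 nt downstream on one strand, 5 nt on the other), which
# leaves a 4-nt 5' overhang.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sticky_ends(genome, site):
    """Return sorted (start, overhang) pairs for one recognition sequence.

    Overhangs are reported as top-strand sequence, with `start` being the
    0-based index of the first overhang base.
    """
    ends = []
    n = len(site)
    # Site on the top strand: the overhang begins 1 nt after the site.
    i = genome.find(site)
    while i != -1:
        start = i + n + 1
        if start + 4 <= len(genome):
            ends.append((start, genome[start:start + 4]))
        i = genome.find(site, i + 1)
    # Site on the bottom strand: it shows up as the reverse complement,
    # and the overhang lies upstream, ending 1 nt before the site.
    rc = revcomp(site)
    j = genome.find(rc)
    while j != -1:
        start = j - 1 - 4
        if start >= 0:
            ends.append((start, genome[start:start + 4]))
        j = genome.find(rc, j + 1)
    return sorted(ends)
```

Calling `find_sticky_ends(genome, SITES["BsaI"])` on a genome string gives the positions you would feed into a restriction map.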
RNA viruses like CoVs are not under selection specifically for this kind of cutting & pasting.
So, wild viruses tend to have cutting/pasting sites randomly scattered in their genome.
Researchers building viruses in a lab will often add/remove cutting sites…
6/
We collected examples of CoV infectious clones assembled with these type IIS cutting/pasting systems from 2000-2019.
We found a clear pattern in how researchers tended to add/remove cutting/pasting sites.
7/
Researchers tend to turn randomly-spaced restriction maps into regularly-spaced ones (A-B).
Regular spacing comes from desiring fewer fragments (typically 5-8) while keeping the longest fragment lengths low.
8/
Digesting 70 CoVs with 200+ restriction enzymes yields a “wild type distribution”, a null model for how long the longest fragment may be as a function of the number of fragments.
The red box is the ideal range for reverse genetic systems used to make infectious clones
9/
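The two quantities plotted here, number of fragments and longest fragment, can be computed from cut positions with a sketch like the one below. It assumes a linear genome and expresses the longest fragment as a fraction of genome length, which is an illustrative choice. The function names are hypothetical.

```python
def fragment_lengths(genome_length, cut_positions):
    """Lengths of the pieces of a linear genome cut at the given positions."""
    # Ignore duplicate cuts and cuts at or beyond the ends.
    cuts = sorted({p for p in cut_positions if 0 < p < genome_length})
    edges = [0] + cuts + [genome_length]
    return [b - a for a, b in zip(edges, edges[1:])]


def map_summary(genome_length, cut_positions):
    """Return (number of fragments, longest fragment / genome length)."""
    lengths = fragment_lengths(genome_length, cut_positions)
    return len(lengths), max(lengths) / genome_length
```

Evenly spaced cuts minimize the longest fragment for a given number of fragments, which is why regularly spaced maps stand out against randomly scattered ones.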
CoVs engineered to be infectious clones will move from having restriction maps falling within the wild type distribution…
To being outliers under the wild-type distribution, falling within the lab-ideal range of fragment number & low longest-fragment-length
10/
Having found this fingerprint, we examine specific cutting/pasting sites in the SARS-CoV-2 genome (BsaI/BsmBI)
BsaI + BsmBI are very popular enzymes for this kind of in vitro assembly
They also have many conserved sites in CoVs. Very useful for making chimeras.
11/
The SARS-CoV-2 BsaI/BsmBI restriction map falls neatly within the ideal range for a reverse genetic system
It is an anomaly (bottom 1%) amongst wild type CoVs.
It is a midpoint amongst engineered CoVs.
12/
When CoVs are digested with only the type IIS enzymes that could be used for assembly, SARS-CoV-2 is an even greater outlier
It’s in the bottom 1% for max fragment length across all restriction enzymes
It’s the single largest outlier (<0.07%) of 1491 type IIS digestions
13/
We then tested the lab-assembly hypothesis
If SARS2 has a synthetic origin via golden gate assembly, several other criteria must be met.
For example: all sticky ends must be unique, non-palindromic, and contain at least one A/T.
SARS2 passed this test (60% chance of this)
14/
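The three sticky-end criteria named above can be checked with a short sketch like this one. It implements only those three conditions as stated (unique, non-palindromic, at least one A/T). Real assembly design may add further constraints, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_ok(overhangs):
    """True if every overhang is unique, non-palindromic, and has an A or T."""
    unique = len(set(overhangs)) == len(overhangs)
    # A palindromic overhang equals its own reverse complement, so it
    # could anneal to another copy of itself.
    non_palindromic = all(o != revcomp(o) for o in overhangs)
    has_at = all("A" in o or "T" in o for o in overhangs)
    return unique and non_palindromic and has_at
```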
The mutations separating SARS-CoV-2 BsaI/BsmBI sites from its close relatives must all be silent mutations.
All 14 mutations in BsaI/BsmBI sites are silent.
84% of mutations in SARS2 & close relatives are silent, so there’s a ~9% chance that all 14 distinct mutations would be silent.
15/
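The ~9% figure is the arithmetic you get if each of the 14 mutations is treated as an independent draw with an 84% chance of being silent. Independence is an assumption of this sketch, and the function name is hypothetical.

```python
def prob_all_silent(p_silent, n_mutations):
    """Chance that n independent mutations are all silent."""
    return p_silent ** n_mutations


# 0.84 ** 14 is roughly 0.087, i.e. about 9%.
p = prob_all_silent(0.84, 14)
print(f"{p:.3f}")
```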
There’s a significantly higher concentration of silent mutations per nucleotide within BsaI/BsmBI recognition sequences than in the rest of the genome
P=0.004 for BANAL52-SARS2
P=9e-8 for RaTG13-SARS2
16/
Such an idealized reverse genetic system is unlikely to evolve by chance from the close relatives of SARS-CoV-2.
There’s a 1% chance of random RaTG13 mutants having as great or greater a z-score
and a 0.1% chance for BANAL52.
17/
Testing this from multiple angles, we could not reject the hypothesis that SARS-CoV-2 has a synthetic origin.
Each test also decreased the odds of SARS-CoV-2 having a natural origin
The BsaI/BsmBI fingerprint of SARS-CoV-2 indicates synthetic origin of SARS-CoV-2.
18/
Please read our MS for our careful language & limitations. These are important.
For example, our results are independent of the Furin Cleavage Site.
While the RBD is docked in fragment 5, we shed no light on the origin of the FCS.
19/
Our research does not identify the lab
We hypothesize this restriction map would enable construction of chimeric viruses...
much like the recent controversial work done in Boston (but with a different method for in vitro assembly)
That same idea - the very strictly correct idea for the term "golden gate assembly" - led another smart scientist to think that, for our theory to be true, the BsaI/BsmBI recognition sequences must be in opposite orientation.
I'm unable to comment on this thread (having been blocked by Rasmussen) but it does seem like folk may be claiming to @KelseyTuoc that our work is fraud/deception/misconduct.
The multiple comparisons problem arises when you run many tests, describe the P-values or odds of seeing as-big-or-bigger a discrepancy under the null model, and fail to account for the fact that you ran many tests.
In our paper, we do evaluate the likelihood of many events under a null model - maximum fragment lengths, ideal number of fragments, silent mutations, etc. - so it's fair to discuss multiple comparisons, what corrections we may want to make, and how we justify them.
3/
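For readers new to the multiple comparisons problem, here is a minimal sketch of why it matters and of one common correction. The family-wise error rate formula assumes independent tests, and Bonferroni is just one standard option, not necessarily the correction used in our paper. The function names are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Flag which P-values stay significant after a Bonferroni correction."""
    m = len(p_values)
    # Each test is held to the stricter threshold alpha / m.
    return [p <= alpha / m for p in p_values]
```

With 10 independent tests at alpha = 0.05, the chance of at least one false positive is about 40%, which is why uncorrected P-values from many tests can mislead.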
I came across an old journal from a trip to the Amazon
During grad school, I helped a buddy study the impact of deforestation on birds & walked vegetation transects through spiky thickets with wasps that stung your eyeballs
One of the most incredible experiences of my life...
My friend Jacob Socolar's work required waking up at 4am to hear the dawn chorus every day in every manner of forest, from varzea to terra firme. We saw forests that were pristine, and similar forests that were slashed & burned.
While Jacob listened for birds, I would wander off-trail to see what I could find.
There are so many wonders in the Amazon: a spider mimicking mosquitoes, a cricket mimicking a poison-dart frog, geckos, tarantulas, millipedes, minuscule frogs & caterpillars on bromeliads...