Endonuclease fingerprint indicates a synthetic origin of SARS-CoV-2

A collaborative product by @VBruttel, @tony_vandongen, and myself.

Here's what we found:…

The origin of SARS-CoV-2 is unknown.

Some have hypothesized two spillover events at the wet market, but methodological flaws make that work inconclusive.…

We need to know the true origin of SARS-CoV-2 to prevent pandemics.

We examined whether SARS-CoV-2 was synthesized in a lab.

We studied a common method for synthesizing CoVs in the lab.

This method was thought to leave no fingerprint.

We found the fingerprint.

That fingerprint is in the SARS-CoV-2 genome.

Here's how you make a CoV in the lab:

To make a 30kb RNA virus in the lab, you need a 30kb DNA clone

To assemble a 30kb DNA clone, scientists glue together several smaller fragments

A popular method for DNA assembly is ‘golden gate assembly’…
Golden gate assembly requires that the DNA sequence have special “cutting” sites (type IIS restriction sites).
Cutting at these sites creates 3-4 nt “sticky ends”

Sticky ends help you ‘paste’ DNA segments together, ensuring faithful assembly of your 30kb DNA copy of a viral genome.
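
To make the cutting step concrete, here is a minimal Python sketch that scans one strand for BsaI (GGTCTC) and BsmBI (CGTCTC) recognition sequences in either orientation and reports the 4-nt overhang each cut would leave. The function name `find_type_iis_sites` and the toy sequence are made up for illustration. The cut offsets (1 nt past the site on one strand, 5 nt on the other) are the standard ones for these two enzymes. This is a sketch, not the analysis code on GitHub.

```python
# Recognition sequences (5'->3') for the two type IIS enzymes discussed here.
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_type_iis_sites(seq):
    """Return (enzyme, strand, site_start, overhang) tuples sorted by site_start.

    The overhang is reported as the 4 nt read on the given (top) strand.
    """
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        for strand, motif in (("+", site), ("-", revcomp(site))):
            start = seq.find(motif)
            while start != -1:
                if strand == "+":
                    # Site on the top strand: overhang starts 1 nt after the site.
                    lo = start + len(motif) + 1
                else:
                    # Site on the bottom strand: overhang ends 1 nt before the site.
                    lo = start - 5
                hi = lo + 4
                if lo >= 0 and hi <= len(seq):
                    hits.append((enzyme, strand, start, seq[lo:hi]))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda h: h[2])


# Toy sequence (hypothetical): one forward BsaI site and one reverse BsmBI site.
toy = "AAAA" + "GGTCTC" + "T" + "AGCA" + "CCCC" + "TTGA" + "A" + "GAGACG" + "AAAA"
sites = find_type_iis_sites(toy)
```

On the toy sequence this finds a forward BsaI site with overhang `AGCA` and a reverse BsmBI site with overhang `TTGA`.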

RNA viruses like CoVs are not under selection specifically for this kind of cutting & pasting.

So, wild viruses tend to have cutting/pasting sites randomly scattered in their genome.

Researchers building viruses in a lab will often add/remove cutting sites…

We collected examples of CoV infectious clones assembled with these type IIS cutting/pasting systems from 2000-2019.

We found a clear pattern in how researchers tended to add/remove cutting/pasting sites.

Researchers tend to turn randomly-spaced restriction maps into regularly-spaced ones (A-B).

Regular spacing comes from desiring fewer fragments (typically 5-8) while keeping the longest fragment lengths low.

Digesting 70 CoVs with 200+ restriction enzymes yields a “wild type distribution”, a null model for how long the longest fragment may be as a function of the number of fragments.
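
The two quantities behind this null model, the number of fragments and the length of the longest fragment, can be sketched in a few lines of Python. The function names and the cut positions below are hypothetical (not real SARS-CoV-2 coordinates), and the sketch assumes a linear genome with cuts given as positions along it.

```python
def fragment_lengths(genome_length, cut_positions):
    """Lengths of the fragments left after cutting a linear genome."""
    # Ignore duplicate cuts and cuts that fall outside the genome interior.
    cuts = sorted({c for c in cut_positions if 0 < c < genome_length})
    edges = [0] + cuts + [genome_length]
    return [b - a for a, b in zip(edges, edges[1:])]


def restriction_map_summary(genome_length, cut_positions):
    """Return (number of fragments, longest fragment as a fraction of the genome)."""
    lengths = fragment_lengths(genome_length, cut_positions)
    return len(lengths), max(lengths) / genome_length


# Hypothetical cut positions on a 30,000 nt genome, for illustration only.
n_fragments, longest_fraction = restriction_map_summary(
    30000, [5000, 9000, 16000, 21000, 27000]
)
```

Repeating this for every genome and enzyme combination would give the cloud of (fragment count, longest fragment) points that a single restriction map is compared against.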

The red box is the ideal range for reverse genetic systems used to make infectious clones

CoVs engineered to be infectious clones will move from having restriction maps falling within the wild type distribution…

To being outliers under the wild-type distribution, falling within the lab-ideal range of fragment number & low longest-fragment-length

Having found this fingerprint, we examine specific cutting/pasting sites in the SARS-CoV-2 genome (BsaI/BsmBI)

BsaI + BsmBI are very popular enzymes for this kind of in vitro assembly

They also have many conserved sites in CoVs. Very useful for making chimeras.

The SARS-CoV-2 BsaI/BsmBI restriction map falls neatly within the ideal range for a reverse genetic system

It is an anomaly (bottom 1%) amongst wild type CoVs.

It is a midpoint amongst engineered CoVs.

When CoVs are digested with only the type IIS enzymes that could be used for assembly, SARS-CoV-2 is an even greater outlier

It’s in the bottom 1% of max-fragment-lengths for all restriction enzymes

It’s the single largest outlier (<0.07%) of 1491 type IIS digestions

We then tested the lab-assembly hypothesis

If SARS2 has a synthetic origin via golden gate assembly, several other criteria must be met.

For example: all sticky ends must be unique, non-palindromic, and contain at least one A/T.
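
Here is a minimal Python sketch of that check, applying the three criteria exactly as stated: unique, non-palindromic, and at least one A/T. The function name `overhangs_ok` and the example overhangs are made up for illustration. Uniqueness is taken literally here (no two identical overhangs); a stricter check could also compare each overhang against the reverse complements of the others, which this sketch does not do.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overhangs_ok(overhangs):
    """True if all sticky ends are unique, non-palindromic, and contain an A or T."""
    ends = [o.upper() for o in overhangs]
    unique = len(set(ends)) == len(ends)
    # A palindromic overhang equals its own reverse complement and can self-ligate.
    non_palindromic = all(o != revcomp(o) for o in ends)
    has_at = all(("A" in o) or ("T" in o) for o in ends)
    return unique and non_palindromic and has_at


# Hypothetical overhang sets, for illustration only.
good = overhangs_ok(["AGCA", "TTGA", "GACT"])
palindrome = overhangs_ok(["ACGT", "TTGA"])
gc_only = overhangs_ok(["GGCG", "TTGA"])
duplicate = overhangs_ok(["AGCA", "AGCA"])
```

Only the first set passes; the others fail on a palindrome, a GC-only overhang, and a duplicate respectively.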

SARS2 passed this test (60% chance of this)

The mutations separating SARS-CoV-2 BsaI/BsmBI sites from its close relatives must all be silent mutations.

All 14 mutations in BsaI/BsmBI sites are silent.

84% of mutations in SARS2 & close relatives are silent, so there's a 9% chance that all 14 distinct mutations would be silent.
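
The arithmetic behind that 9% can be reproduced directly, assuming each of the 14 mutations is independently silent with probability 0.84 (the independence assumption is a simplification made for this sketch):

```python
silent_fraction = 0.84  # share of mutations that are silent, from the thread
n_mutations = 14        # mutations falling in BsaI/BsmBI sites

# Probability that all 14 are silent if each is an independent draw.
p_all_silent = silent_fraction ** n_mutations
print(f"{p_all_silent:.3f}")  # prints 0.087
```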

There’s a significantly higher concentration of silent mutations per nucleotide within BsaI/BsmBI recognition sequences than in the rest of the genome

P=0.004 for BANAL52-SARS2

P=9e-8 for RaTG13-SARS2

Such an idealized reverse genetic system is unlikely to evolve by chance from the close relatives of SARS-CoV-2.

There’s a 1% chance of random RaTG13 mutants having as great or greater a z-score

and 0.1% chance for BANAL52.
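
Percentages like these are tail fractions: the share of simulated mutants whose z-score is at least as large as the observed one. Here is a minimal Python sketch of that final step. The function name and the z-scores are made up for illustration; how the mutants are generated and how the z-score is defined are described in the manuscript and not reproduced here.

```python
def empirical_tail_fraction(simulated, observed):
    """Fraction of simulated values that are at least as large as the observed value."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    return sum(1 for s in simulated if s >= observed) / len(simulated)


# Made-up z-scores for ten simulated mutants, for illustration only.
simulated_z = [0.1, -0.5, 1.2, 2.5, 0.3, -1.1, 0.8, 3.1, 0.0, 1.9]
tail = empirical_tail_fraction(simulated_z, observed=2.5)
```

With these toy numbers, two of the ten simulated values are at least 2.5, so the tail fraction is 0.2.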

Testing this from multiple angles, we could not reject the hypothesis that SARS-CoV-2 has a synthetic origin.

Each test also decreased the odds of SARS-CoV-2 having a natural origin

The BsaI/BsmBI fingerprint of SARS-CoV-2 indicates a synthetic origin.

Please read our MS for our careful language & limitations. These are important.

For example, our results are independent of the Furin Cleavage Site.

While the RBD is docked in fragment 5, we shed no light on the origin of the FCS.

Our research does not identify the lab

We hypothesize this restriction map would enable construction of chimeric viruses...

much like the recent controversial work done in Boston (but with a different method for in vitro assembly)…

Our theory of a synthetic origin of SARS-CoV-2 can & should be tested.

Further tests may reject our theory

We welcome these tests.

Our code is available on GitHub and we point to future research that can reject our hypothesis and/or refine our understanding of this issue.

Making chimeric viruses in vitro carries risks

We encourage transparency from researchers studying CoVs in Wuhan.

We strongly encourage global coordination on biosafety.

We encourage open, civil, and compassionate discourse on this important topic

This pre-print was not rushed.

It was reviewed by many colleagues, truly world experts.

We thank them all immensely for their feedback.

For a popular science write-up of our work, see below.

I’m eternally grateful for colleagues like @VBruttel and @tony_vandongen. This has been an incredible project.

Yet, for obvious reasons, this is the saddest paper I’ve ever written.…


Some have correctly pointed out that this type IIS directed-assembly procedure is not exactly "golden gate", but it is very similar

@Kevin_McKernan wrote an article explaining type IIS restriction enzymes that may help others understand these tools!…
Directed assembly of viral genomes with type IIS digestion & subsequent ligation was a very common procedure to make CoVs pre-COVID

See below for one of the foundational articles on efficient reverse genetic systems for coronaviruses…

