1: We have just posted a study suggesting there may be no real #SARSCoV2 genomes that are transitional between lineages A and B. Arcane, right?

But stick with me - this stuff is *absolutely* crucial to figuring out how the pandemic got started.

virological.org/t/evidence-aga…
2: Honoured to be working on this project with an *amazing* team: @jepekar, @EdythParker, Jennifer Havens, @suchard_group, Kristian Andersen, @niemasd, @arambaut, and Joel Wertheim (leading the charge).

So, what's a 'transitional' genome?
3: To explain, let me introduce you to 'lineage A' and 'lineage B', aka 'clade II' and 'clade I', respectively, in this paper by Zhang et al. These lineages co-circulated in China during the early days of the pandemic, and they differ at two key sites.

nature.com/articles/s4158…
4: Lineage B genomes, such as the reference genome Wuhan/Hu-1, have a 'C' at position 8782 (genomes up to #546 in this alignment). Lineage A genomes have a 'T'.

And at position 28144, Lineage B has 'T' and lineage A has 'C'. At 8782/28144, then, B=C/T and A=T/C.
5: A relatively small number of early genomes, though, have C/C or T/T. They appear to be transitional, because if A evolved from B, or vice versa, you would see a C/C or T/T pattern after one of the two substitutions in that evolutionary journey had occurred.
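
To make the A / B / 'transitional' labels concrete, here is a minimal sketch of the two-site classification described above. It assumes each genome is a string already aligned to reference coordinates, so that position 8782 sits at index 8781. The function names and the toy genome builder are hypothetical illustrations, not part of the study's pipeline.

```python
# Diagnostic sites named in the thread (1-based positions).
SITE_1 = 8782
SITE_2 = 28144

# Base pairs at 8782/28144 and the label the thread gives them.
LABELS = {
    "C/T": "lineage B",
    "T/C": "lineage A",
    "C/C": "apparently transitional",
    "T/T": "apparently transitional",
}


def classify(genome: str) -> str:
    """Label a reference-aligned genome by its bases at the two sites."""
    pair = genome[SITE_1 - 1].upper() + "/" + genome[SITE_2 - 1].upper()
    return LABELS.get(pair, "unclassified")


def toy_genome(base_8782: str, base_28144: str, length: int = 30_000) -> str:
    """Build a placeholder genome of N's with only the two sites filled in."""
    seq = ["N"] * length
    seq[SITE_1 - 1] = base_8782
    seq[SITE_2 - 1] = base_28144
    return "".join(seq)


if __name__ == "__main__":
    for bases in [("C", "T"), ("T", "C"), ("C", "C"), ("T", "T")]:
        print(bases, "->", classify(toy_genome(*bases)))
```

Note that the label 'apparently transitional' only describes the pattern at the two sites; it says nothing about whether the genome is real or an artefact.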
6: Why do we think they might all be erroneous?

Step back to the fading moments of life-as-you-used-to-know-it, when a man stepped off a plane from China at Sea-Tac airport, on Jan 15, 2020, then became the first in the US to be diagnosed with COVID-19.

nytimes.com/2020/01/21/hea…
7: His #SARSCoV2 genome was rapidly sequenced by CDC and was named 'WA1' - presumably for 'Washington case #1'. Then we waited...and waited...for the other shoe to drop: community spread in the US.
8: It dropped on Feb 29, 2020. In a thread that sent shock-waves through the scientific community, Trevor Bedford at the Hutch reported that a second genome had been recovered from a community-acquired case, and it was eerily close to WA1's.

9: Indeed, it appeared that WA2 had descended directly from WA1. This implied that cryptic community transmission had already been happening for *6 weeks* in the US. Not good! (Though note that Trevor pointed out that the close relationship *could* be a coincidence).
10: This important work would go on to be published in Science.

science.org/doi/full/10.11…
11: I was initially deeply convinced by this argument. But the more I got the feel for how this virus evolves, the more I thought it was strange that all the genomes in WA State from Feb and beyond had two key substitutions away from WA1.
12: At positions 17747 and 17858, WA1 was C/A and WA2 and all other 'WA outbreak' genomes in Feb, Mar and beyond were T/G. Why the clean separation? If WA1 really did kick things off, why weren't we seeing genomes identical, or at least closer, to it?
13: I knew what was needed: to re-run the epidemic in WA State over and over again to see if the clean, two-nucleotide difference would be observed if WA1 really *had* started the US outbreak.
14: I started looking through the literature for software that would allow one to simulate the Washington State outbreak under realistic epidemiological parameters, then evolve #SARSCoV2 genomes through the infectees.

Came across a package called FAVITES.
15: To my pleasant surprise, one of the co-authors was my former PhD student, Joel Wertheim. Teamed up with the brilliant Joel, @pekar, @suchard_group and @LemeyLab and showed that WA1 was deeply unlikely to have started the outbreak.
16: The pattern observed in WA State, which is very similar to the lineage A and lineage B pattern in China in early 2020, just didn't seem consistent with a 'single introduction' scenario. Instead, similar viruses seem to have jumped in twice.
17: We published these findings in the same issue of Science as Trevor's paper:

science.org/doi/full/10.11…
18: But there had been one big mystery. Transitional genomes, with only one of the two diagnostic substitutions, *had* been reported from neighbouring British Columbia (BC). These C/G genomes seemed to undermine the 'two introduction' model (one for WA1, one for WA2).
19: These genomes vexed the hell out of me for weeks.

I suspected they might contain an error at one of the two key sites, but wasn't sure. If they *were* real, then maybe WA1 *did* start off the whole outbreak. Did he travel through Vancouver and infect people there?
20: Finally, it dawned on me that the genomes themselves might give up the secret of whether they were real or just artefacts due to sequencing issues.

Did the transitional C/G genomes share substitutions (other than the two already mentioned) with T/G genomes like WA2?
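
The check being asked here can be sketched in a few lines. This is only an illustration under stated assumptions: each genome is represented as a set of (position, base) substitutions relative to a common reference, the function name is hypothetical, and every position in the toy data is made up (it is not real sequence data from BC or WA).

```python
def shared_substitutions(genome_a, genome_b, diagnostic_sites):
    """Return substitutions found in both genomes, ignoring the diagnostic sites.

    Each genome is a set of (position, derived_base) tuples relative to
    the same reference.
    """
    return {
        (pos, base)
        for (pos, base) in genome_a & genome_b
        if pos not in diagnostic_sites
    }


# Toy data only: positions 10 and 20 stand in for the two diagnostic sites.
toy_diagnostic_sites = {10, 20}

# A 'transitional' genome: carries only one of the two diagnostic substitutions.
toy_transitional = {(20, "G"), (101, "T"), (202, "A"), (303, "G"), (404, "T")}

# An outbreak genome: carries both diagnostic substitutions.
toy_outbreak = {(10, "T"), (20, "G"), (101, "T"), (202, "A"),
                (303, "G"), (404, "T"), (505, "C")}

shared = shared_substitutions(toy_transitional, toy_outbreak, toy_diagnostic_sites)

if __name__ == "__main__":
    print(sorted(shared))
```

If the shared set is non-empty, a genuinely transitional genome would require those same substitutions to have arisen twice independently, which is the warning sign the thread goes on to describe.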
21: Check out this figure from our paper. See how the BC genome at the very top shares 4 substitutions with the one at the very bottom? If the top, transitional, one were real, that would mean those very same 4 substitutions had happened independently in the bottom genome.
23: That is like two people, each with their own deck of cards, drawing 4 cards each and finding out that they drew exactly the same cards. Even with a deck of 52 cards, that is near impossible.

But the SARS-CoV-2 deck has ~30,000 cards (one for each nucleotide).
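
The card analogy can be put into numbers with a short calculation. This is a deliberately crude sketch: it assumes every 'card' (site) is equally likely to be drawn and that draws are independent, which real substitution processes do not obey, so treat the result as an order-of-magnitude illustration rather than an estimate from the study. The function name is hypothetical.

```python
from math import comb


def prob_same_draw(deck_size: int, cards_drawn: int = 4) -> float:
    """Chance that a second player draws exactly the same set of cards
    as the first, when each draws `cards_drawn` cards from their own deck."""
    return 1 / comb(deck_size, cards_drawn)


if __name__ == "__main__":
    # Ordinary deck: comb(52, 4) = 270,725 possible 4-card hands.
    print(f"52 cards:     1 in {comb(52, 4):,}")
    # Genome-sized deck of ~30,000 sites.
    print(f"30,000 cards: 1 in {comb(30_000, 4):.2e}")
```

Under these simplifying assumptions, the genome-sized deck gives odds on the order of 1 in 10^16, against roughly 1 in 270,000 for an ordinary deck.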
24: It is simply not something likely to happen by chance: it's the 'transitional' virus's 'tell' that it's not transitional after all. It just has an error at site 17747 and is two substitutions different from WA1 after all, like the other Feb-Mar genomes.
25: Leaving us with the strong conclusion that WA1 and WA2 had separate introductions into the US (with the WA outbreak introduction, incidentally, happening a bit later, around Feb 1).
26: Which, at long last (sorry!) brings me back to the supposedly transitional genomes between lineage A and lineage B in China. We see a pattern very similar to the BC (and other) likely-artefactual transitional genomes:

They share substitutions with 'pure' A and B genomes.
27: So some of these are virtually *certain* to be erroneous for one reason or another, and we believe it is unlikely that there are *any* real transitional genomes between A and B.
28: Why is this crucial to understanding how the pandemic started?

If there really are just pure A and B viruses from early in China, the WA1 story teaches us that that *might* be because each lineage had a separate origin from an animal to a human.
29: I'll be the first to admit that I thought the idea of multiple introductions was bonkers when I first encountered it. But we now know that animals like civets and raccoon dogs were present in Wuhan wet markets, with shared supply chains.

nature.com/articles/s4159…
30: If a human-transmissible SARS-CoV-2 progenitor was circulating among such animals, the SARS1 story teaches us that it would be likely to jump multiple times into humans.

cell.com/cell/pdf/S0092…
31: The possible lack of any real A/B-transitional genomes makes me take the two-intro model much, much more seriously. It's by no means settled, but we have developed the technology to test that hypothesis...we'll see.
32: Important: if lineage A and lineage B had separate origins, then the time their common ancestors existed, and when the 'patient zero' of each lineage was infected, might be considerably later than estimates assuming a single jump, like ours:

science.org/doi/full/10.11…
33: Thanks also to @viralverity @bblarsen1 @Greenbeard2 @swientist and @arambaut, co-authors on the Worobey et al paper.
34: And Konrad Scheffler and @niemasd, co-authors on Pekar et al.

@niemasd is the main architect of the invaluable FAVITES software. Basic research on HIV paying off big time during the pandemic.
