Bloom Lab
Jun 22, 2021 · 26 tweets
In a new study, I identify and recover a deleted set of #SARSCoV2 sequences that provide additional information about viruses from the early Wuhan outbreak: biorxiv.org/content/10.110… (1/n)
Specifically, NIH maintains the Sequence Read Archive, where scientists around the world deposit deep sequencing data for others to analyze. I noted peerj.com/articles/9255 lists all #SARSCoV2 data in the archive as of March-31-2020. Most were from a project by Wuhan University. (2/n)
But when I went to the Sequence Read Archive, I found the entire project was gone! (Note that as detailed below, this does *not* imply malfeasance by NIH. Sequence Read Archive policy allows submitters to delete data by e-mail request.) (3/n)
I was able to determine the deleted data corresponded to a study that partially sequenced “45 nasopharyngeal samples from [Wuhan] outpatients with suspected COVID-19 early in the epidemic” medrxiv.org/content/10.110… (4/n)
I discovered that even though the files were deleted from the archive itself, they could be recovered from the Google Cloud at links like storage.googleapis.com/nih-sequence-r… (5/n)
Using this approach, I recovered files for the 34 early samples that were virus positive. I was able to use the data in the files to reconstruct partial viral sequences (from start of spike to end of ORF10) for 13 of these samples. (6/n)
Now I need to give background to explain a confusing scientific mystery about other early #SARSCoV2 sequences. Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats. (7/n)
Therefore, we’d expect the first #SARSCoV2 sequences would be more similar to bat coronaviruses, and as #SARSCoV2 continued to evolve it would become more divergent from these ancestors. But that is *not* the case! (8/n)
Instead, early Huanan Seafood Market #SARSCoV2 viruses are more different from bat coronaviruses than #SARSCoV2 viruses collected later in China and even other countries. @lpipes @ras_nielsen give nice technical analysis at academic.oup.com/mbe/article/38… (9/n)
The conundrum is easily seen by plotting the relative differences from the bat coronavirus RaTG13 outgroup versus collection date for early #SARSCoV2. See how the first reported viruses from Wuhan (leftmost blue points) aren’t the closest to RaTG13. (10/n) [Image: plot of relative differences from RaTG13 versus collection date]
Same result if we use other bat coronaviruses like RpYN06 or RmYN02. To see this, go to jbloom.github.io/SARS-CoV-2_PRJ… for an interactive plot that allows you to select the bat coronavirus outgroup and mouse over points for strain details. (11/n)
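The comparison behind plots like these can be sketched roughly as follows. This is only an illustration, not the analysis code from the paper: the sequences, sample names, and dates are made-up toy values, the function names `n_diffs` and `relative_diffs` are hypothetical, and it assumes "relative" means the difference count measured against a chosen reference strain.

```python
from datetime import date


def n_diffs(seq, outgroup):
    """Count sites where two aligned sequences differ, skipping gaps and N."""
    if len(seq) != len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq, outgroup)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def relative_diffs(samples, outgroup, reference):
    """Return (collection date, name, relative difference) rows sorted by date.

    The relative difference is the sample's distance to the outgroup minus
    the reference strain's distance to the outgroup (an assumed definition).
    """
    base = n_diffs(reference, outgroup)
    rows = [(d, name, n_diffs(seq, outgroup) - base) for name, d, seq in samples]
    return sorted(rows)


# Toy aligned sequences: NOT real viral data.
outgroup = "ACGTACGTAC"    # stands in for a bat coronavirus outgroup
reference = "ACGTTCGTAA"   # stands in for a reference strain
samples = [
    ("toy_market", date(2019, 12, 30), "ACGTTCGTAA"),
    ("toy_later", date(2020, 1, 15), "ACGTACGTAA"),
    ("toy_ambiguous", date(2020, 1, 5), "ACGTNCGTAA"),
]

rows = relative_diffs(samples, outgroup, reference)
for collected, name, rel in rows:
    print(collected.isoformat(), name, rel)
```

A negative value means the sample is closer to the outgroup than the reference strain is, which is the pattern the thread describes for some later-collected viruses.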
How do the deleted sequences I recovered relate to this conundrum? If we include those sequences, and note that 4 sequences from Guangdong are from two groups of people infected in Wuhan in late Dec / early Jan, we get plausible scenarios that resolve the above problem. (12/n)
These two scenarios are plotted below. Each has a different “progenitor”, which is the sequence that gave rise to all *currently* known #SARSCoV2 sequences (still may not be the virus that infected patient zero if other early sequences remain unknown). (13/n) [Image: plots of the two progenitor scenarios]
Both putative progenitors have 3 mutations relative to Seafood Market viruses that make them more similar to bat coronavirus. One is progenitor inferred by @kumar_lab @sergeilkp et al (academic.oup.com/mbe/advance-ar…), other has C8782T, T28144C, and C29095T relative to Wuhan-Hu-1. (14/n)
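For readers unfamiliar with the notation, a label like C8782T means reference base C at 1-based position 8782 is replaced by T. Below is a minimal sketch of parsing such labels and applying them to a sequence. The helper names `parse_mutation` and `apply_mutations` are hypothetical, and the "genome" is a placeholder string with an arbitrary length, not the real Wuhan-Hu-1 sequence.

```python
import re

# Matches labels of the form <ref base><1-based position><new base>.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label):
    """Split a label such as 'C8782T' into (ref, position, alt)."""
    m = MUTATION_RE.match(label)
    if m is None:
        raise ValueError(f"not a nucleotide substitution: {label!r}")
    ref, pos, alt = m.groups()
    return ref, int(pos), alt


def apply_mutations(genome, labels):
    """Apply substitutions to a sequence, checking each reference base first."""
    seq = list(genome)
    for label in labels:
        ref, pos, alt = parse_mutation(label)
        if seq[pos - 1] != ref:
            raise ValueError(f"{label}: expected {ref}, found {seq[pos - 1]}")
        seq[pos - 1] = alt
    return "".join(seq)


# Placeholder reference: NOT real sequence data. Only the three sites named
# in the thread are set to the reference bases implied by the labels.
labels = ["C8782T", "T28144C", "C29095T"]
toy = ["A"] * 30_000  # arbitrary length, chosen only to cover the positions
for label in labels:
    ref, pos, _ = parse_mutation(label)
    toy[pos - 1] = ref
toy_reference = "".join(toy)

mutated = apply_mutations(toy_reference, labels)
changed = [i + 1 for i, (a, b) in enumerate(zip(toy_reference, mutated)) if a != b]
print(changed)
```

Checking the reference base before substituting guards against off-by-one errors and against applying labels to the wrong reference.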
Both progenitors suggest #SARSCoV2 was circulating in Wuhan before December outbreak at Huanan Seafood Market, which is corroborated by lots of other evidence, including news articles from China in early 2020 (see intro to my paper linked in first Tweet in this thread). (15/n)
There are also broader implications. First, the fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China were ordered to destroy early samples: scmp.com/news/china/soc… (16/n)
Sequence sharing could be further limited by the fact that scientists in China are under an order from the State Council requiring central approval of all publications: apnews.com/article/united… (17/n)
Second major implication is that it may be possible to obtain additional information about early spread of #SARSCoV2 in Wuhan even if efforts for more on-the-ground investigations are stymied. (18/n)
Scientific communication and data sharing typically rely on trust. The NIH Sequence Read Archive has >13,000,000 runs, so they have to trust authors when they request deletions, as it is not feasible to validate the reasons for all requests, some of which are legitimate. (19/n)
In the case of the data set I describe above, it seems possible that the trust the NIH Sequence Read Archive grants to scientific authors to delete data may have been used to obscure sequences informative for understanding early #SARSCoV2. (20/n)
Fortunately, the Sequence Read Archive has rigorous data tracking, enabling them to determine when data were deleted & the justification stated by authors. In fact, @NIHDirector @NCBI have already determined this & generously shared info w me, but I will let them share more widely. (21/n)
It is important to examine if other trust-based systems in science conceivably may have also been used to hide data relevant to origins / early spread of #SARSCoV2. This includes not only looking more at sequence databases, but also paper reviews, grant reporting, etc. (22/n)
Third major implication is that scientists need to stay focused on data-driven study of #SARSCoV2 origins / early spread. After spending the last 4 months studying this closely, I am cautiously optimistic that additional relevant data are still likely to come to light. (23/n)
We should therefore avoid dogmatic arguments about #SARSCoV2 origins / early spread, and instead focus on the following two questions: (1) How can we get more data? (2) How can we better analyze the data we have? (24/n)
Finally, my analysis is on GitHub at github.com/jbloom/SARS-Co… where you can access all code, data, & paper drafts. All updates are via time-stamped commits. This ensures transparency/reproducibility of this study are not in doubt, regardless of your views on interpretation. (25/n)

