Bloom Lab
Jun 22, 2021 · 26 tweets · 12 min read
In a new study, I identify and recover a deleted set of #SARSCoV2 sequences that provide additional information about viruses from the early Wuhan outbreak: biorxiv.org/content/10.110… (1/n)
Specifically, NIH maintains the Sequence Read Archive, where scientists around the world deposit deep sequencing data for others to analyze. I noted that peerj.com/articles/9255 lists all #SARSCoV2 data in the archive as of March 31, 2020. Most came from a project by Wuhan University. (2/n)
But when I went to the Sequence Read Archive, I found the entire project was gone! (Note that as detailed below, this does *not* imply malfeasance by NIH. Sequence Read Archive policy allows submitters to delete data by e-mail request.) (3/n)
I was able to determine deleted data corresponded to a study that partially sequenced “45 nasopharyngeal samples from [Wuhan] outpatients with suspected COVID-19 early in the epidemic” medrxiv.org/content/10.110… (4/n)
I discovered that even though the files were deleted from archive itself, they could be recovered from the Google Cloud at links like storage.googleapis.com/nih-sequence-r… (5/n)
Using this approach, I recovered files for the 34 early samples that were virus positive. I was able to use the data in the files to reconstruct partial viral sequences (from start of spike to end of ORF10) for 13 of these samples. (6/n)
Now I need to give background to explain a confusing scientific mystery about other early #SARSCoV2 sequences. Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats. (7/n)
Therefore, we’d expect the first #SARSCoV2 sequences would be more similar to bat coronaviruses, and as #SARSCoV2 continued to evolve it would become more divergent from these ancestors. But that is *not* the case! (8/n)
Instead, early Huanan Seafood Market #SARSCoV2 viruses are more different from bat coronaviruses than #SARSCoV2 viruses collected later in China and even other countries. @lpipes @ras_nielsen give nice technical analysis at academic.oup.com/mbe/article/38… (9/n)
The conundrum is easily seen by plotting the relative differences from the bat coronavirus RaTG13 outgroup versus collection date for early #SARSCoV2. See how the first reported viruses from Wuhan (leftmost blue points) aren’t the closest to RaTG13. (10/n)
Same result if we use other bat coronaviruses like RpYN06 or RmYN02. To see this, go to jbloom.github.io/SARS-CoV-2_PRJ… for an interactive plot that allows you to select the bat coronavirus outgroup and mouse over points for strain details. (11/n)
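The comparison described in the two posts above can be sketched in a few lines. This is only a minimal illustration, not the analysis code from the study: it assumes the sequences are already aligned to equal length, it treats "relative difference" as the number of mismatches to the outgroup minus the same count for a chosen reference sequence, and it skips sites with ambiguous bases. The function names and the toy sequences are hypothetical.

```python
# Minimal sketch (assumption: all sequences are pre-aligned to the same length).
# Function names and toy data are hypothetical, not taken from the study's code.

VALID_BASES = set("ACGT")


def count_differences(seq, outgroup):
    """Count aligned sites where both sequences have an unambiguous base and differ."""
    if len(seq) != len(outgroup):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq.upper(), outgroup.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def relative_differences(samples, reference, outgroup):
    """Differences from the outgroup for each sample, relative to the reference.

    Negative values mean the sample is closer to the outgroup than the reference is.
    """
    baseline = count_differences(reference, outgroup)
    return {
        name: count_differences(seq, outgroup) - baseline
        for name, seq in samples.items()
    }


# Toy 10-nucleotide alignment, purely illustrative.
outgroup = "ACGTACGTAC"    # stands in for a bat coronavirus such as RaTG13
reference = "ACGTTCGTAT"   # stands in for an early market-associated sequence
samples = {
    "market_like": "ACGTTCGTAT",
    "closer_to_outgroup": "ACGTACGTAT",
    "partial": "NNGTTCGTAC",  # N marks sites without sequence coverage
}

result = relative_differences(samples, reference, outgroup)
```

Swapping in a different outgroup only changes the `outgroup` argument, which mirrors the choice of RaTG13, RpYN06, or RmYN02 in the interactive plot. Plotting these values against collection date gives the kind of figure described above.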
How do deleted sequences I recovered relate to this conundrum? If we include those sequences, and note 4 sequences from Guangdong are from two groups of people infected in Wuhan in late Dec / early Jan, we get plausible scenarios that resolve above problems. (12/n)
These two scenarios are plotted below. Each has a different “progenitor”, which is the sequence that gave rise to all *currently* known #SARSCoV2 sequences (still may not be virus that infected patient zero if other early sequences remain unknown). (13/n)
Both putative progenitors have 3 mutations relative to Seafood Market viruses that make them more similar to bat coronavirus. One is progenitor inferred by @kumar_lab @sergeilkp et al (academic.oup.com/mbe/advance-ar…), other has C8782T, T28144C, and C29095T relative to Wuhan-Hu-1. (14/n)
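For readers unfamiliar with the notation, a mutation such as C8782T means the reference has C at position 8782 (1-based) and the variant has T there. The sketch below parses that notation and applies it to a reference sequence. It is a hedged illustration only: the helper names are hypothetical, and the demonstration uses a short made-up sequence rather than the actual Wuhan-Hu-1 genome.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    ref: str   # nucleotide in the reference
    pos: int   # 1-based position in the reference
    alt: str   # nucleotide in the variant


_MUTATION_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(text):
    """Parse a nucleotide substitution written like 'C8782T'."""
    match = _MUTATION_PATTERN.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid substitution: {text!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


def apply_mutations(reference, mutations):
    """Return a copy of the reference with each substitution applied."""
    seq = list(reference.upper())
    for mut in mutations:
        index = mut.pos - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(seq):
            raise ValueError(f"position {mut.pos} is outside the reference")
        if seq[index] != mut.ref:
            raise ValueError(
                f"reference has {seq[index]} at {mut.pos}, expected {mut.ref}"
            )
        seq[index] = mut.alt
    return "".join(seq)


# The three substitutions named in the post above, parsed but not applied
# here because the full reference genome is not included in this sketch.
named = [parse_mutation(m) for m in ("C8782T", "T28144C", "C29095T")]

# Toy demonstration on a made-up 5-nucleotide reference.
toy_variant = apply_mutations("ACGTC", [parse_mutation("C2T"), parse_mutation("C5T")])
```

Checking the reference base before substituting catches off-by-one errors and mismatched reference versions, which are common pitfalls when mutation lists are copied between analyses.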
Both progenitors suggest #SARSCoV2 was circulating in Wuhan before December outbreak at Huanan Seafood Market, which is corroborated by lots of other evidence, including news articles from China in early 2020 (see intro to my paper linked in first Tweet in this thread). (15/n)
There are also broader implications. First, the fact that this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China were ordered to destroy early samples: scmp.com/news/china/soc… (16/n)
Sequence sharing could be further limited by fact that scientists in China are under an order from the State Council requiring central approval of all publications: apnews.com/article/united… (17/n)
Second major implication is that it may be possible to obtain additional information about early spread of #SARSCoV2 in Wuhan even if efforts for more on-the-ground investigations are stymied. (18/n)
Scientific communication and data sharing typically rely on trust. The NIH Sequence Read Archive has >13,000,000 runs, so it has to trust authors when they request deletions, as it is not feasible to validate the reasons for all requests, some of which are legitimate. (19/n)
In case of data set I describe above, it seems possible that trust that the NIH Sequence Read Archive grants to scientific authors to delete data may have been used to obscure sequences informative for understanding early #SARSCoV2. (20/n)
Fortunately, the Sequence Read Archive has rigorous data tracking that enables it to determine when the data were deleted & the justification stated by the authors. In fact, @NIHDirector @NCBI have already determined this & generously shared info w me, but I will let them share it more widely. (21/n)
It is important to examine if other trust-based systems in science conceivably may have also been used to hide data relevant to origins / early spread of #SARSCoV2. This includes not only looking more at sequence databases, but also paper reviews, grant reporting, etc. (22/n)
Third major implication is that scientists need to stay focused on data-driven study of #SARSCoV2 origins / early spread. After spending the last 4 months studying this closely, I am cautiously optimistic that additional relevant data are still likely to come to light. (23/n)
We should therefore avoid dogmatic arguments about #SARSCoV2 origins / early spread, and instead focus on following two questions: (1) How can we get more data? (2) How can we better analyze the data we have? (24/n)
Finally, my analysis is on GitHub at github.com/jbloom/SARS-Co… where you can access all code, data, & paper drafts. All updates are via time-stamped commits. This ensures transparency/reproducibility of this study are not in doubt, regardless of your views on interpretation. (25/n)
