I have posted an updated version of my pre-print describing #SARSCoV2 sequences from the early Wuhan epidemic that were deleted from the Sequence Read Archive. This revision should clarify some key questions people asked about the original version: biorxiv.org/content/10.110… (1/n)
First, I would like to thank @stgoldst who provided a set of good-faith scientific critiques that he posted as @biorxivpreprint comments on the original version: disq.us/p/2hwabcu (2/n)
My revisions address @stgoldst's comments as well as others posted on @biorxivpreprint or e-mailed to me directly. You can read my detailed response to the comments and description of the revisions here (disq.us/p/2hwapg6). In this thread, I summarize key changes. (3/n)
First, there has been speculation about *why* the sequences were deleted. Subsequent to submission of the original pre-print, I received a redacted copy of the deletion request e-mail, which I have added as Figure 6 of the revised manuscript (and also pasted below). (4/n)
In the revised manuscript, I refer to this e-mail & make clear that I can't determine the authors' motives. However, I do note that I could not find any websites with updated data, & that practical consequence of deletion was that no one was aware the data existed. (5/n)
Second, @stgoldst correctly noted mutations were still listed in paper published in Small (onlinelibrary.wiley.com/doi/full/10.10…). I now explain this clearly. However, again no one noticed mutation list in paper, so practical consequence of SRA deletion was to obscure existence of data. (6/n)
Third, @stgoldst pointed out that my prior use of example e-mail from pangolin coronavirus SRA deletion request was confusing, & could make readers incorrectly think there was connection between the deletion requests. This is valid point, so I have removed that example. (7/n)
Fourth, several people were confused by my mention of two theories I thought unlikely (RaTG13 outgroup faking, & 2-market hypothesis). Therefore, I have shortened this paragraph, although I still think 2-market hypothesis unlikely for reason here: (8/n)
Fifth, a number of people have noted new data are not transformative, since others have previously inferred earliest viruses aren't from market (eg, Kumar @sergeilkp et al: academic.oup.com/mbe/advance-ar…). I fully agree, and cited this paper heavily... (9/n)
... Now I emphasize even more that new data informative but not transformative, & support *existing* idea virus didn't originate at market. I think study getting so much attention because people are hungry for *any* data on early Wuhan #SARSCoV2, even if not transformative (10/n)
Sixth, in original pre-print I could *not* recover two deleted runs from Google Cloud. After pre-print, several people discovered archived copies of these runs they downloaded before June 2020. I added analysis of these two runs, but they do not meaningfully change results (11/n)
Seventh, @bblarsen1 pointed out that I was failing to properly mask primer binding sites in original analysis. I have fixed this. It does not change results, but does slightly reduce fractional coverage on recovered sequences. (12/n)
Again, a detailed summary of my revisions is here (disq.us/p/2hwapg6) and GitHub repo that transparently tracks all time-stamped changes to manuscript and code over entire history of project is here: github.com/jbloom/SARS-Co… (13/n)
I thank @stgoldst & everyone who took time to provide critiques. Origins & early spread of #SARSCoV2 such a contentious topic that many people have decided they want no part of it. So everybody still engaging in good-faith scientific discussion of topic has my respect! (14/n)
Finally, as scientists, we need to continue to do our best to avoid political / opinion-based arguments about #SARSCoV2 origins / early spread, and remain focused on following two questions: (1) How can we get more data? (2) How can we better analyze the data we have? (15/n)
I am getting lots of questions about whether my pre-print about some #SARSCoV2 sequences that were removed from Sequence Read Archive tells us anything about lab accident versus natural zoonosis.
I posted a summary of the pre-print below, but did not explicitly address this point (1/n)
The answer is NO. The people using it to strongly support either argument are those that have become so emotionally invested in their opinion that they have lost the ability to analyze anything objectively outside of the framework of that argument. (2/n)
What the pre-print does imply is as follows:
First, there may be additional relevant data in obscure locations that aren't the places where we are accustomed to looking (e.g., on the Google Cloud, in table 1 of a paper on diagnostics, etc):
It turns out that mention of the sequencing project in question (PRJNA612766) also disappeared from China National GeneBank (CNGB) shortly after it was removed from the NIH Sequence Read Archive. (2/n)
In a new study, I identify and recover a deleted set of #SARSCoV2 sequences that provide additional information about viruses from the early Wuhan outbreak: biorxiv.org/content/10.110… (1/n)
Specifically, NIH maintains the Sequence Read Archive, where scientists around world deposit deep sequencing data for others to analyze. I noted peerj.com/articles/9255 lists all #SARSCoV2 data in archive as of March-31-2020. Most from a project by Wuhan University. (2/n)
But when I went to Sequence Read Archive, I found entire project was gone! (Note that as detailed below, this does *not* imply malfeasance by NIH. Sequence Read Archive policy allows submitters to delete by e-mail request.) (3/n)
In a new study led by the group of Hui-Ling Yen, we help define the transmission potential of flu viruses replicating in the upper and lower respiratory tract, and quantify the rate of mixing between viruses in these two locations: academic.oup.com/jid/advance-ar… (1/n)
Specifically, our results show that at least in ferrets, viruses transmit from the upper rather than lower respiratory tract, and there is only limited slow mixing between viral populations in these two anatomical locations. (2/n)
Specifically, we generated pdmH1N1 viruses that were either "wildtype" or had 2 synonymous mutations that served as neutral genetic markers. Ferrets were then inoculated with one of these viral variants in the upper airway, and one in the lower airway. (3/n)
In letter published in @ScienceMagazine today, I join 17 other scientists in calling for further investigation of #SARSCoV2 origins, including objective consideration of both accidental lab leak and natural zoonosis: science.sciencemag.org/lookup/doi/10.… (1/n)
We note the scientific community has made admirable progress in understanding biology of #SARSCoV2, including developing vaccines & other countermeasures. But more investigation needed to determine origin of pandemic, which is critical to mitigating risk of future outbreaks (3/n)
We've written a perspective on a new study by @MAMdayIndayOut that helps explain why some viruses (measles) don't evolve to escape immunity but others (influenza) do. Provides some clues relevant to future for #SARSCoV2 as well: cell.com/cell-reports-m…
Here is a recap: (1/n)
Measles and influenza are both respiratory RNA viruses with high mutation rates. Immunity to measles is lifelong: before vaccines, people were infected just once in their lives. Then a measles vaccine was developed >50 years ago and it still works great today. (2/n)
Unfortunately, same is not true for influenza. Typical person is re-infected with same subtype of influenza every 5-7 yrs. Importantly, influenza re-infections are *not* because immunity is weak or transient. We know this from the 1977 flu pandemic. (3/n)