Bloom Lab

@jbloom_lab

Feb 9, 2022 • 32 tweets • 13 min read • Read on X

I'd like to add my preliminary thoughts on a new pre-print by Istvan Csabai & Norbert Solymosi that is receiving a lot of attention because of speculation it might contain new data relevant to origins or early spread of #SARSCoV2 in China: researchsquare.com/article/rs-133… (1/n)

This is actually second pre-print by these authors on topic. I initially heard about first pre-print on Dec-23-2021 from @carlzimmer, who had noticed it: researchsquare.com/article/rs-117… (2/n)

That first pre-print describes analysis of metagenomic samples collected in Antarctica in 2018-2019 that were subsequently sequenced and found to contain #SARSCoV2 reads. The pre-print suggests these might be early #SARSCoV2 reads based on the mutations they contain. (3/n)

After hearing about the pre-print from @carlzimmer, I downloaded the samples and performed the analyses myself. A GitHub repo with my computer code, results, and notes about the analysis / timeline is available here: github.com/jbloom/PRJNA69… (4/n)

My analysis confirmed main findings of first pre-print: some samples did contain #SARSCoV2 reads, with most reads in 3 of 11 samples. In addition, some reads contained three key mutations: C8782T, C18060T, and T28144C, although there is clearly a mixed viral population. (5/n)

Those three mutations are intriguing because they are all "ancestral" mutations that move the sequence *closer* to the bat CoV relatives RaTG13 and BANAL-20-52 relative to first reported Wuhan-Hu-1 sequence from the Huanan Seafood Market. (6/n)

A virus with those three mutations relative to Wuhan-Hu-1 is one of the two plausible progenitors for all currently known human #SARSCoV2 (the other plausible progenitor has C29095T rather than C18060T). See academic.oup.com/mbe/article/38… and academic.oup.com/mbe/article/38… (7/n)

This fact suggests that some sequencing reads come from a virus genetically ancestral to the known sequences from the Huanan Seafood Market, although the stochastic nature of viral mutations means that a more genetically ancestral sequence is not always temporally earlier. (8/n)
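As a minimal sketch of what "ancestral" means in tweets 6/n to 8/n, the Python below labels a mutation as moving toward an outgroup when its new base matches the outgroup's base at that site. The outgroup bases listed are only those implied by this thread (the three mutations move toward RaTG13 / BANAL-20-52); the helper names `parse_mutation` and `moves_toward_outgroup` are hypothetical and are not taken from the linked repo.

```python
import re
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    ref: str   # base in the Wuhan-Hu-1 reference
    site: int  # 1-based genome position
    alt: str   # base observed in the read


def parse_mutation(label: str) -> Mutation:
    """Parse a substitution label such as 'C8782T' into (ref, site, alt)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


# Assumption: outgroup bases inferred from the thread's statement that these
# three mutations move the sequence toward RaTG13 / BANAL-20-52.
OUTGROUP_BASES = {8782: "T", 18060: "T", 28144: "C"}


def moves_toward_outgroup(label: str, outgroup_bases=OUTGROUP_BASES) -> Optional[bool]:
    """True if the mutated base matches the outgroup; None if the site is unknown."""
    mut = parse_mutation(label)
    base = outgroup_bases.get(mut.site)
    if base is None:
        return None
    return mut.alt == base


for label in ("C8782T", "C18060T", "T28144C"):
    print(label, moves_toward_outgroup(label))
```

Sites missing from the lookup table return `None` rather than a guess, since the thread does not give outgroup bases for them.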

In early January, I was contacted by lead author of pre-print, Istvan Csabai. He reached out because the three samples with most #SARSCoV2 reads had just been deleted from @NIH's Sequence Read Archive, reminding him of my paper on deleted sequences: academic.oup.com/mbe/article/38… (9/n)

I confirmed the sequences had been deleted, and archived weblinks showing the original, deleted, and then subsequently restored (see below) pages for the samples are linked in the README in the GitHub repo I created: github.com/jbloom/PRJNA69… (10/n)

Istvan showed me info he had received from Chinese scientists who deposited the sequences. The samples were submitted for sequencing by Sangon Biotech in Dec 2019, & they received results in early 2020. Suggests #SARSCoV2 reads from contamination at Sangon Biotech. (11/n)

I agree this is almost certainly the explanation. Contamination at large-scale sequencing facilities happens, and can be due to index hopping / mis-assignment or physical cross-contamination. In this case, former more likely due to bias towards #SARSCoV2 in read 2. (12/n)
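One way to quantify the "bias towards #SARSCoV2 in read 2" mentioned in 12/n is to compare how many viral alignments come from read 1 versus read 2 of the pairs. The sketch below assumes those two counts are already in hand (the numbers shown are made up for illustration, not taken from the samples) and applies an exact two-sided binomial test against a 50/50 split. It is an illustration only and not necessarily the analysis in the repo.

```python
from math import comb


def read2_fraction(read1_hits: int, read2_hits: int) -> float:
    """Fraction of viral alignments that come from read 2 of the pair."""
    total = read1_hits + read2_hits
    if total == 0:
        raise ValueError("no viral reads to compare")
    return read2_hits / total


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k successes in n trials when p = 0.5.

    Sums the probability of every outcome that is no more likely than the
    observed one. Integer arithmetic keeps the comparison exact.
    """
    observed = comb(n, k)
    as_extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return as_extreme / 2 ** n


# Hypothetical counts, for illustration only.
read1_hits, read2_hits = 12, 48
total = read1_hits + read2_hits
print(f"read 2 fraction: {read2_fraction(read1_hits, read2_hits):.2f}")
print(f"two-sided p vs 50/50: {binomial_two_sided_p(read2_hits, total):.3g}")
```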

Timeline matters a lot here. According to Chinese scientists, samples submitted in Dec 2019, results received in early 2020. If they were sequenced in Dec 2019 then exceptionally important, because Chinese govt holds #SARSCoV2 not discovered until Dec 30-31... (13/n)

... On other hand, if sequenced in early 2020 then they could be contaminated with some early patient samples and still concord with Chinese govt timeline. Right now it doesn't seem there is enough info to narrow down timeline to distinguish between these. (14/n)

Istvan also explained to me the unfortunate fact that their first pre-print (which is very rigorous and matter-of-fact) was rejected by @biorxivpreprint, which is why they had to post it to Research Square where it got less notice. (15/n)

Shortly thereafter, Istvan & Norbert performed some ingenious further analyses that are basis for their second even more intriguing pre-print researchsquare.com/article/rs-133…, which was unfortunately also rejected by @biorxivpreprint. (16/n)

In their second pre-print, they analyzed *host* reads alongside viral reads and found they came from human, African Green Monkey & hamster. (17/n)

I have *not* yet independently validated these host analyses and so cannot vouch for them, although they appear solid from textual description. Assuming they are correct, the presence of these host reads in the samples is intriguing. (18/n)

Obviously, none of these hosts are expected in Antarctica metagenomes, & abundance of host reads parallels abundance of #SARSCoV2 reads. The hosts are interesting: Vero cells are from African Green Monkey; CHO cells are from hamster, & hamsters themselves are also used to study #SARSCoV2 (18/n)

This fact suggests some #SARSCoV2 reads from samples in Vero & hamster cells (or hamsters). Again significance depends on timeline. #SARSCoV2 in Vero cells in Dec 2019 inconsistent with current account of viral origins, but WIV had virus in Vero cells by early to mid Jan (19/n)
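The observation that host-read abundance parallels #SARSCoV2-read abundance across samples could be checked with a simple per-sample correlation. The sketch below computes a Pearson correlation from per-sample read counts; the counts are hypothetical placeholders, not values from the pre-print, and `pearson_r` is a hand-rolled helper rather than anything from the authors' pipeline.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-sample read counts, for illustration only.
viral_reads = [0, 2, 150, 900, 40]
host_reads = [5, 10, 700, 4100, 260]
print(f"Pearson r = {pearson_r(viral_reads, host_reads):.3f}")
```

With only a handful of samples, a high correlation is suggestive rather than conclusive, so it would complement rather than replace inspection of the individual reads.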

Without knowing the sequencing timeline more precisely than the current "December 2019 to early 2020," all we can say is that these samples were contaminated at Sangon Biotech with some early #SARSCoV2 viruses, some of which appear to have been from lab-grown samples. (20/n)

This is obviously super interesting, and I hope further analyses or additional data can shed more light. (21/n)

One postscript: After being deleted from @NIH Sequence Read Archive in early Jan 2022, data were restored later that month. I asked Chinese authors & they did ask to have #SARSCoV2 contaminated samples deleted, but did *not* ask to have them restored. So unclear what happened (22/n)

To clarify, mutations C8782T, C18060T & T28144C towards RaTG13 / BANAL-20-52 *not* unique to these samples. They are also in some other non-market lineage A viruses (see Tweet 7/n). So suggest a sequence about as ancestral as oldest known ones, but not more ancestral (23/n)

Also to more strongly clarify another point above, these samples are almost certainly contaminated with a *mix* of different #SARSCoV2 samples as indicated by presence of multiple non-fixed viral mutations and reads from multiple host species. (24/n)

Another postscript: this comment @acritschristoph posted on the Antarctica-#SARSCoV2 pre-print makes good points that should also be considered in continued analysis of the data: researchsquare.com/article/rs-133… (25/n)

To follow up on another question, this point by @K_G_Andersen is also relevant (https://twitter.com/K_G_Andersen/status/1491617642955755524). There is a mix of mutations in these samples, because they are almost certainly contaminated with several different #SARSCoV2-containing samples. (26/n)

Above I called some mutations "ancestral" which I am using to loosely mean mutations likely present in earliest #SARSCoV2 viruses. Eg, @sergeilkp et al propose earliest virus had mutations at 8782, 18060, & 28144 (academic.oup.com/mbe/article/38…), and those are in these samples. (27/n)

Kristian correctly points out some other mutations (eg at 23525) are "derived," which he is using loosely to mean mutations unlikely to be present in earliest #SARSCoV2 viruses. These two observations consistent w idea that we are seeing mutations from a mix of viruses. (28/n)

Overall point is we can't precisely date samples just from mutations. Several reasons: (a) we see mix of mutations, not full sequences, (b) we don't know exact true most "ancestral" sequence, although we can guess it was closer to RaTG13/BANAL-20-52 than later viruses, ... (29/n)

... (c) sequences cannot be dated at high resolution just from mutations due to stochasticity of evolution (https://twitter.com/jbloom_lab/status/1492162895769006084). (30/n)
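To illustrate point (c), the sketch below treats mutation accumulation along a lineage as a Poisson process. The rate of two mutations per genome per month is an assumed round number chosen for illustration, not a value from this thread, and `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate_per_month: float, months: float) -> float:
    """Probability of exactly k mutations after `months` at a constant rate."""
    expected = rate_per_month * months
    return exp(-expected) * expected ** k / factorial(k)


# Assumption: illustrative rate only, not taken from the thread.
ASSUMED_RATE_PER_MONTH = 2.0

for k in range(5):
    p = poisson_pmf(k, ASSUMED_RATE_PER_MONTH, months=1.0)
    print(f"P({k} mutations after 1 month) = {p:.3f}")
```

Under that assumed rate, a lineage has roughly a 14% chance of picking up no mutations at all in a month, which is why two viruses sampled weeks apart can carry identical sequences and why mutation counts alone give only coarse dates.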

To sum up, given mutations, I think we can be confident these are contaminated w "early viruses" in sense of late 2019 to early 2020, which is also consistent w what authors reported as sequencing timeline. But I doubt more precision than that possible from mutations alone (31/n)
