Bloom Lab
Feb 9, 2022 · 32 tweets
I'd like to add my preliminary thoughts on a new pre-print by Istvan Csabai & Norbert Solymosi that is receiving a lot of attention because of speculation it might contain new data relevant to origins or early spread of #SARSCoV2 in China: researchsquare.com/article/rs-133… (1/n)
This is actually second pre-print by these authors on topic. I initially heard about first pre-print on Dec-23-2021 from @carlzimmer, who had noticed it: researchsquare.com/article/rs-117… (2/n)
That first pre-print describes analysis of metagenomic samples collected in Antarctica in 2018-2019 that were subsequently sequenced and found to contain #SARSCoV2 reads. The pre-print suggests these might be early #SARSCoV2 reads based on the mutations they contain. (3/n)
After hearing about the pre-print from @carlzimmer, I downloaded the samples and performed the analyses myself. A GitHub repo with my computer code, results, and notes about the analysis / timeline is available here: github.com/jbloom/PRJNA69… (4/n)
My analysis confirmed main findings of first pre-print: some samples did contain #SARSCoV2 reads, with most reads in 3 of 11 samples. In addition, some reads contained three key mutations: C8782T, C18060T, and T28144C, although there is clearly a mixed viral population. (5/n)
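A rough sketch of the read-level check behind "mixed viral population": parse a mutation name such as C8782T into reference base, 1-based genome position, and alternate base, then count how many reads covering that position carry the alternate base. This assumes reads were already aligned to Wuhan-Hu-1 and reduced to a position-to-base mapping (a hypothetical representation chosen for brevity); the toy reads are invented, not taken from these samples. An alternate-base fraction well between 0 and 1 means the mutation is not fixed.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical read representation: 1-based position (Wuhan-Hu-1 coordinates)
# -> base observed in that read at that position.
Read = Dict[int, str]

# Reference base, position, alternate base, e.g. "C8782T".
MUTATION_RE = re.compile(r"([ACGT])(\d+)([ACGT])")


def parse_mutation(mut: str) -> Tuple[str, int, str]:
    """Split a nucleotide mutation such as 'C8782T' into (ref, position, alt)."""
    match = MUTATION_RE.fullmatch(mut)
    if match is None:
        raise ValueError(f"not a nucleotide mutation: {mut!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def alt_fraction(reads: List[Read], mut: str) -> Tuple[int, int, float]:
    """Return (reads with alt base, reads covering the site, alt fraction)."""
    _ref, pos, alt = parse_mutation(mut)
    covering = [read[pos] for read in reads if pos in read]
    n_alt = sum(1 for base in covering if base == alt)
    frac = n_alt / len(covering) if covering else float("nan")
    return n_alt, len(covering), frac


if __name__ == "__main__":
    # Invented toy reads, for illustration only.
    toy_reads: List[Read] = [
        {8782: "T", 18060: "T"},
        {8782: "C"},
        {8782: "T", 28144: "C"},
        {18060: "C", 28144: "T"},
    ]
    for mutation in ("C8782T", "C18060T", "T28144C"):
        n_alt, n_cov, frac = alt_fraction(toy_reads, mutation)
        print(f"{mutation}: {n_alt}/{n_cov} covering reads carry the alt base ({frac:.2f})")
```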
Those three mutations are intriguing because they are all "ancestral" mutations that move the sequence *closer* to the bat CoV relatives RaTG13 and BANAL-20-52 relative to first reported Wuhan-Hu-1 sequence from the Huanan Seafood Market. (6/n)
A virus with those three mutations relative to Wuhan-Hu-1 is one of the two plausible progenitors for all currently known human #SARSCoV2 (the other plausible progenitor has C29095T rather than C18060T). See academic.oup.com/mbe/article/38… and academic.oup.com/mbe/article/38… (7/n)
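To make the "two plausible progenitors" comparison concrete, here is a minimal sketch that checks a set of observed mutations against each candidate, using only the mutations named above (relative to Wuhan-Hu-1). The candidate labels and function names are hypothetical. Only C18060T and C29095T can discriminate between the two candidates, since C8782T and T28144C are shared.

```python
from typing import Dict, FrozenSet, Set

# The two plausible progenitors described above, each written as its
# mutations relative to Wuhan-Hu-1. Labels are hypothetical.
CANDIDATES: Dict[str, FrozenSet[str]] = {
    "candidate_with_C18060T": frozenset({"C8782T", "C18060T", "T28144C"}),
    "candidate_with_C29095T": frozenset({"C8782T", "C29095T", "T28144C"}),
}


def discriminating_mutations() -> Set[str]:
    """Mutations present in exactly one candidate; only these can tell them apart."""
    a, b = CANDIDATES.values()
    return set(a ^ b)


def compare_to_candidates(observed: Set[str]) -> Dict[str, Dict[str, Set[str]]]:
    """For each candidate, split its defining mutations into observed / not observed.

    Caveat: 'not_observed' lumps together sites where reads show the reference
    base and sites with no coverage; a real analysis should keep these apart.
    """
    report: Dict[str, Dict[str, Set[str]]] = {}
    for name, muts in CANDIDATES.items():
        report[name] = {
            "observed": set(muts & observed),
            "not_observed": set(muts - observed),
        }
    return report


if __name__ == "__main__":
    observed = {"C8782T", "C18060T", "T28144C"}
    print("discriminating:", sorted(discriminating_mutations()))
    for name, parts in compare_to_candidates(observed).items():
        print(name, {key: sorted(val) for key, val in parts.items()})
```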
This fact suggests that some sequencing reads come from a virus genetically ancestral to the known sequences from the Huanan Seafood Market, although the stochastic nature of viral mutations means that a more genetically ancestral sequence is not always temporally earlier. (8/n)
In early January, I was contacted by lead author of pre-print, Istvan Csabai. He reached out because the three samples with most #SARSCoV2 reads had just been deleted from @NIH's Sequence Read Archive, reminding him of my paper on deleted sequences: academic.oup.com/mbe/article/38… (9/n)
I confirmed the sequences had been deleted, and archived weblinks showing the original, deleted, and then subsequently restored (see below) pages for the samples are linked in the README in the GitHub repo I created: github.com/jbloom/PRJNA69… (10/n)
Istvan showed me info he had received from the Chinese scientists who deposited the sequences. The samples were submitted to Sangon Biotech for sequencing in Dec 2019, & they received results in early 2020. This suggests the #SARSCoV2 reads came from contamination at Sangon Biotech. (11/n)
I agree this is almost certainly the explanation. Contamination at large-scale sequencing facilities happens, and can be due to index hopping / mis-assignment or physical cross-contamination. In this case, the former is more likely, given the bias towards #SARSCoV2 in read 2. (12/n)
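One hedged way to put a number on a read 1 vs read 2 imbalance: under a null hypothesis (an assumption made here for illustration) that viral-assigned reads are equally likely to come from read 1 or read 2, the read 2 count out of n viral reads is Binomial(n, 0.5). The sketch computes an exact two-sided binomial p-value in log space; the counts are invented, and this is not necessarily how the bias was assessed in this analysis.

```python
from math import exp, lgamma, log
from typing import Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )
    return exp(log_pmf)


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if not 0 < p < 1:
        raise ValueError("need 0 < p < 1")
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance so floating-point ties count
    return min(1.0, sum(q for q in probs if q <= cutoff))


def read_pair_bias(viral_r1: int, viral_r2: int) -> Tuple[float, float]:
    """Fraction of viral-assigned reads found in read 2, plus the binomial p-value."""
    n = viral_r1 + viral_r2
    if n == 0:
        raise ValueError("no viral reads")
    return viral_r2 / n, binomial_two_sided_p(viral_r2, n)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    frac_r2, p_value = read_pair_bias(viral_r1=40, viral_r2=160)
    print(f"fraction in read 2: {frac_r2:.2f}, two-sided p = {p_value:.2g}")
```

A small p-value only says the split is unlikely to be 50:50 by chance; on its own it does not identify the contamination mechanism.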
Timeline matters a lot here. According to the Chinese scientists, samples were submitted in Dec 2019 and results received in early 2020. If they were sequenced in Dec 2019, that would be exceptionally important, because the Chinese govt holds that #SARSCoV2 was not discovered until Dec 30-31... (13/n)
... On the other hand, if they were sequenced in early 2020, then they could be contaminated with some early patient samples and still concord with the Chinese govt timeline. Right now it doesn't seem there is enough info to narrow down the timeline to distinguish between these. (14/n)
Istvan also explained to me the unfortunate fact that their first pre-print (which is very rigorous and matter-of-fact) was rejected by @biorxivpreprint, which is why they had to post it to Research Square where it got less notice. (15/n)
Shortly thereafter, Istvan & Norbert performed some ingenious further analyses that are basis for their second even more intriguing pre-print researchsquare.com/article/rs-133…, which was unfortunately also rejected by @biorxivpreprint. (16/n)
In their second pre-print, they analyzed *host* reads alongside viral reads and found they came from human, African Green Monkey & hamster. (17/n)
I have *not* yet independently validated these host analyses and so cannot vouch for them, although they appear solid from textual description. Assuming they are correct, the presence of these host reads in the samples is intriguing. (18/n)
Obviously, none of these hosts would be expected in Antarctic metagenomes, & the abundance of host reads parallels the abundance of #SARSCoV2 reads. The hosts are interesting: Vero cells are from African Green Monkey; CHO cells are from hamster, & hamsters themselves are also used to study #SARSCoV2 (18/n)
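To put a number on "host reads parallel #SARSCoV2 reads," one option is a rank (Spearman) correlation of per-sample counts, which only asks whether samples with more viral reads also tend to have more host reads. Minimal sketch below; the per-sample counts are invented placeholders, not data from the pre-print, and this is not necessarily the analysis the authors ran.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Placeholder per-sample counts (NOT the real data).
    viral_reads = [0, 2, 1, 150, 0, 3, 90, 0, 1, 400, 0]
    host_reads = [1, 0, 3, 80, 0, 2, 60, 1, 0, 250, 2]
    print(f"Spearman rho = {spearman(viral_reads, host_reads):.2f}")
```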
This suggests some #SARSCoV2 reads are from samples grown in Vero & hamster cells (or from hamsters). Again, significance depends on timeline. #SARSCoV2 in Vero cells in Dec 2019 would be inconsistent with current account of viral origins, but WIV had virus in Vero cells by early to mid Jan (19/n)
Without knowing the sequencing timeline more precisely than the current "December 2019 to early 2020," all we can say is that these samples were contaminated at Sangon Biotech with some early #SARSCoV2 viruses, some of which appear to have been from lab-grown samples. (20/n)
This is obviously super interesting, and I hope further analyses or additional data can shed more light. (21/n)
One postscript: After being deleted from @NIH Sequence Read Archive in early Jan 2022, the data were restored later that month. I asked the Chinese authors, & they did ask to have the #SARSCoV2-contaminated samples deleted, but did *not* ask to have them restored. So it is unclear what happened (22/n)
To clarify, mutations C8782T, C18060T & T28144C towards RaTG13 / BANAL-20-52 are *not* unique to these samples. They are also in some other non-market lineage A viruses (see Tweet 7/n). So they suggest a sequence about as ancestral as the oldest known ones, but not more ancestral (23/n)
Also to more strongly clarify another point above, these samples are almost certainly contaminated with a *mix* of different #SARSCoV2 samples as indicated by presence of multiple non-fixed viral mutations and reads from multiple host species. (24/n)
Another postscript: this comment @acritschristoph posted on the Antarctica-#SARSCoV2 pre-print makes good points that should also be considered in continued analysis of the data: researchsquare.com/article/rs-133… (25/n)
To follow up on another question, this point by @K_G_Andersen is also relevant (). There is a mix of mutations in these samples, because they are almost certainly contaminated with several different #SARSCoV2-containing samples. (26/n)
Above I called some mutations "ancestral" which I am using to loosely mean mutations likely present in earliest #SARSCoV2 viruses. Eg, @sergeilkp et al propose earliest virus had mutations at 8782, 18060, & 28144 (academic.oup.com/mbe/article/38…), and those are in these samples. (27/n)
Kristian correctly points out some other mutations (eg at 23525) are "derived," which he is using loosely to mean mutations unlikely to be present in earliest #SARSCoV2 viruses. These two observations are consistent w the idea that we are seeing mutations from a mix of viruses. (28/n)
Overall point is we can't precisely date samples just from mutations, for several reasons: (a) we see a mix of mutations, not full sequences, (b) we don't know the exact true most "ancestral" sequence, although we can guess it was closer to RaTG13/BANAL-20-52 than later viruses, ... (29/n)
... (c) sequences cannot be dated at high resolution just from mutations due to stochasticity of evolution (). (30/n)
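A toy illustration of point (c), assuming mutations accumulate as a Poisson process at a constant rate (a simplification; real rates vary across lineages and sites). The rate of 2 substitutions per genome per month is a round number picked for illustration, not a measured value. The sketch computes a 95% equal-tailed interval of mutation counts for a given elapsed time, then lists which elapsed times on a grid are compatible with an observed count.

```python
from math import exp, lgamma, log
from typing import List, Optional, Tuple

ASSUMED_RATE_PER_MONTH = 2.0  # illustrative assumption, not a measured rate


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), in log space for numerical stability."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return exp(k * log(lam) - lam - lgamma(k + 1))


def equal_tailed_interval(lam: float, mass: float = 0.95) -> Tuple[int, int]:
    """Smallest counts lo, hi with P(X <= lo) >= tail and P(X <= hi) >= 1 - tail."""
    if not 0 < mass < 1:
        raise ValueError("need 0 < mass < 1")
    tail = (1 - mass) / 2
    cdf, k = 0.0, 0
    lo: Optional[int] = None
    while True:
        cdf += poisson_pmf(k, lam)
        if lo is None and cdf >= tail:
            lo = k
        if cdf >= 1 - tail:
            return lo, k
        k += 1


def months_consistent_with(k_obs: int, rate: float = ASSUMED_RATE_PER_MONTH,
                           max_months: float = 12.0, step: float = 0.25) -> List[float]:
    """Elapsed times (on a grid) whose 95% interval contains the observed count."""
    times = []
    for i in range(1, int(max_months / step) + 1):
        t = i * step
        lo, hi = equal_tailed_interval(rate * t)
        if lo <= k_obs <= hi:
            times.append(t)
    return times


if __name__ == "__main__":
    for months in (1, 2):
        print(months, "month(s):", equal_tailed_interval(ASSUMED_RATE_PER_MONTH * months))
    ts = months_consistent_with(2)
    print(f"2 mutations is consistent with {min(ts)} to {max(ts)} months")
```

Under these assumptions the 1-month interval is (0, 5) and the 2-month interval is (1, 8), so they overlap heavily, and an observed count of 2 mutations is compatible with every grid point from 0.25 to 3.5 months.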
To sum up, given the mutations, I think we can be confident these are contaminated w "early viruses" in the sense of late 2019 to early 2020, which is also consistent w what the authors reported as the sequencing timeline. But I doubt more precision than that is possible from mutations alone (31/n)

More from @jbloom_lab

Jan 21
In new study, we find dramatic differences in specificities of serum neutralizing antibodies in infants w single infection by a recent SARS-CoV-2 strain versus adults/children imprinted by an early viral strain.

biorxiv.org/content/10.110…
As background, immune response to a virus is “imprinted” by first exposure, since later exposures to new viral strains often activate pre-existing B-cells.

For SARS-CoV-2, most people globally imprinted by an early viral strain from either vaccination or infection in 2020-2021.
However, small but growing fraction of population has instead been imprinted by more recent viral strain.

Specifically, we compared adults/children imprinted by original vaccine then infected w XBB* strain in 2023 vs infants only infected w XBB* in 2023.
Nov 21, 2024
I’ve updated SARSCoV2 antibody-escape calculator w new deep mutational scanning data of @yunlong_cao @jianfcpku

My interpretation: antigenic evolution currently constrained by pleiotropic effects of mutations on RBD-ACE2 affinity, RBD up-down position & antibody neutralization
First, the updated escape calculator is at jbloomlab.github.io/SARS2-RBD-esca…

As shown below, it is remarkable how much antigenicity of RBD has changed over last 4 yrs.
Updated data for calculator from this paper by @yunlong_cao’s group (nature.com/articles/s4158…), described in this thread by first author @jianfcpku:
x.com/jianfcpku/stat…

Calculator shows how much mutations at each RBD site escape binding by set of neutralizing antibodies
Nov 16, 2024
@Nucleocapsoid @HNimanFC @mrmickme2 @0bFuSc8 @PeacockFlu @CVRHutchinson Good observations. See also this thread posted by @SCOTTeHENSLEY: bsky.app/profile/scotte…

I have added a few notes to the bottom of that thread.

To recap here:
@Nucleocapsoid @HNimanFC @mrmickme2 @0bFuSc8 @PeacockFlu @CVRHutchinson @SCOTTeHENSLEY To add to thread linked above, human British Columbia H5 case has an HA sequence (GISAID EPI_ISL_19548836) that is ambiguous at *both* site Q226 and site E190 (H3 numbering)

Both these sites play an important role in sialic acid binding specificity
@Nucleocapsoid @HNimanFC @mrmickme2 @0bFuSc8 @PeacockFlu @CVRHutchinson @SCOTTeHENSLEY If you are searching literature, these sites are E190 and Q226 in H3 numbering, E186 and Q222 in mature H5 numbering, and E202 and Q238 in sequential H5 numbering (see: dms-vep.org/Flu_H5_America…)
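Because H3 and H5 HAs differ by insertions and deletions, converting site numbers between schemes is a lookup rather than one fixed offset. A minimal sketch seeded only with the two correspondences stated above (E190 and Q226 in H3 numbering); the scheme labels and function name are hypothetical, and real use would need a complete, alignment-based conversion table.

```python
from typing import Dict, List

SCHEMES = ("H3", "H5_mature", "H5_sequential")  # hypothetical scheme labels

# Only the correspondences stated in the thread (E190 and Q226 in H3 numbering).
# Extend from a proper alignment-based conversion table before real use.
SITE_TABLE: List[Dict[str, int]] = [
    {"H3": 190, "H5_mature": 186, "H5_sequential": 202},
    {"H3": 226, "H5_mature": 222, "H5_sequential": 238},
]


def convert_site(site: int, from_scheme: str, to_scheme: str) -> int:
    """Translate a site number between schemes by table lookup (no offsets assumed)."""
    for scheme in (from_scheme, to_scheme):
        if scheme not in SCHEMES:
            raise ValueError(f"unknown numbering scheme: {scheme!r}")
    for row in SITE_TABLE:
        if row[from_scheme] == site:
            return row[to_scheme]
    raise KeyError(f"site {site} ({from_scheme}) is not in the table")


if __name__ == "__main__":
    for h3_site in (190, 226):
        print(h3_site, "->", convert_site(h3_site, "H3", "H5_mature"),
              convert_site(h3_site, "H3", "H5_sequential"))
```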
Oct 8, 2024
Below is brief analysis of HA mutations in two recent cases of H5N1 influenza in humans w contact w dairy cattle in California.

Summary is that while virus continues to evolve, nothing about HA mutations in these human cases is obviously alarming.
As background, CDC reported several recent cases of H5 influenza in California.

CDC and California DOH recently shared sequences of two of these cases via GISAID.
cdc.gov/media/releases…
California human cases share two HA mutations relative to "consensus" dairy cattle virus HA:

D95G & S336N in H3 numbering (D88G & S320N in H5 numbering; D104G & S336N in sequential numbering).

Both these mutations also in some dairy cattle HAs, so not unique to human cases.
Sep 15, 2024
Here is analysis of HA mutations in H5 influenza case in Missouri resident without known contact w animals or raw milk.

TLDR: there is one HA mutation that strongly affects antigenicity, and another that merits some further study.
As background, CDC recently released partial sequence of A/Missouri/121/2024, which is virus from person in Missouri who was infected with H5 influenza.


Here I am analyzing HA protein from this release, GISAID accession EPI_ISL_19413343
cdc.gov/bird-flu/spotl…
Sequence covers all of HA except signal peptide, and residues 325-351 (sequential numbering) / 312-335 (H3 numbering). The missing residues encompass HA1-HA2 boundary, and any missed mutations there unlikely to affect antigenicity or receptor binding, but could affect stability.
May 25, 2024
In new study led by @bdadonaite, we measure how all mutations to H5 influenza HA affect four molecular phenotypes relevant to pandemic risk:


Results can inform surveillance of ongoing evolution of H5N1. biorxiv.org/content/10.110…
To measure how all HA mutations affect those phenotypes, we created pseudovirus libraries of HA from WHO clade 2.3.4.4b vaccine strain.

Pseudoviruses encode no genes other than HA, so can only do a single cycle of infection making them safe for biosafety-level-2.
First, we measured how all mutations affected HA-mediated cell entry, which is essential for viral fitness

See heatmap below, which is easily visualized interactively at dms-vep.org/Flu_H5_America…

Some sites constrained (orange); others w many well tolerated mutations (white/blue)
