Bloom Lab, Apr 27, 2023
In new study, I have analyzed correlation between SARS-CoV-2 & animal genetic material in full set of environmental samples from Huanan Seafood Market.


Analysis clarifies what sequencing these samples can & cannot tell us about early outbreak at market. biorxiv.org/content/10.110…
Background:

China first reported coronavirus cases associated w market & no human transmission. But then we learned there was human transmission & some early cases were not from market.

Thus began still unanswered questions of role of market, nicely summarized here science.org/content/articl…
In 2022, Chinese CDC released pre-print describing sampling market beginning Jan-1-2020: researchsquare.com/article/rs-137…

They collected 457 animal & 923 environmental samples. They stated all animal samples tested negative, but 73 environmental samples positive.
In their 2022 pre-print, Chinese CDC described deep sequencing >150 environmental samples.

They included plot (below) that showed SARS-CoV-2 content of samples was correlated w human genetic material. From this, they concluded humans were source of virus in samples at market. Image
But Chinese CDC did not label other points on plot, so it wasn’t clear what other species had genetic material that correlated with SARS-CoV-2 abundance. They also didn’t provide raw sequencing data to enable other scientists to do this analysis.
This omission was widely noted in multiple news articles where scientists requested access to raw data to analyze which species correlated w SARS-CoV-2 (science.org/content/articl…).
Eventually, Chinese CDC uploaded some of the raw sequencing files to GISAID, where they were downloaded by another group of scientists who started analyzing data.
Before any written analysis posted, media started reporting that data suggested raccoon dogs may have been infected at market because their genetic material was co-mingled w SARS-CoV-2 in environmental samples (nytimes.com/2023/03/16/sci… & theatlantic.com/science/archiv…)
The next week, the scientists reported their initial analysis, @critschristoph et al, of environmental samples: zenodo.org/record/7754299…

Crits-Christoph et al reported some samples contained genetic material from raccoon dogs & other susceptible animal species (like bamboo rats).
Analysis by Crits-Christoph et al therefore genetically confirmed prior reports that species like raccoon dogs & bamboo rats were present at market.

The genetic details could inform tracing supply of these animals, which is important to investigate.
However, Crits-Christoph et al did not report analysis of SARS-CoV-2 content of samples: they just used Chinese CDC classification of whether samples were “positive”.

So their analysis did not identify which species have genetic material correlated w viral material.
Next week the Chinese CDC uploaded revised version of their pre-print, which shortly thereafter was published in Nature (nature.com/articles/s4158…). They also made all raw sequencing data available on public databases like SRA and NGDC.
New Chinese CDC paper agreed some samples had material from raccoon dogs & other species, although they did metagenomics differently than Crits-Christoph et al (probably not as well).

But they also did not analyze correlation of SARS-CoV-2 & animal genetic material in samples.
In fact, Chinese CDC even removed their earlier incompletely labeled SARS-CoV-2 vs species correlation plot that started all the questions.

So there still hasn’t been any analysis of how SARS-CoV-2 genetic material correlates w that of other animals!
My new analysis addresses what animal genetic material correlates w SARS-CoV-2.

To do this, I wrote fully reproducible computational pipeline that downloads all raw sequencing data (which exceeds 3 terabytes) from NGDC database: github.com/jbloom/Huanan_…
First, I confirmed data Chinese CDC uploaded to NGDC is superset of data analyzed by Crits-Christoph.

See SHA-512 file hashes: github.com/jbloom/Huanan_…

So regardless of earlier controversy about data access, unmodified versions of all files now publicly available.
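A hash comparison of this kind can be sketched as follows. This is a minimal illustration, not the code from the pipeline: the function names and file layout are assumptions, and it only shows the general idea of checking that every file in one set has a byte-identical counterpart in another.

```python
import hashlib
from pathlib import Path


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks
    so that multi-gigabyte sequencing files never sit fully in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def missing_from_superset(subset_paths, superset_paths):
    """Return the paths in subset_paths whose content hash does not
    appear among the hashes of superset_paths (hypothetical helper).
    File names are ignored; only content matters."""
    superset_hashes = {sha512_of_file(p) for p in superset_paths}
    return [Path(p) for p in subset_paths
            if sha512_of_file(p) not in superset_hashes]
```

An empty list from `missing_from_superset` means every file in the first set has an identical copy in the second, which is what "superset" means here.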
I then aligned sequencing reads to concatenated reference of SARS-CoV-2 & chordate mitochondrial genomes.

This enabled me to quantify both how much SARS-CoV-2 & mitochondrial genetic material from each species is in each sample.
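The counting step can be illustrated with a small sketch that tallies primary mapped reads per reference sequence from SAM-format text. The reference names (`SARS2`, `mito_raccoon_dog`), the choice to skip secondary and supplementary alignments, and the use of total reads as the denominator are assumptions for illustration; the actual pipeline may filter and normalize differently.

```python
from collections import Counter


def count_primary_reads(sam_lines):
    """Count primary, mapped alignments per reference name (RNAME).

    Uses the standard SAM columns: FLAG is field 2, RNAME is field 3.
    FLAG bits: 0x4 unmapped, 0x100 secondary, 0x800 supplementary.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, reference = int(fields[1]), fields[2]
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[reference] += 1
    return counts


def percent_mapping(counts, reference, total_reads):
    """Percent of total_reads that mapped to one reference."""
    if total_reads == 0:
        return 0.0
    return 100.0 * counts[reference] / total_reads
```

Because the viral genome and the mitochondrial genomes sit in one concatenated reference, a single pass over the alignments gives both quantities for a sample.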
I correlated species compositions from my analysis with compositions reported by Crits-Christoph et al (which are only for mammals).

Results highly correlated (see figure below). This is good: two independent analyses get similar results for species compositions of samples. Image
But note composition depends on reference set. I use chordates; Crits-Christoph et al report compositions for mammals.

Below are mitochondrial compositions for sample Q61: raccoon dog most abundant mammal, duck most abundant chordate. Image
Different if you align contigs to full genomes: then more raccoon dog than duck (Fig 1B of my pre-print)

There isn’t one correct way. Here I use mitochondrial composition to be consistent w Crits-Christoph et al report & because some species don’t have full genomes available.
You can go to jbloom.github.io/Huanan_market_… to interactively look up mitochondrial species composition for any sample.
Now for new part: what is SARS-CoV-2 content of samples?

Below is plot of percent of reads mapping to SARS-CoV-2 for each sample.

Most samples have little or no SARS-CoV-2. Samples with most SARS-CoV-2 have mitochondrial material mostly from fish. Image
Many ways to partition samples: date collected, etc. Interactive plot at jbloom.github.io/Huanan_market_… allows you to do that.

Eg, if we only look at later sampling dates, sample w most SARS-CoV-2 dominated by rat snake, dove, & human mitochondrial material.
What about samples w lots of mitochondrial material from susceptible non-human animals like raccoon dogs?

Below is table of SARS-CoV-2 content of all samples with >20% of their chordate mitochondrial material from a susceptible non-human species. Image
There are 14 samples w >20% chordate mitochondrial material from raccoon dog: 13 have no SARS2 reads, other has 1 in ~200,000,000 reads mapping to SARS2

0 of 6 samples w >20% bamboo rat material have SARS2 reads

1 sample each w Malayan porcupine & Amur hedgehog have SARS2 reads
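The selection behind a table like this can be sketched as below: convert each sample's mitochondrial read counts into fractions, then keep samples where one species exceeds a cutoff. The data structure and function names are hypothetical, and any numbers passed in would be the caller's own; nothing here reproduces the real sample values.

```python
def mito_fractions(mito_reads):
    """Turn {species: read_count} into {species: fraction of all
    chordate mitochondrial reads in that sample}."""
    total = sum(mito_reads.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in mito_reads.items()}


def samples_above_cutoff(samples, species, cutoff=0.20):
    """Return (sample_name, fraction) pairs where `species` makes up
    more than `cutoff` of the sample's mitochondrial reads, sorted
    from highest to lowest fraction.

    `samples` maps sample name -> {species: read_count}.
    """
    hits = []
    for name, mito_reads in samples.items():
        fraction = mito_fractions(mito_reads).get(species, 0.0)
        if fraction > cutoff:
            hits.append((name, fraction))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

The cutoff only controls which rows are displayed; lowering it (for example to 0.01) lists more samples without changing any underlying counts.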
We can correlate number of SARS-CoV-2 reads w mitochondrial reads for each species across all samples (below).

Highest correlation for largemouth bass, catfish, cow, carp, snakehead fish

Humans modestly correlated w SARS2 reads

Raccoon dogs negatively correlated w SARS2 reads Image
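A per-species correlation of this sort can be sketched as follows. The thread does not spell out the exact statistic, so the choice here (Pearson correlation on log10-transformed counts with a pseudocount of 1) is an assumption, and the function names are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Returns NaN when either sequence has zero variance."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def log_count_correlation(virus_reads, species_reads, pseudocount=1.0):
    """Correlate viral and species read counts across samples after a
    log10(count + pseudocount) transform (assumed, see lead-in)."""
    log_virus = [math.log10(v + pseudocount) for v in virus_reads]
    log_species = [math.log10(s + pseudocount) for s in species_reads]
    return pearson(log_virus, log_species)
```

Each list holds one value per sample, in the same sample order. Running `log_count_correlation` once per species gives the ranking of species by correlation; a negative value means samples with more of that species tend to have fewer viral reads.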
There are many ways to subset data on sampling dates, calculate correlation, etc.

Interactive plots at jbloom.github.io/Huanan_market_… & jbloom.github.io/Huanan_market_… let you see how correlations change when you do that.
Finally, we can circle back to question scientists asked when Chinese CDC first posted pre-print in 2022: what if you label other species in correlation plot?

Below is that plot, shown only for samples w at least one SARS2 read for consistency w original Chinese CDC figure. Image
Species most correlated w SARS2 are fish & livestock, followed by humans. Raccoon dogs & bamboo rats negatively correlated w SARS2.

Similar if we only look at samples collected on Jan-12-2020, which was date of most intense wildlife stall sampling.
Again, lots of ways to subset samples & calculate correlations, & you can explore them using the interactive plots at jbloom.github.io/Huanan_market_…
So how did we end up w media articles about raccoon dog material co-mingled w SARS2?

Raccoon dogs are one of species least co-mingled w SARS2, and Q61 raccoon dog sample only has 1 of 200,000,000 reads mapping to SARS2.

May have to do w how Chinese CDC called sample positivity Image
In their pre-print/paper, Chinese CDC called “positive” any sample that either tested positive by RT-qPCR or had >0 sequencing reads mapping to SARS-CoV-2.

But these are environmental samples: they mix various animal and/or viral sequences (plus there is probably index hopping in sequencing)
Criteria Chinese CDC used to call positivity aren’t consistent.

Eg, Q61 was negative by RT-qPCR, but has 1 of 200,000,000 reads mapping to SARS2. Not consistent to call Q61 positive but call negative other samples that also tested negative by RT-qPCR & were never sequenced. Image
Therefore, I suggest future work should stop using “positive” / “negative” classification of Chinese CDC table, & instead analyze quantitative SARS2 content only across samples subjected to same consistent set of assays (eg, Ct values or SARS2 read content).
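The inconsistency can be made concrete with a toy sketch of the "RT-qPCR positive or >0 reads" rule described above. The `Sample` record and the example values are hypothetical; `None` for reads is an assumed way to mark a sample that was never sequenced.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    name: str
    rt_qpcr_positive: bool
    sars2_reads: Optional[int]  # None means the sample was never sequenced


def called_positive(sample):
    """Apply the either/or rule: positive by RT-qPCR, or at least one
    sequencing read mapping to SARS-CoV-2."""
    has_read = sample.sars2_reads is not None and sample.sars2_reads > 0
    return sample.rt_qpcr_positive or has_read


def comparable_by_reads(samples):
    """Keep only samples that were sequenced, so read content is
    compared across samples that went through the same assay."""
    return [s for s in samples if s.sars2_reads is not None]
```

Under this rule, two RT-qPCR-negative samples can get different calls purely because one was sequenced and the other was not, which is why a quantitative comparison within one assay is more consistent than the binary label.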
More broadly, what can we conclude about COVID-19 origins from all this?

Probably not much.

@DrTedros of @WHO had correct interpretation: we should analyze everything, but these data don’t tell us how pandemic began
Recall market samples were collected on Jan-1-2020 or later.

First human SARS2 infections in Wuhan occurred no later than Nov 2019.

By Jan 2020, SARS2 had been spread widely across market by humans, regardless of how it originated.
Viral material is most co-mingled w material from fish & livestock products, but virus clearly did NOT originate w those species & products.

It’s simply that environmental samples taken over month after humans started spreading virus do not reliably indicate outbreak origin.
If we ever learn origin of SARS2, I suspect it will come from information on events that occurred in Nov 2019 (or earlier):

Until then, we should analyze all available data--but be circumspect & cognizant of limits of these data & our knowledge.
Finally, interactive versions of all plots from my analysis are at jbloom.github.io/Huanan_market_…

Computer code is at github.com/jbloom/Huanan_…

Pre-print is at biorxiv.org/content/10.110…

I hope others explore & build on this computer code w further analyses.
In response to discussion, I don't think analysis disproves or proves source

Just emphasizes these samples collected too late to reveal origin

We need info on earlier events

Unless we get that, we need to acknowledge don't know exactly what happened
I was getting questions re 20% cutoff used to decide which samples to show in Table 1 of pre-print.

Analysis is of all samples, 20% cutoff is just to make Table 1 manageable in size.

For any sample w >1% chordate mitochondrial material from raccoon dogs, see bigger table below Image
If you want SARS2 content of all samples ordered by raccoon dog mitochondrial %, see this bigger table: github.com/jbloom/Huanan_…

If you want comparable data for all samples *and* all species, see this even bigger table: github.com/jbloom/Huanan_…
However, these tables get so big they are difficult to look at.

That is point of interactive scatter plots here: jbloom.github.io/Huanan_market_…

Choose any species, sampling date, etc & then see SARS2 vs mitochondrial content in one small plot & mouse over points for details.
I have posted updated version of preprint on bioRxiv: biorxiv.org/content/10.110…

This update includes addition of tables mentioned in three Tweets above, plus some revisions and responses to thoughtful comments by @flodebarre @acritschristoph.
The final peer-reviewed version of my analysis of the environmental samples at the Huanan market is now published in Virus Evolution: academic.oup.com/ve/article/9/2…
I have performed additional new analyses comparing SARSCoV2 to other animal CoVs in environmental samples from Huanan Market

Paper w these new analyses is at doi.org/10.1093/ve/vea… & full computational pipeline at github.com/jbloom/Huanan_…

The new results are summarized below.
I first calculated total reads mapping to SARSCoV2 & other animal CoVs across all samples, & just samples collected on date of wildlife stall sampling (Jan-12-2020)

Six CoVs have >500 reads; of these 4 have many reads from Jan-12-2020 samples, but 2 have few reads from that date Image
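The tallying described here can be sketched as a simple aggregation over per-sample read counts, once across all samples and once restricted to a single collection date. The record layout, the date strings, and the function names are assumptions for illustration; the 500-read threshold is the one mentioned above.

```python
from collections import defaultdict


def total_reads_by_virus(records, date=None):
    """Sum reads per virus over records of the form
    (sample, collection_date, virus, reads).
    If `date` is given, only samples collected on that date count."""
    totals = defaultdict(int)
    for _sample, collection_date, virus, reads in records:
        if date is None or collection_date == date:
            totals[virus] += reads
    return dict(totals)


def abundant_viruses(totals, min_reads=500):
    """Names of viruses with more than min_reads total reads,
    ordered from most to fewest reads."""
    kept = [(virus, n) for virus, n in totals.items() if n > min_reads]
    return [virus for virus, _ in sorted(kept, key=lambda item: item[1], reverse=True)]
```

Comparing `total_reads_by_virus(records)` with `total_reads_by_virus(records, date=...)` for each abundant virus shows whether its reads come mostly from the date of interest or from other sampling dates.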
Specifically, bamboo rat CoV, two canine CoVs, & rabbit CoV have substantial reads in samples from Jan-12-2020 wildlife-stall sampling

SARSCoV2 & rat CoV have few reads from that date

See jbloom.github.io/Huanan_market_… for interactive plot w more options
Image
I next analyzed reads on per-sample basis. Below I just show results for Jan-12-2020 as that is when most samples w material from potentially susceptible animals (eg, raccoon dogs, bamboo rats) collected. Image
For bamboo rat, canine & rabbit CoVs there were samples w 100s viral reads. Samples w most viral reads had largest frac animal material from known hosts

But all Jan-12-2020 samples had few SARSCoV2 or rat CoV reads, & samples w most reads had little material from plausible hosts Image
To better explore the data in above plot, see jbloom.github.io/Huanan_market_… for an interactive plot that enables you to mouseover points for details on samples, select individual CoVs to display, and show additional dates.
I also plotted viral vs animal genetic content for Jan-12-2020 samples.

For 4 most abundant animal CoVs in these samples there is association of viral & host animal content, but not for much less abundant SARSCoV2 and rat CoV

(Also see interactive plot: jbloom.github.io/Huanan_market_…)
Image
Overall, these results show that genetic material from some animal CoVs is fairly abundant in samples collected during the wildlife-stall sampling of the Huanan Market on Jan-12-2020. However, SARSCoV2 is not one of these CoVs.
For the animal CoVs with high abundance on Jan-12-2020, there are meaningful associations between the content of viral and animal genetic material. But there are not such associations for SARSCoV2 and less abundant viruses like the rat CoV Lucheng-19.
There remain significant caveats related to the underlying available data, as discussed in limitations section at end of my initial paper on this topic (academic.oup.com/ve/article/9/2…)
Image
Please see my full new paper at doi.org/10.1093/ve/vea… for more details, and interactive plots at jbloom.github.io/Huanan_market_… that allow you to explore the data in additional ways.
Corrected link to interactive plots: jbloom.github.io/Huanan_market_…
