I’ve refrained from commenting on metagenomics of environmental samples from Huanan Seafood Market, since media coverage preceded description of data or analysis.
Data still not available, but there is now enough written down to offer partially informed assessment.
These data don’t tell us how pandemic began, but every bit of information helps.
Here is what I glean from the available analyses by the two groups w access.
First analysis is pre-print Chinese CDC posted over year ago (Feb-25-2022) that describes sampling animals & environment at Huanan Market: researchsquare.com/article/rs-137…
It is moving thru peer review at glacial pace: @researchsquare status says still under review at a Nature journal.
Below is my summary of Chinese CDC preprint when it posted last year:
Human infections started in Wuhan no later than Nov 2019, which limits how much can be concluded from either positive or negative samples collected on Jan 1.
Chinese CDC collected samples throughout market, but focused on southwest corner where animals sold. Because sampling focused on southwest corner, both negative (green) and positive (orange) environmental samples were mostly from that part of market.
Chinese CDC reports testing samples for SARS2 by PCR & deep sequencing them. They report no tendency for the rate of PCR positive samples to be associated with any type of market vendor: similar positivity rates for stalls selling wildlife, vegetables, livestock, seafood, etc.
Table 1 summarizes environmental sample testing.
I’ve added highlighting & raccoon dog image next to sample Q61.
This sample was collected on Jan-12 & was negative for SARS2 by PCR but positive by NGS, which presumably means it had detectable SARS2 reads in deep sequencing.
Does fact Q61 was negative for SARS2 by PCR testing mean it had less viral RNA than samples w measurable Ct values? Unclear, because neither Chinese CDC pre-print nor subsequent analysis discussed below quantify fraction of deep sequencing reads that map to SARS2 for all samples.
This brings us to the second analysis by Crits-Christoph et al which recently posted on Zenodo. This pre-print analyzes deep sequencing data for some samples the authors obtained via GISAID, although those data remain unavailable to others. zenodo.org/record/7754299…
The way Crits-Christoph et al obtained these data has itself become a controversy. I am not going to address that. Read the first 3 pages of Crits-Christoph report linked above & GISAID statement (gisaid.org/statements-cla…) for opposing viewpoints.
Instead, I’ll focus on the substance of the Crits-Christoph analysis. Their analysis focused on the non-SARS2 metagenomic content of the deep sequenced environmental samples.
Key results are in Fig 1 of their analysis, shown below.
Contour heatmap shows fraction of positive samples from different regions of market.
Positive samples mostly from southwest corner, since that is where Chinese CDC mostly collected samples.
So contour plot just reflects where Chinese CDC collected samples (as shown in Fig 1A of their pre-print, mentioned earlier in this thread).
In other words, contour does not show percent positivity of collected samples, but total positive samples not normalized by total samples.
The novel part of Fig 1 of Crits-Christoph et al is fraction of mammalian mtDNA reads in each analyzed sample that mapped to variety of mammalian species. These are pie chart circles.
There are samples w mtDNA from humans, pigs, sheep, cows, Siberian weasels, raccoon dogs, etc.
The sample that has attracted the most attention is the mostly light green pie circle, for which the dominant mtDNA is from a raccoon dog. This has attracted attention because raccoon dogs are one of several species known to be susceptible to SARS2.
That green pie circle is sample Q61 from Chinese CDC pre-print, which was SARS2 negative by PCR but positive by deep sequencing. When data become available, I would like to quantify SARS2 reads in each sample, as it would be useful to know how much SARS2 was in each sample.
For instance, it is unclear whether there is enough SARS2 in the raccoon dog mtDNA dominated Q61 sample to actually infer a viral sequence, or if the undetectable Ct value for that sample means there are just trace levels of reads that are too low to obtain a viral sequence.
There are also lots of pie circles where the dominant mtDNA is from humans, & also from animals that were probably only sold as meat (eg, cow, sheep, pig). And although not shown in figure (since it’s not a mammal), the text reports some samples had fish mtDNA.
So as Crits-Christoph et al note in text, fact that mtDNA from a species is found in a sample doesn’t mean that species was infected w SARS2. It just means material from that animal ended up in the same place as viral RNA, which was widespread in the Huanan Market by Jan 2020.
They also analyze a few samples in more detail, including assembling near-complete mtDNA for some animals, and some genomic / cDNA contigs for sample Q61. This could be useful for learning more details about animals from which genetic material in the market was derived.
Overall, main thing we learn is details of which animals or products (eg, meats) were in market before it closed on Jan-1-2020.
This doesn’t tell us if any infected w SARS2. But knowing more about animals could help trace supply chain, which is valuable line of investigation.
However, human SARS2 infections started in Wuhan no later than Nov 2019. So we have to be circumspect about environmental samples from Jan 2020.
Eg, raccoon dog sample Q61 was collected on Jan-12-2020, which is at least 6 and probably >8 weeks after first human infections.
This brings me back to main caveat I noted about Chinese CDC pre-print when it posted last year: we are unlikely to get conclusive answers about origin of an outbreak that started in Nov 2019 (or earlier) by looking at samples collected in Jan 2020:
Analyses of Jan 2020 samples is definitely worthwhile, because as @DrTedros rightly noted each piece of data is important for better understanding initial outbreak in Wuhan.
But to identify actual origin of outbreak, we need details from Nov or early Dec 2019.
Recall that there are descriptions of confirmed cases, both unlinked and linked to the market, dating back to Nov and very beginning of Dec 2019:
Over 4 yrs after being first to publicly release SARS-CoV-2 genome, Yong-Zhen Zhang just published large set of viral seqs from first stage of COVID-19 outbreak in China
Zhang recruited nearly all COVID-19 patients hospitalized at Shanghai Public Health Center in first 2/3 (Jan-Sep) of 2020.
The largest source of Shanghai patients in Jan/Feb 2020 was imported cases from Wuhan or elsewhere in Hubei, thereby providing window into Wuhan outbreak.
Overall, Zhang obtained 343 near-full-length SARS-CoV-2 sequences from 226 distinct patients, including 133 sequences from samples collected no later than Feb-15-2020.
A phylogenetic tree showing these sequences is below.
In new study led by Caleb Carr & @khdcrawford, we measure how all mutation to Lassa virus glycoprotein complex (GPC) affect cell entry & antibody escape
Results show how prospective assessment of effects of mutations can inform design of countermeasures biorxiv.org/content/10.110…
As background, Lassa virus causes of thousands of deaths each year, mostly from spillovers from its rodent host, but there is occasional human-to-human transmission.
Lassa is biosafety-level-4 priority pathogen, & efforts are underway to develop vaccines & antibody therapeutics.
We used pseudovirus deep mutational scanning to study effects of nearly all 9,820 amino-acid mutations to Lassa’s GPC at biosafety-level-2 by making genotype-phenotype linked libraries of lentiviral pseudotypes blog.addgene.org/viral-vectors-…
Here is my brief analysis of Dec-28-2019 SARSCoV2 submission to Genbank.
This analysis supports my conclusion to WSJ () that this submission does not tell origin of virus, but does show sequence known to Chinese Academy of Sciences weeks before released wsj.com/politics/natio…
Here is link to my full analysis:
See also images of the same posted below (although it's probably just easier to click on link above and read HTML). github.com/jbloom/SARS2_2…
I also don't think Genbank/NCBI could have reasonably known at time that this sequence was so valuable given that Chinese govt did not announce they had sequence or had submitted it, and Genbank receives vast numbers of submissions.
As background, human influenza constantly evolving. So people exposed to different strains, depending on their age & idiosyncratic history of infection/vaccination.
Different exposure histories cause people to make antibodies w different specificities
I wanted to highlight this pre-print by David Ho’s group on the neutralizing antibody response to new (XBB.1.5-based) COVID vaccine booster, as it illustrates some points related to paradigm of updating SARS-CoV-2 vaccines to keep pace w viral evolution. biorxiv.org/content/10.110…
Recall original COVID vaccines worked very well against early SARS-CoV-2 strains
Unfortunately, virus has been evolving, so antibodies elicited by that vaccine don’t neutralize newer viral variants very well
We made spike pseudovirus deep mutational scanning libraries of mutations across XBB.1.5 spike (as well as XBB.1.5 RBD and BA.2 spike libraries).
We used these libraries to measure how >9,000 mutations affected cell entry, ACE2 binding, and serum antibody escape.
To measure how mutations affected ACE2 binding, we leveraged approach previously used by David Ho & @yunlong_cao groups that is based on fact neutralization of spike-mediated entry by soluble ACE2 is proportional to ACE2 binding.