The Antarctica metagenomic samples that @jbloom_lab nicely covered have some quirks.
NCBI claims it's Illumina data.
But the FASTQ files have headers that look like MGISEQ data.
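A quick way to sanity-check this kind of thing is to pattern-match the read names. The sketch below is a heuristic only: the Illumina pattern follows the familiar colon-separated `instrument:run:flowcell:lane:tile:x:y` layout, while the MGI pattern (flow cell ID followed by `L<lane>C<column>R<row>` and a read number) is an assumption about typical MGI/BGI naming. The example read names are made up and are not taken from these samples.

```python
import re

# Illumina (Casava 1.8+) style read name: instrument:run:flowcell:lane:tile:x:y
ILLUMINA_RE = re.compile(r"^[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+")

# Assumed MGI/BGI style read name: <flowcell>L<lane>C<col>R<row><read number>[/1|/2]
MGI_RE = re.compile(r"^\w+L\d+C\d{3}R\d{3}_?\d+(/[12])?$")


def guess_platform(header: str) -> str:
    """Heuristically guess the sequencing platform from one FASTQ header line."""
    # SRA downloads often prepend an accession, so test every whitespace token.
    tokens = header.strip().lstrip("@").split()
    for tok in tokens:
        if ILLUMINA_RE.match(tok):
            return "illumina-like"
        if MGI_RE.match(tok):
            return "mgi-like"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical read names, for illustration only.
    print(guess_platform("@A00123:45:HXXXXXXXX:1:1101:1000:2000 1:N:0:ATCACG"))
    print(guess_platform("@SRR0000000.1 V300012345L1C001R0010000001/1"))
```

A header match is only a hint; the adaptor check is the stronger evidence.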
So I decided to take a look at their adaptor sequences, as each sequencer has its own unique flow cell primers.
TrimGalore indeed confirms these are MGISEQ reads.
This paper has the MGISEQ adaptor sequences: frontiersin.org/articles/10.33…
Why does this matter? The authors posit that the SARS-CoV-2 reads could be a result of the high index hopping problem seen on Illumina platforms.
MGISEQ's documented index hopping rate is orders of magnitude lower than Illumina's. bmcgenomics.biomedcentral.com/articles/10.11…
This implies the contamination would have to occur prior to index ligation, and the SARS construct would most likely have to be DNA, not RNA, for a DNA-based metagenomic library to capture it. Has anyone looked for vector sequences in the data?
Why do we care about vector sequence? It would imply human manipulation in Dec 2019.
If index hopping is ruled out,
then the contamination must have happened earlier, and RNA molecules must eventually be turned into DNA for metagenomic libraries to capture them.
Someone had it as cDNA/Vector
I have sent the authors this thread. Their finding of cell line DNA in the metagenomic DNA is also suggestive of human manipulation prior to December 2019.
After using the MGISEQ adaptors for trimming, we get more forward reads to map but still 2X more reverse reads. The MGISEQ has more signal on the reverse reads as there is an additional polymerase replication event.
However, the reads look very noisy on their 3' ends. A sign of dim DNBs.
Dim DNBs usually have lower quality and are more prone to index hopping from neighboring bright DNBs.
This is the quality of the reverse reads that mapped versus all reverse reads. There is a 10Q difference. Q20 reads have 1 error every 100 bases. Q30 reads have 1 error every 1,000 bases. That 10-fold drop in quality means index hopping may still be on the table. We have 6K/55M reads mapping.
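For anyone who wants to check the arithmetic: a Phred score Q corresponds to an error probability of 10^(-Q/10), so every 10 points of Q is a 10-fold change in error rate. A minimal sketch follows; the function names are mine, and the 6K/55M figures are the rounded numbers quoted above.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads that mapped to the reference."""
    return mapped_reads / total_reads


if __name__ == "__main__":
    p20 = phred_to_error_prob(20)  # about 1 error per 100 bases
    p30 = phred_to_error_prob(30)  # about 1 error per 1,000 bases
    print(f"Q20 error rate: {p20:.4f}")
    print(f"Q30 error rate: {p30:.4f}")
    print(f"Fold difference: {p20 / p30:.1f}")
    # Rounded counts from the thread: roughly 6K mapped out of 55M reads.
    print(f"Mapped fraction: {mapped_fraction(6_000, 55_000_000):.6%}")
```

With so few reads mapping, even a very low hopping rate could in principle account for them, which is why the quality of those specific reads matters.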
Many people ask me about this Moderna patent sequence.
Some calculate the odds of a 19mer matching by chance as 1 in 4^19.
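To put a number on that naive calculation, here is a small sketch. It assumes every base is independent and equally likely, which is exactly the "truly random" assumption being questioned here, and the database size in the example is a made-up round number, not the size of any real NCBI database.

```python
def chance_match_odds(k: int) -> int:
    """Number of distinct DNA k-mers, i.e. the '1 in N' odds under a uniform model."""
    return 4 ** k


def expected_chance_hits(k: int, db_bases: int, both_strands: bool = True) -> float:
    """Expected exact matches of one k-mer in a random database of db_bases bases."""
    positions = max(db_bases - k + 1, 0)  # number of start positions for a k-mer
    strands = 2 if both_strands else 1    # a match may sit on either strand
    return positions * strands / chance_match_odds(k)


if __name__ == "__main__":
    print(chance_match_odds(19))  # 274877906944
    # Hypothetical database of one trillion bases, for illustration only.
    print(expected_chance_hits(19, 10**12))
```

Even under the random model, a large enough database yields a few exact 19mer matches by chance alone, before any shared biology is considered.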
A big number if life were truly random.
But evolution is a preservation of those random words that improve fitness so we have to ask, are there similarities to common words?
Take the 19mer sequence and plug it into NCBI BLASTn against the Nr database.
Check ‘exclude’ and enter Coronaviridae.
You’ll get microbial hits like this.
Check the E-Value.
What’s the E-Value?
Q: What is the Expect (E) value?
The Expect value (E) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases.
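The exponential relationship in that definition comes from the Karlin-Altschul statistics, E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length, and S is the raw score. The sketch below only illustrates the shape of that formula: K and lambda depend on the scoring system, and the values used in the example are placeholders, not BLAST's actual parameters. BLAST also applies effective-length corrections that are ignored here.

```python
import math


def expect_value(score: float, query_len: int, db_len: int,
                 k_param: float, lam: float) -> float:
    """Karlin-Altschul expect value: E = K * m * n * exp(-lambda * S)."""
    return k_param * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    # Placeholder parameters for illustration only (not BLAST defaults).
    K_PLACEHOLDER = 0.5
    LAMBDA_PLACEHOLDER = 1.0
    DB_LEN = 10**12  # hypothetical database size in bases
    for s in (15, 19, 25, 30):
        e = expect_value(s, 19, DB_LEN, K_PLACEHOLDER, LAMBDA_PLACEHOLDER)
        print(f"score={s:>2}  E={e:.3g}")
```

Two things fall out of the formula: E shrinks exponentially as the score rises, and it grows linearly with database size, so a short query searched against a huge database can easily produce hits with unimpressive E-values.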
I believe this is because the implications of FPs are less severe than the implications of FNs.
FPs you can confirm with another test or just suck it up and quarantine for the ‘greater good’.
FNs, on the other hand, expose the entire track-and-trace system as the scam that it is.
Particularly when they are this high.
Once negative, very few people want to stand in a line with other potentially sick people and pay $50-300 for a second shot at quarantine.
We have Cannabis Whole Genome Sequencing honed to the point where people are using it to untangle the history of the famous Skunk #1 line.
I wasn't around then, so I can't speak with any authority on the oral history, but I can help people better understand Kannapedia.net
Phylo-trees can be complicated, so let's just take a look at the genetics of THCAS.
There are a few interesting mutations found in early cannabis lines that we will go over.
Ala250Asp
Pro333Arg
Pro542Leu
Ser355Asn
A recently sequenced Skunk line kannapedia.net/strains/rsp124…
A250D (Ala250Asp) and P333R (Pro333Arg) are some of the most common mutations.
A250D is found in 12.7% of the NGS data. The C90 data finds this more frequently, but fewer samples have been run through that pipeline.
P333R is found in 18.2% of the NGS data.
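Since the mutations above are written in both three-letter (Ala250Asp) and one-letter (A250D) form, here is a small helper for converting between them. It is just a convenience sketch using the standard amino acid codes; the function name is my own and it only handles simple substitutions.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(mutation: str) -> str:
    """Convert a substitution such as 'Ala250Asp' to short form such as 'A250D'."""
    m = re.fullmatch(r"([A-Za-z]{3})(\d+)([A-Za-z]{3})", mutation.strip())
    if m is None:
        raise ValueError(f"not a simple three-letter substitution: {mutation!r}")
    ref, pos, alt = m.groups()
    return THREE_TO_ONE[ref.capitalize()] + pos + THREE_TO_ONE[alt.capitalize()]


if __name__ == "__main__":
    for mut in ("Ala250Asp", "Pro333Arg", "Pro542Leu", "Ser355Asn"):
        print(mut, "->", to_one_letter(mut))
```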
Respiratory viruses have been around longer than any Judge. To assume parasitism and not mutualism is myopic.
45% of our genome consists of viral elements (LINE, Alu, etc.) and 8% of the genome consists of infectious retroviruses.
The advent of NextGen sequencing allowed us to peer into mutualistic viruses, as in the past, expensive sequencing was reserved for exploring pathogens.