Explore Earth’s Virome with Serratus. We uncover 130,000+ new species of RNA viruses, increasing the total known species by an order of magnitude. (Thread)
Serratus is an open-source, cloud-based alignment infrastructure. With it, we aligned 5.7 million sequencing libraries (10.2 petabases) against a collection of all viral RNA-dependent RNA polymerases (RdRP) to rapidly expand the limits of known RNA viruses.
The entire search was completed in 11 days of wall-clock time. All software is open-source and the data is freely available. Explore the Open Virome at serratus.io
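To put those figures in perspective, here is a back-of-the-envelope sketch of the average throughput they imply. It uses only the numbers quoted in this thread (5.7 million libraries, 10.2 petabases, 11 days) and assumes, purely for illustration, that the work was spread evenly over the whole period; the function name `average_throughput` is made up for this example.

```python
SECONDS_PER_DAY = 86_400

def average_throughput(libraries: float, bases: float, days: float) -> dict:
    """Average rates implied by a total workload finished in `days` of wall-clock time.

    Assumes a perfectly even workload, which a real cloud run would not have.
    """
    seconds = days * SECONDS_PER_DAY
    return {
        "libraries_per_second": libraries / seconds,
        "gigabases_per_second": bases / seconds / 1e9,
        "mean_gigabases_per_library": bases / libraries / 1e9,
    }

# Figures quoted in the thread.
rates = average_throughput(libraries=5.7e6, bases=10.2e15, days=11)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Under that even-workload assumption, this works out to roughly six libraries and about 10.7 gigabases aligned every second, with an average library of about 1.8 gigabases.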
In total, we assembled the aligned hits from 3M libraries to yield 800,000 RdRP contigs, which clustered into 130,000 novel species-like OTUs (90% amino acid identity).
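For readers unfamiliar with identity-based OTUs, here is a minimal sketch of greedy centroid clustering at a 90% amino acid identity threshold. This is not the Serratus pipeline: it assumes the sequences are already aligned to equal length, treats every column (gaps included) as a plain character, and uses hypothetical names (`identity`, `greedy_cluster`) and toy sequences invented for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def greedy_cluster(seqs: dict, threshold: float = 0.90) -> dict:
    """Assign each sequence to the first centroid it matches at >= threshold identity.

    A sequence that matches no existing centroid becomes a new centroid,
    so the result depends on input order (a known property of greedy clustering).
    """
    clusters = {}  # centroid name -> list of member names
    for name, seq in seqs.items():
        for centroid in clusters:
            if identity(seq, seqs[centroid]) >= threshold:
                clusters[centroid].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters

# Toy amino acid sequences, invented for this example.
toy = {
    "contig_a": "ACDEFGHIKL",
    "contig_b": "ACDEFGHIKM",  # 9 of 10 positions match contig_a
    "contig_c": "MMMMMMMMMM",  # unrelated
}
print(greedy_cluster(toy))
```

Here `contig_a` and `contig_b` fall into one species-like cluster and `contig_c` founds its own. Production tools compute identity from proper alignments and scale to hundreds of thousands of sequences, which this sketch does not attempt.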
To distinguish so many viruses, we developed a structurally informed “barcode sequence” encompassing the catalytic core of RdRP, which we call the “Palmprint”. It’s like 16S sequencing, but for RNA viruses.
A companion manuscript describes the Palmscan algorithm for extracting palmprints: biorxiv.org/content/10.110…
@akorobeynikov and @meleshko_da pushed the limits of de novo assembly with 'coronaSPAdes', allowing for synteny-informed graph traversal. This greatly improves virus assembly; learn more in the lovely companion piece: biorxiv.org/content/10.110…
With a state-of-the-art assembler, we identify a clade of corona-like viruses related to PsNV (shoutout @gidmord) and show that these viruses are likely encoded on segmented genomes.
@Marcos_dlP tirelessly recovered a cool 54 novel Deltavirus / Delta-like viruses, then stumbled on 300+ truly enigmatic “Zeta Viruses”, blurring the lines of the early evolutionary origins of deltaviruses.
@bbuchfink, author of the amazing DIAMOND software for translated-nucleotide search (github.com/bbuchfink/diam…), specifically tweaked it for the Serratus use case, dropping CPU runtime by >10x.
Serratus is not only optimized for analyzing the data available today, but also to keep up with the exponential growth of sequencing for tomorrow.
Serratus is a collaborative #OpenScience project started at the @hackseq hackathon. If you’re a scientist/dev interested in doing cool research, join our Slack! All hands welcome. join.slack.com/t/hackseq-rna/…