Explore Earth’s Virome with Serratus. We uncover 130,000+ new species of RNA viruses, increasing the total known species by an order of magnitude. (Thread)
Serratus is an open-source, cloud-based alignment infrastructure. With it, we aligned 5.7 million sequencing libraries (10.2 petabases) against a collection of all viral RNA-dependent RNA polymerases (RdRP) to rapidly expand the limits of known RNA viruses.
The entire search was completed in 11 days of wall-clock time. All software is open-source and the data is freely available. Explore the Open Virome at serratus.io
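For a sense of scale, here is a back-of-the-envelope sketch of the average throughput those numbers imply. It assumes the 5.7 million libraries and 10.2 petabases were all processed within the same 11-day window and that the rate was constant, neither of which the thread states explicitly; the function and variable names are illustrative only.

```python
# Rough average throughput implied by the figures quoted in the thread.
# Assumption: all libraries were processed within the 11-day wall-clock window.
LIBRARIES = 5.7e6        # sequencing libraries aligned
BASES = 10.2e15          # 10.2 petabases
WALL_CLOCK_DAYS = 11


def per_second(total: float, days: float) -> float:
    """Average rate per second over a wall-clock span given in days."""
    return total / (days * 24 * 60 * 60)


libs_per_sec = per_second(LIBRARIES, WALL_CLOCK_DAYS)
gigabases_per_sec = per_second(BASES, WALL_CLOCK_DAYS) / 1e9

print(f"~{libs_per_sec:.1f} libraries/s, ~{gigabases_per_sec:.1f} gigabases/s")
```

Under those assumptions this works out to roughly 6 libraries and about 10.7 gigabases per second, sustained for the whole run.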
In total, we assembled aligned hits from 3M libraries to yield 800,000 RdRP contigs, which clustered into 130,000 novel species-like OTUs (90% amino acid identity).
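To illustrate what clustering at a 90% amino acid identity threshold means, here is a toy sketch. It is not the Serratus pipeline: the thread does not say which clustering method was used, so this assumes a simple greedy centroid scheme, and it computes identity on made-up, pre-aligned sequences of equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.90):
    """Put each sequence in the first cluster whose centroid it matches at
    >= threshold identity; otherwise it becomes the centroid of a new cluster."""
    clusters = []  # list of (centroid, members) pairs
    for s in seqs:
        for centroid, members in clusters:
            if identity(s, centroid) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters


# Hypothetical 20-residue sequences, for illustration only.
toy = [
    "MKTAYIAKQRQISFVKSHFS",
    "MKTAYIAKQRQISFVKSHFA",  # 1 mismatch vs. the first: 95% identity
    "MKTAYIAKQRQISFVKSAAA",  # 3 mismatches vs. the first: 85% identity
]
clusters = greedy_cluster(toy)
print(len(clusters), [len(members) for _, members in clusters])
```

The first two toy sequences fall into one species-like OTU, while the third falls below the 90% cutoff and founds its own.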
To distinguish so many viruses, we developed a structurally informed “barcode sequence” encompassing the catalytic core of RdRP, which we call the “Palmprint”. It’s like 16S sequencing, but for RNA viruses.
In a companion manuscript, the Palmscan algorithm is explored for extracting palmprints: biorxiv.org/content/10.110….
@akorobeynikov and @meleshko_da pushed the limits of de novo assembly with 'coronaSPAdes', allowing for synteny-informed graph traversal. This greatly improves virus assembly. Learn more in the lovely companion piece: biorxiv.org/content/10.110…
With a state-of-the-art assembler, we identify a clade of corona-like viruses related to PsNV (shoutout @gidmord) and show that these viruses are likely encoded on segmented genomes.
@Marcos_dlP tirelessly recovered a cool 54 novel Deltavirus / Delta-like viruses, then stumbled on 300+ truly enigmatic “Zeta Viruses”, blurring the lines of the early evolutionary origins of deltaviruses.
@bbuchfink, author of the amazing DIAMOND software for translated-nucleotide search (github.com/bbuchfink/diam…), specifically tweaked it for the Serratus use case, dropping CPU runtime by >10x.
Serratus is optimized not only for analyzing the data available today, but also for keeping up with the exponential growth of sequencing data tomorrow.
Serratus is a collaborative #OpenScience project started at the @hackseq hackathon. If you’re a scientist/dev interested in doing cool research, join our Slack! All hands welcome. join.slack.com/t/hackseq-rna/…
Special thanks to @awscloud and @UBC #CIC for project support and the SRA Team @NIHDataScience! <3
