
hackseq / Artem Babaian

Jan 26 • 13 tweets • 11 min read

Serratus is now published in @Nature :) nature.com/articles/s4158…

We searched 5.7M sequencing libraries (10.2 petabases) for all 15,000 known RNA viruses. In 11 days, we uncovered 130,000+ new RNA viruses (including 9 new CoVs, with a twist). That’s nearly an order-of-magnitude bump.
[1/N] 🧵👇
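For a quick sanity check of the headline numbers, here is a back-of-the-envelope sketch. It assumes decimal SI prefixes (1 petabase = 1e15 bases, 1M = 1e6) and treats "130,000+" as exactly 130,000, so the fold increase is a lower bound. The constant and function names are illustrative only, not part of Serratus.

```python
# Back-of-the-envelope check of the tweet [1] numbers.
# Assumptions: decimal SI prefixes (1 petabase = 1e15 bases, 1M = 1e6),
# and "130,000+" is taken as exactly 130,000 (a lower bound).

TOTAL_BASES = 10.2e15        # 10.2 petabases searched
N_LIBRARIES = 5.7e6          # 5.7M sequencing libraries
KNOWN_RNA_VIRUSES = 15_000   # known RNA viruses used as queries
NEW_RNA_VIRUSES = 130_000    # new RNA viruses found (lower bound)


def mean_library_size_gbp(total_bases: float, n_libraries: float) -> float:
    """Average library size in gigabases (1 Gbp = 1e9 bases)."""
    return total_bases / n_libraries / 1e9


def fold_increase(new: int, known: int) -> float:
    """How many times larger the new set is than the known set."""
    return new / known


if __name__ == "__main__":
    print(f"mean library size ~ {mean_library_size_gbp(TOTAL_BASES, N_LIBRARIES):.2f} Gbp")
    print(f"new / known ~ {fold_increase(NEW_RNA_VIRUSES, KNOWN_RNA_VIRUSES):.1f}x")
```

This prints an average library size of about 1.79 Gbp and a ratio of about 8.7x, i.e. just under 10x, which is where "nearly an order of magnitude" comes from.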

[2] For the scientific conclusions, @rayanchikhi has a great thread from the preprint:
👉 https://twitter.com/rayanchikhi/status/1371500685502599170 👈

[3] As the pandemic hit, like many scientists we wanted to help. The idea was simple: analyze all public sequencing data to ensure every possible coronavirus sequence ever sampled is identified and freely available. And do it fast.

(aka Eye of SRAn)

[4] By luck, @NIHDataScience STRIDES had just finished mirroring the massive Sequence Read Archive (SRA) to cloud platforms. An opportunity!

See their recent update paper! pubmed.ncbi.nlm.nih.gov/34850094/

[5] The world’s DNA/RNA sequencing data was at our fingertips as an Open Dataset on @awscloud. Accessing 20 million gigabytes (20 petabytes) of sequencing data was no longer a bottleneck; we eventually did it in under 11 days.
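To put that in perspective, a rough sketch of the average data rate implied. It assumes decimal units (1 GB = 1e9 bytes) and a run of exactly 11 days; since the search finished in under 11 days, the real average was at least this high. The function name is hypothetical, for illustration only.

```python
# Rough average throughput implied by tweet [5].
# Assumptions: "20 million gigabytes" uses decimal units (1 GB = 1e9 bytes),
# and the run took exactly 11 days. It actually finished in *under* 11 days,
# so the true average rate was at least this high.

SECONDS_PER_DAY = 86_400


def avg_throughput_gb_per_s(total_gb: float, days: float) -> float:
    """Average data rate in GB/s for `total_gb` gigabytes over `days` days."""
    return total_gb / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    rate = avg_throughput_gb_per_s(20e6, 11)
    # Multiply by 8 to convert gigabytes/s to gigabits/s.
    print(f"~{rate:.1f} GB/s sustained (~{rate * 8:.0f} Gbit/s)")
```

This works out to roughly 21 GB/s (about 168 Gbit/s) sustained, which is why cloud-side access to the SRA mirror mattered so much.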

Take a look under the hood:

[6] 🌍Computationally efficient access to planetary-scale sequencing data will forever change genetics🌎
#Bioinformatics #OpenScience #BigData #GannaNeedABiggerPipe

[7] The coolest part of open-source projects is teaming up with awesome devs who improve their tools too. We got a tailored version of SPAdes, coronaSPAdes (protip: you can use it for any RNA virus), and a significant boost in small-query alignment for DIAMOND v2. Stay tuned for MUSCLE v5!

[8] Links:

coronaSPAdes: academic.oup.com/bioinformatics…

DIAMOND v2: nature.com/articles/s4159…

MUSCLE5: biorxiv.org/content/10.110…

[9] Serratus is a volunteer project. We started out at the #hacksqRNA hackathon (ty: @RNASociety / @UBC MedGen) and continue to have an open-door collaboration policy (*cough* you should join *cough*)

#hackathon #openscience

[10] We took part in COVID19 #bioHackathon, @EUvsVirus, @hackzurich, @redhat Team19, sent out tweets, emailed bioinformaticians and virologists. Eventually we got an amazing and passionate crew together. <3 <3

[11] Huge thanks to the long list of people who took the time to discuss, shared insights, or just popped in for a few days to help. And to the team at @UBC #CIC and AWS who helped make this possible.

And of course, what matters most is the friends we made along the way…

[12] Serratus: @robertedgarphd Jeff Taylor @tomeraltman @akorobeynikov @meleshko_da @PierreBarbera @themicrobeguy @banfieldlab @NovakovskyG @victorlin @danlohrdev @bbuchfink @Marcos_dlP @rayanchikhi, and a borrowed @hackseq

[13] All Serratus data is free and public (CC0) immediately. Our goal is to catalyze research into Earth’s virome as intuitively as possible. Reach out if you need any help :)

Data Explorer: serratus.io | Experimental RdRP interface: serratus.io/palmid
