New paper for the new year! It compares different long-read assemblers for microbial genome assembly: f1000research.com/articles/8-213…
Two Twitter threads follow - one about the paper itself and one about my experience with @F1000Research.
(1/n)
In this paper, we did a ton of long-read microbial genome assemblies (using both real and simulated long-read sets) to see how the current assemblers perform.
(2/5)
I won't get into detailed results here, but very briefly: Flye, Raven and Miniasm/Minipolish were our favourites, each excelling in particular ways 🏆
(3/5)
Now for the exciting part: this paper will be a living document! I.e. as new versions or new assemblers are released, we will run our tests again and update the paper.
(4/5)
I hope this will make this paper relevant for a longer period of time. And I hope a bit of friendly competition helps spur continued development of the assemblers 😀
(5/5)
This was my first paper with @F1000Research, and the whole process was very positive! They were incredibly responsive, e.g. they did the initial round of copy edits less than 24 hours after we submitted.
(2/6)
It was also my first experience with @overleaf, which F1000Research uses for LaTeX-formatted papers. I'm not great with LaTeX, but Overleaf was pretty easy to use.
(3/6)
F1000Research were very keen to work with us on the whole living document format - allowing for new versions of the paper to come out with updated results. Very grateful that they are willing to be trailblazers in this regard.
(4/6)
I also like that they host the entire article's lifespan, both pre- and post-peer review. It meant I didn't have to manage two different formats (the bioRxiv version and the journal version) like I have with previous papers.
(5/6)
There's still plenty left to do (including the entire peer review process), but so far I'm a big fan of F1000Research. Thanks to everyone there for their great work!
(6/6)
• • •
Peer review brought quite a few improvements, so many thanks to the reviewers! My favourite addition is this new supp figure.
(2/6)
It shows that Polypolish was the tool least likely to introduce errors during polishing. It did so at only one place in 100 genomes (panel D), where it changed a 3-bp deletion into a 5-bp deletion in a tandem repeat.
(3/6)
I just released a new version of Unicycler (v0.5.0) which fixes SPAdes compatibility, drops some extraneous bits and patches a few bugs. github.com/rrwick/Unicycl…
Unicycler is now nearly 6 years old, so here's a thread with my thoughts on its place in the world in 2022.
(1/8)
Unicycler is a hybrid (short+long) bacterial genome assembly pipeline that takes a short-read-first approach. I.e. it first makes a short-read assembly graph, then uses the long reads to scaffold the graph to completion.
(2/8)
Short-read-first assembly made a lot of sense when Unicycler was first built in 2016. Back then, Nanopore reads were often shallow and low-quality, so the short-read graph made a good starting point for assembly.
(3/8)
Our preprint describing Polypolish is now up: biorxiv.org/content/10.110…
Polypolish is a short-read polisher for long-read bacterial genome assemblies. Some highlights from the paper follow in this thread...
(1/12)
There are already quite a few short-read polishers out there: HyPo, NextPolish, ntEdit, Pilon, POLCA, Racon and wtpoa. So why did we add to this collection? It's because they nearly all suffer from the same problem with errors in repeats.
(2/12)
When you align short reads to a long-read genome assembly in the 'normal' one-alignment-per-read manner, you often get no coverage over errors in repeats. This is because reads will preferentially align to other error-free instances of the repeat.
(3/12)
I've just released (during #MicroSeq2021) a new short-read polishing tool for fixing errors in long-read bacterial genome assemblies: Polypolish! github.com/rrwick/Polypol…
(1/8)
There are many other short-read polishing tools, including HyPo, NextPolish, ntEdit, Pilon and POLCA. So what does Polypolish do differently to warrant another?
(2/8)
Most other polishers use 'normal' short-read alignments, where each read is aligned to one best location (randomly chosen in a tie). This works fine in non-repeat sequences, but errors in repeats often lead to a lack of alignments and therefore can't be fixed.
(3/8)
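A toy sketch of the coverage gap described above, not the code of any real polisher. Everything in it is an assumption made for illustration: the genome is random, the "aligner" is an ungapped scan that tolerates one mismatch, ties go to the leftmost hit rather than a random one, and the names (`find_hits`, `depth`, `READ_LEN`) are made up. It builds a genome with two copies of a repeat, puts one substitution error in the assembly's second copy, and compares read depth at that error when each read keeps only its best hit versus every hit.

```python
import random

READ_LEN = 16  # hypothetical short-read length for this toy example


def mismatches(a, b):
    """Count differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_hits(read, assembly, max_mismatches=1):
    """Return (position, mismatch_count) for every ungapped placement of the
    read on the assembly with at most max_mismatches mismatches."""
    hits = []
    for pos in range(len(assembly) - len(read) + 1):
        mm = mismatches(read, assembly[pos:pos + len(read)])
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


def depth(reads, assembly, best_only):
    """Per-base read depth, keeping one best hit per read or all hits."""
    cov = [0] * len(assembly)
    for read in reads:
        hits = find_hits(read, assembly)
        if best_only and hits:
            # One alignment per read: fewest mismatches, leftmost on a tie
            # (a real aligner would typically pick randomly among ties).
            hits = [min(hits, key=lambda h: h[1])]
        for pos, _ in hits:
            for i in range(pos, pos + len(read)):
                cov[i] += 1
    return cov


rng = random.Random(0)


def rand_seq(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


# True genome: unique flanks around two identical copies of a repeat.
flank_a, repeat, flank_b, flank_c = rand_seq(40), rand_seq(50), rand_seq(40), rand_seq(40)
genome = flank_a + repeat + flank_b + repeat + flank_c

# Assembly: same as the genome, but with one wrong base in the middle of the
# second repeat copy, far enough from the repeat edges that every read
# covering it lies entirely inside the repeat.
error_pos = len(flank_a) + len(repeat) + len(flank_b) + 25
wrong_base = "A" if genome[error_pos] != "A" else "C"
assembly = genome[:error_pos] + wrong_base + genome[error_pos + 1:]

# Error-free reads: every READ_LEN-mer of the true genome.
reads = [genome[i:i + READ_LEN] for i in range(len(genome) - READ_LEN + 1)]

best_only_depth = depth(reads, assembly, best_only=True)
all_hits_depth = depth(reads, assembly, best_only=False)

print("depth at error, best hit only:", best_only_depth[error_pos])
print("depth at error, all hits:     ", all_hits_depth[error_pos])
```

In this setup, every read that could cover the error matches the clean first copy perfectly, so the best-hit-only pileup leaves the error with nothing to vote it down. Keeping all hits puts reads back over it.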
Excited to announce a new preprint! We did a study comparing two different @nanopore library prep approaches (ligation and rapid) for bacterial genomes with small plasmids: biorxiv.org/content/10.110…
(1/11)
I really like this paper because it has a clear conclusion simple enough to fit in a tweet: rapid preps are better than ligation preps at recovering small plasmids.
(2/11)
Figure 1 gives a simplified illustration of why we think this is the case: due to their size, small circular plasmids can avoid fragmentation during DNA extraction, leaving no ends for adapter ligation. Rapid preps, in contrast, don't depend on DNA ends.
(3/11)
We've once again updated our paper benchmarking long-read assemblers for bacterial genomes! Take a look at the fresh results here: f1000research.com/articles/8-2138
Updates since the last version include...
(1/9)
New versions of some assemblers: Canu v2.0, Flye v2.8, Raven v1.1.10 and Shasta v0.5.1. My favourite change here is that Flye no longer requires a genome size parameter.
(2/9)
I've also added a new assembler to the comparison: NextPolish/NextDenovo. It performed well on chromosomes but not on plasmids, and it was more cumbersome to run than the other tools.
(3/9)