It does Racon-polishing on a miniasm long-read assembly. Why not just use Racon directly? For a few reasons...
(1/6)
1. Minipolish keeps the assembly in graph form (GFA format) whereas Racon produces FASTA sequences.
2. Racon has a nasty habit of sometimes truncating sequences a little bit when it polishes them - Minipolish will repair this.
(2/6)
3. Minipolish 'rotates' circular contigs (like in bacterial genomes) between polishing rounds. This ensures that final polished contigs circularise cleanly (no missing or overlapping bases).
(3/6)
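To make 'rotating' concrete, here is a minimal Python sketch (not Minipolish's actual code). The function names and the half-length offset are assumptions for illustration only. The idea is that a circular contig can start at any base without changing the sequence it represents, so shifting the start between rounds puts the old contig ends in the interior for the next round.

```python
def rotate_circular(seq: str, offset: int) -> str:
    """Return the same circular sequence, starting `offset` bases further along."""
    if not seq:
        return seq
    offset %= len(seq)  # wrap offsets larger than the contig
    return seq[offset:] + seq[:offset]


def same_circle(a: str, b: str) -> bool:
    """True if b is a rotation of a (same strand only, no reverse complement)."""
    return len(a) == len(b) and b in a + a


contig = "ACGTAC"
# Hypothetical choice: shift the start by half the contig length each round.
rotated = rotate_circular(contig, len(contig) // 2)
print(rotated, same_circle(contig, rotated))
```

A real tool would also have to keep the graph links consistent with the new start position, which this sketch ignores.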
4. Minipolish will add read depth information to the contigs. This can help distinguish important high-depth contigs from other low-depth stuff.
(4/6)
Despite its relative simplicity, I've found miniasm to be a great little assembler that can hold its own against the more robust tools like Flye and Canu. By simplifying the polishing process, I hope Minipolish makes miniasm easier to use!
(5/6)
As always, thank you to all the other bioinformaticians out there for their work! Especially @lh3lh3 for miniasm and the Racon developers (@IvanSovic, @robertvaser and @msikic).
(6/6)
Peer review brought quite a few improvements, so many thanks to the reviewers! My favourite addition is this new supp figure.
(2/6)
It shows that Polypolish was the tool least likely to introduce errors during polishing. It only did so at one place in 100 genomes (panel D) where it changed a 3-bp deletion to a 5-bp deletion in a tandem repeat.
(3/6)
I just released a new version of Unicycler (v0.5.0) which fixes SPAdes compatibility, drops some extraneous bits and patches a few bugs. github.com/rrwick/Unicycl…
Unicycler is now nearly 6 years old, so here's a thread with my thoughts on its place in the world in 2022.
(1/8)
Unicycler is a hybrid (short+long) bacterial genome assembly pipeline that takes a short-read-first approach. I.e. it first makes a short-read assembly graph, then uses the long reads to scaffold the graph to completion.
(2/8)
Short-read-first assembly made a lot of sense when Unicycler was first built in 2016. Back then, Nanopore reads were often shallow and low-quality, so the short-read graph made a good starting point for assembly.
(3/8)
Our preprint describing Polypolish is now up: biorxiv.org/content/10.110…
Polypolish is a short-read polisher for long-read bacterial genome assemblies. Some highlights from the paper follow in this thread...
(1/12)
There are already quite a few short-read polishers out there: HyPo, NextPolish, ntEdit, Pilon, POLCA, Racon and wtpoa. So why did we add to this collection? It's because they nearly all suffer from the same problem with errors in repeats.
(2/12)
When you align short reads to a long-read genome assembly in the 'normal' one-alignment-per-read manner, you often get no coverage over errors in repeats. This is because reads will preferentially align to other error-free instances of the repeat.
(3/12)
I've just released (during #MicroSeq2021) a new short-read polishing tool for fixing errors in long-read bacterial genome assemblies: Polypolish! github.com/rrwick/Polypol…
(1/8)
There are many other short-read polishing tools, including HyPo, NextPolish, ntEdit, Pilon and POLCA. So what does Polypolish do differently to warrant another?
(2/8)
Most other polishers use 'normal' short-read alignments, where each read is aligned to one best location (randomly chosen in a tie). This works fine in non-repeat sequences, but errors in repeats often lead to a lack of alignments and therefore can't be fixed.
(3/8)
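A toy Python sketch of the coverage problem described above (an illustration only, not how any real aligner or polisher is implemented). The sequences, the 1-mismatch limit and the function names are all made up for this example. The assembly has two copies of a repeat, the second with one wrong base, and the reads come from the true repeat. Keeping one best placement per read leaves the wrong base with zero depth, while keeping every placement covers it.

```python
def hits(read, ref, max_mismatches=1):
    """Return (start, mismatches) for every placement within the mismatch limit."""
    found = []
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            found.append((start, mismatches))
    return found


def depth(reads, ref, keep_all):
    """Per-base read depth, using either all placements or one best placement per read."""
    cov = [0] * len(ref)
    for read in reads:
        placements = hits(read, ref)
        if not placements:
            continue
        if not keep_all:
            # One best placement; ties go to the first hit here to stay
            # deterministic (a real aligner would choose randomly).
            placements = [min(placements, key=lambda h: h[1])]
        for start, _ in placements:
            for i in range(start, start + len(read)):
                cov[i] += 1
    return cov


repeat = "ACGTTGCAGT"    # true repeat sequence
bad_copy = "ACGTAGCAGT"  # same repeat with one assembly error (T -> A at index 4)
prefix = "GGGCCC" + repeat + "CCCGGG"
assembly = prefix + bad_copy + "GGCCGG"
error_pos = len(prefix) + 4

# Error-free 6 bp reads sampled from inside the true repeat.
reads = [repeat[i:i + 6] for i in range(len(repeat) - 5)]

best_only = depth(reads, assembly, keep_all=False)
all_hits = depth(reads, assembly, keep_all=True)
print(best_only[error_pos], all_hits[error_pos])
```

Every read matches the error-free copy exactly, so the best-only placement never lands on the bad copy, and a polisher working from that depth has no evidence with which to fix the error.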
Excited to announce a new preprint! We did a study comparing two different @nanopore library prep approaches (ligation and rapid) for bacterial genomes with small plasmids: biorxiv.org/content/10.110…
(1/11)
I really like this paper because it has a clear conclusion simple enough to fit in a tweet: rapid preps are better than ligation preps at recovering small plasmids.
(2/11)
Figure 1 gives a simplified illustration of why we think this is the case: due to their size, small circular plasmids can avoid fragmentation during DNA extraction, leaving no ends for adapter ligation. Rapid preps, in contrast, don't depend on DNA ends.
(3/11)
We've once again updated our paper benchmarking long-read assemblers for bacterial genomes! Take a look at the fresh results here: f1000research.com/articles/8-2138
Updates since the last version include...
(1/9)
New versions of some assemblers: Canu v2.0, Flye v2.8, Raven v1.1.10 and Shasta v0.5.1. My favourite change here is that Flye no longer requires a genome size parameter.
(2/9)
I've also added a new assembler to the comparison: NextPolish/NextDenovo. It performed well on chromosomes but not on plasmids, and it was more cumbersome to run than the other tools.
(3/9)