I've just released (during #MicroSeq2021) a new short-read polishing tool for fixing errors in long-read bacterial genome assemblies: Polypolish!
github.com/rrwick/Polypol…
(1/8)
There are many other short-read polishing tools, including HyPo, NextPolish, ntEdit, Pilon and POLCA. So what does Polypolish do differently to warrant another?
(2/8)
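Most other polishers use 'normal' short-read alignments, where each read is aligned to one best location (randomly chosen in a tie). This works fine in non-repeat sequences, but it backfires in repeats: if one copy of a repeat contains an error, reads from that repeat align preferentially to the error-free copies. The erroneous copy then gets few or no alignments at the error, so the error can't be fixed.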
(3/8)
Polypolish is instead designed to take alignments where each read is aligned to all possible locations, which ensures good coverage in repeats. So Polypolish can often fix errors in repeats that other polishers cannot.
(4/8)
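To make the repeat problem concrete, here's a toy Python sketch. It is not Polypolish's code (Polypolish is written in Rust, and its real rules are described on the wiki). It uses a made-up assembly with two copies of a 20 bp repeat, with 'N' runs standing in for unique sequence, and ungapped mismatch counting in place of a real aligner. Polishing is a simple majority vote. All names and thresholds here are hypothetical. With one best location per read, the erroneous copy gets zero depth at the error. With all locations, it is covered and the vote fixes it.

```python
import random
from collections import Counter

# Hypothetical toy assembly: two copies of a 20 bp repeat, with 'N' runs standing in
# for the unique sequence around them. The second copy carries one error (G -> T).
REPEAT = "AAAAACCCCCGGGGGTTTTT"
FLANK = "NNNNN"
ERROR_INDEX = 10
BAD_COPY = REPEAT[:ERROR_INDEX] + "T" + REPEAT[ERROR_INDEX + 1:]
ASSEMBLY = FLANK + REPEAT + FLANK + BAD_COPY + FLANK
ERROR_POS = len(FLANK + REPEAT + FLANK) + ERROR_INDEX

# Error-free short reads: every 10-mer of the true repeat sequence.
READ_LEN = 10
READS = [REPEAT[i:i + READ_LEN] for i in range(len(REPEAT) - READ_LEN + 1)]


def mismatches(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def all_hits(ref, read, max_mismatches=1):
    """Every ungapped placement of the read with at most max_mismatches differences."""
    return [
        start
        for start in range(len(ref) - len(read) + 1)
        if mismatches(ref[start:start + len(read)], read) <= max_mismatches
    ]


def best_hit(ref, read, rng, max_mismatches=1):
    """One best placement per read, ties broken at random, like a 'normal' aligner."""
    hits = all_hits(ref, read, max_mismatches)
    if not hits:
        return []
    scores = {start: mismatches(ref[start:start + len(read)], read) for start in hits}
    fewest = min(scores.values())
    return [rng.choice([start for start in hits if scores[start] == fewest])]


def pileup(ref, reads, placer):
    """Count the read bases that land on each reference position."""
    columns = [Counter() for _ in ref]
    for read in reads:
        for start in placer(ref, read):
            for offset, base in enumerate(read):
                columns[start + offset][base] += 1
    return columns


def polish(ref, columns, min_depth=3):
    """Simplified vote: adopt a base only if it is a strict majority at adequate depth."""
    polished = []
    for ref_base, column in zip(ref, columns):
        new_base = ref_base
        depth = sum(column.values())
        if depth >= min_depth:
            base, count = column.most_common(1)[0]
            if count * 2 > depth:
                new_base = base
        polished.append(new_base)
    return "".join(polished)


if __name__ == "__main__":
    rng = random.Random(0)
    best_cols = pileup(ASSEMBLY, READS, lambda ref, read: best_hit(ref, read, rng))
    all_cols = pileup(ASSEMBLY, READS, all_hits)
    print("depth at error, best-hit:", sum(best_cols[ERROR_POS].values()))
    print("depth at error, all-hits:", sum(all_cols[ERROR_POS].values()))
    print("fixed by best-hit polishing:", polish(ASSEMBLY, best_cols)[ERROR_POS] == "G")
    print("fixed by all-hits polishing:", polish(ASSEMBLY, all_cols)[ERROR_POS] == "G")
```

Reads that span the error match the good copy perfectly and the bad copy with one mismatch. A best-hit aligner therefore never places them on the bad copy. Keeping every acceptable placement is what gives the bad copy its coverage back.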
Here are some preliminary results from an in-progress paper. Each dot is a long-read bacterial genome assembly. Polypolish did well, but the best results came from using Polypolish in combination with other short-read polishers.
(5/8)
Lots more information about how it works can be found on the Polypolish wiki:
github.com/rrwick/Polypol…
(6/8)
Also, Polypolish was the first tool I've written in @rustlang. I'm still new to the language, but I like it a lot so far. Hopefully more Rust in my future!
(7/8)
Thanks to @DrKatHolt, everyone in the Holt lab, the organisers of #MicroSeq2021 and all the other developers out there working on genome assembly and polishing 😄
(8/8)

More from @rrwick

22 Feb
Excited to announce a new preprint! We did a study comparing two different @nanopore library prep approaches (ligation and rapid) for bacterial genomes with small plasmids:
biorxiv.org/content/10.110…
(1/11)
I really like this paper because it has a clear conclusion simple enough to fit in a tweet: rapid preps are better than ligation preps at recovering small plasmids.
(2/11)
Figure 1 gives a simplified illustration of why we think this is the case: due to their size, small circular plasmids can avoid fragmentation during DNA extraction, leaving no ends for adapter ligation. Rapid preps, in contrast, don't depend on DNA ends.
(3/11)
22 Sep 20
We've once again updated our paper benchmarking long-read assemblers for bacterial genomes! Take a look at the fresh results here:
f1000research.com/articles/8-2138

Updates since the last version include...
(1/9)
New versions of some assemblers: Canu v2.0, Flye v2.8, Raven v1.1.10 and Shasta v0.5.1. My favourite change here is that Flye no longer requires a genome size parameter.
(2/9)
I've also added a new assembler to the comparison: NextPolish/NextDenovo. It performed well on chromosomes but not on plasmids, and it was more cumbersome to run than the other tools.
(3/9)
28 Jul 20
I'm releasing a new tool today: Trycycler!
github.com/rrwick/Trycycl…

It is for generating a consensus long-read assembly of a bacterial genome.

(1/9)
I.e. you give Trycycler multiple different long-read assemblies of the same genome, and it produces a single consensus assembly that is better than any of the inputs.

(2/9)
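As a rough illustration of why a consensus can beat every input, here's a generic column-wise majority vote in Python. It is not Trycycler's algorithm, which has its own clustering, reconciliation, alignment and consensus steps. It assumes the assemblies have already been aligned into equal-length rows, with '-' marking gaps. The example sequences are made up.

```python
from collections import Counter


def majority_consensus(aligned):
    """Column-wise majority vote over pre-aligned, equal-length sequences ('-' marks a gap)."""
    if not aligned or len({len(seq) for seq in aligned}) != 1:
        raise ValueError("expected one or more aligned sequences of equal length")
    consensus = []
    for column in zip(*aligned):
        # Ties go to the first-seen symbol (Counter.most_common keeps insertion order for ties)
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a majority gap means the consensus has no base here
            consensus.append(base)
    return "".join(consensus)


# Three hypothetical assemblies of the same toy sequence, each with a different error
assemblies = ["ACGTTCGT", "ACGAACGT", "ACGTAC-T"]
print(majority_consensus(assemblies))  # prints ACGTACGT
```

Each input has a different error (two substitutions and one deletion), so each error is outvoted. The result matches none of the inputs.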
In doing so, Trycycler can repair most of the problems that hide in long-read assemblies. These include:
1) missing/spurious contigs
2) bad circularisation
3) glitchy sequence regions

(3/9)
23 Apr 20
The first update to my long-read assembler benchmarking paper is up on F1000Research:
f1000research.com/articles/8-2138

Updates include...
(1/8)
The results now include fresh versions of some of the assemblers: Flye (v2.6 -> v2.7), Raven (v0.0.5 -> v0.0.8) and Shasta (v0.3.0 -> v0.4.0)
(2/8)
I've also added a new assembler to the comparison: NECAT
github.com/xiaochuanle/NE…
(3/8)
2 Jan 20
New paper for the new year! It compares different long-read assemblers for microbial genome assembly:
f1000research.com/articles/8-213…

Two Twitter threads follow - one about the paper itself and one about my experience with @F1000Research.
(1/n)
In this paper, we did a ton of long-read microbial genome assemblies (using both real and simulated long-read sets) to see how the current assemblers perform.
(2/5)
I won't get into detailed results here, but very briefly: Flye, Raven and Miniasm/Minipolish were our favourites, each excelling in particular ways 🏆
(3/5)
3 Dec 19
I've got a little new tool to share: Minipolish
github.com/rrwick/Minipol…

It does Racon-polishing on a miniasm long-read assembly. Why not just use Racon directly? For a few reasons...

(1/6)
1. Minipolish keeps the assembly in graph form (GFA format) whereas Racon produces FASTA sequences.

2. Racon has a nasty habit of sometimes truncating sequences a little bit when it polishes them - Minipolish will repair this.

(2/6)
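The GFA vs FASTA point is easier to see with the formats side by side. In GFA1, `S` lines hold segments (`S <name> <sequence>`) and `L` lines hold links between segment ends (`L <from> <orient> <to> <orient> <overlap>`). FASTA only has the sequences, so converting drops the links. This minimal Python sketch is not Minipolish code, and the function names and toy graph are hypothetical. It assumes the common convention that a self-link like `utg1 + -> utg1 +` marks a circular contig, which is exactly the kind of information FASTA loses.

```python
def gfa_to_fasta(gfa_text):
    """Keep only the segment sequences; FASTA has no way to store GFA links."""
    records = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:  # S <name> <sequence> [tags]
            records.append(f">{fields[1]}\n{fields[2]}\n")
    return "".join(records)


def circular_segments(gfa_text):
    """Names of segments with a same-strand self-link (assumed to mean 'circular')."""
    circular = set()
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # L <from> <from_orient> <to> <to_orient> <overlap>
        if fields[0] == "L" and len(fields) >= 5:
            if fields[1] == fields[3] and fields[2] == fields[4]:
                circular.add(fields[1])
    return circular


gfa = "H\tVN:Z:1.0\nS\tutg1\tACGTACGT\nS\tutg2\tGGCCAATT\nL\tutg1\t+\tutg1\t+\t0M\n"
print(gfa_to_fasta(gfa))
print(circular_segments(gfa))
```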
3. Minipolish 'rotates' circular contigs (like in bacterial genomes) between polishing rounds. This ensures that final polished contigs circularise cleanly (no missing or overlapping bases).

(3/6)
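Rotating a circular contig is just moving its start point and wrapping the skipped bases onto the end. A plausible reason this helps: a linear polisher sees the start/end junction as two contig ends, and reads spanning it can't align across. Rotating between rounds moves the old junction into the middle, where it gets polished like any other sequence. This Python sketch rotates by half the contig length. That amount is an assumption for illustration; Minipolish's actual choice may differ. Both function names are hypothetical.

```python
def rotate_circular(seq, shift):
    """Return the same circular sequence, starting `shift` bases later (wrapping around)."""
    if not seq:
        return seq
    shift %= len(seq)  # also handles negative or oversized shifts
    return seq[shift:] + seq[:shift]


def rotate_halfway(seq):
    """Hypothetical policy: move the old start/end junction to the middle of the contig."""
    return rotate_circular(seq, len(seq) // 2)


contig = "AACCGGTTAC"
rotated = rotate_halfway(contig)
# Rotating by the remaining length restores the original start
restored = rotate_circular(rotated, len(contig) - len(contig) // 2)
print(rotated, restored == contig)
```

No bases are added or removed by a rotation, so a contig that circularised cleanly before rotating still does afterwards.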