Ryan Wick (9 tweets, 2 min read)
I have a little tool to share, in case it's useful to anybody else:
github.com/rrwick/Assembl…

Nothing too fancy, but if you work with large numbers of bacterial genomes, read on... (1/9)
I occasionally end up with lots of genome assemblies where some are quite redundant, often because the set contains outbreaks of near-identical genomes. (2/9)
For example, if you download all Klebsiella pneumoniae genomes, you'll get many thousands, but common disease-causing lineages (like CG258) are heavily represented. (3/9)
If you want to analyse the diversity in your genome assemblies (e.g. by building a pan-genome), this redundancy can be a pain. Near-identical genomes don't add much diversity but can make analyses slow. (4/9)
So in these cases, I usually want to reduce the assemblies to a unique subset. Using Klebsiella pneumoniae as an example again, I might want to reduce my set to a few hundred that are all quite distinct. (5/9)
That's where this tool comes in! It takes a directory of assemblies as input and clusters them, copying one assembly per cluster to an output directory. (6/9)
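A minimal sketch of that clustering step, assuming precomputed pairwise distances and single-linkage clustering (any two assemblies within the threshold end up in the same cluster). The thread doesn't say which clustering method the tool actually uses, so `single_linkage_clusters`, `dereplicate` and the `score` argument are hypothetical names, and copying the chosen files to an output directory is left out:

```python
from collections import defaultdict


def single_linkage_clusters(names, distances, threshold):
    """Cluster assemblies: any pair with distance <= threshold is linked (union-find)."""
    parent = {name: name for name in names}

    def find(x):
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of assemblies that is close enough.
    for (a, b), dist in distances.items():
        if dist <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Group assemblies by the root of their tree.
    clusters = defaultdict(list)
    for name in names:
        clusters[find(name)].append(name)
    return list(clusters.values())


def dereplicate(names, distances, threshold, score):
    """Return one representative per cluster: the member with the highest score."""
    clusters = single_linkage_clusters(names, distances, threshold)
    return sorted(max(cluster, key=lambda name: score[name]) for cluster in clusters)
```

Raising `threshold` merges more assemblies into each cluster, so fewer representatives come out; lowering it does the opposite. Note that single linkage can chain: A and C can share a cluster through B even when the A-C distance exceeds the threshold.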
The representative for each cluster is chosen based on N50, so completed genomes are preferred over draft genomes. And the clustering threshold is of course adjustable, so you can get more or fewer output assemblies. (7/9)
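Here's one way to compute N50 and use it to pick a representative. It's a hypothetical sketch (`contig_lengths`, `n50` and `pick_representative` are illustrative names, not the tool's code), but it shows why completed genomes win: a complete assembly's N50 is the length of its chromosome, while a fragmented draft's N50 is only the length of one of its contigs.

```python
def contig_lengths(fasta_text):
    """Return the length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes off the previous sequence.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total (longest first) reaches half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_representative(cluster, n50_by_name):
    """Hypothetical helper: choose the cluster member with the highest N50."""
    return max(cluster, key=lambda name: n50_by_name[name])
```

For example, a 5 Mbp chromosome plus a 100 kbp plasmid has an N50 of 5,000,000, while a draft split into seventeen 300 kbp contigs has an N50 of 300,000, so the complete genome is chosen.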
It uses Mash (which is very fast, thanks @BrianOndov and @aphillippy!) to get the pairwise distances between assemblies, so it scales to large input sets: tens of thousands of assemblies are fine if you have a decent amount of RAM. (8/9)
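For the curious, here's a from-scratch sketch of the MinHash idea behind Mash distances. It is not Mash itself: Mash uses MurmurHash3 and is far faster, whereas this uses `hashlib.blake2b` as a stand-in hash. Each genome is reduced to the smallest `size` hashes of its canonical k-mers (k = 21 and a sketch size of 1,000 are Mash's defaults), the Jaccard index j is estimated from two sketches, and the distance is D = -(1/k) ln(2j / (1 + j)), taken as 1 when nothing is shared:

```python
import hashlib
import math

# Complement table for building reverse complements of DNA k-mers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest 64-bit hashes of canonical k-mers."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = canonical(seq[i:i + k]).encode()
        digest = hashlib.blake2b(kmer, digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard index from the bottom `size` hashes of the union of both sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:size]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(j, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); defined as 1.0 when nothing is shared."""
    if j == 0:
        return 1.0
    return math.log((1 + j) / (2 * j)) / k
```

Usage: `mash_distance(jaccard_estimate(sketch(genome_a), sketch(genome_b)))`, where `genome_a` and `genome_b` are sequence strings. Because each genome is sketched once and comparisons only touch small fixed-size sketches, all-vs-all distances stay cheap even for very large input sets.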
That's all! It's not a particularly sophisticated tool, but I hope it might be handy for someone out there. (9/9)