After yet another tour through a whole stack of Python workflow systems, I still can't find one that beats @nextflowio for #bioinformatics. Here's a short thread on the fatal flaws of each:
@ApacheAirflow: popular and elegant, but it still has very poor (if any) support for HPC execution, and it has no concept of platform-native file storage (S3 on AWS, the local filesystem on HPC, etc.).
@dask_dev: a lovely minimal API with tight integrations for pandas and NumPy, but this comes at the cost of explicit output caching (it may or may not decide to re-run any given task) and of file handling.
@Toil_GI is always the first engine I try, and some big improvements have been made lately (like migrating to Python 3). But if you run a workflow to completion and then edit the workflow, it isn't able to cache the successful tasks and must rerun them all.
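To make the caching complaint concrete, here is a minimal Python sketch of resumable task caching. It assumes a simple scheme that is not Toil's or Nextflow's actual implementation: each task's result is stored on disk under a key hashed from the task's code and its inputs, so editing one step invalidates only that step. The names `task_key`, `cached_task`, `align` and `call_variants` are hypothetical, and hashing bytecode is only a rough proxy for "the task definition changed".

```python
# Hypothetical sketch of per-task result caching; not any engine's real API.
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def task_key(fn: Callable, args: tuple, kwargs: dict) -> str:
    """Hash a task's definition and inputs into a cache key (rough proxy)."""
    code = fn.__code__
    h = hashlib.sha256()
    h.update(fn.__qualname__.encode())
    h.update(code.co_code)                   # bytecode of the task body
    h.update(repr(code.co_consts).encode())  # literals used in the body
    h.update(repr(code.co_names).encode())   # names the body refers to
    # Assumes inputs are simple picklable values (str, int, tuple, ...).
    h.update(pickle.dumps((args, sorted(kwargs.items()))))
    return h.hexdigest()


def cached_task(cache_dir: Path) -> Callable:
    """Decorator: reuse a stored result if the task's code and inputs are unchanged."""
    def decorator(fn: Callable) -> Callable:
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            path = cache_dir / f"{task_key(fn, args, kwargs)}.pkl"
            if path.exists():  # cache hit: skip re-running the task
                return pickle.loads(path.read_bytes())
            result = fn(*args, **kwargs)
            cache_dir.mkdir(parents=True, exist_ok=True)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


def demo() -> List[List[str]]:
    """Run a toy two-step pipeline three times; report which tasks executed."""
    log: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)

        @cached_task(cache)
        def align(sample: str) -> str:
            log.append(f"align {sample}")
            return sample + ".bam"

        @cached_task(cache)
        def call_variants(bam: str) -> str:
            log.append(f"call {bam}")
            return bam + ".vcf"

        def run(call_step: Callable[[str], str]) -> List[str]:
            log.clear()
            for sample in ("s1", "s2"):
                call_step(align(sample))
            return list(log)

        runs = [run(call_variants), run(call_variants)]  # 2nd run fully cached

        # "Edit the workflow": change only the variant-calling step, then resume.
        @cached_task(cache)
        def call_variants(bam: str) -> str:  # noqa: F811 (deliberate redefinition)
            log.append(f"call {bam}")
            return bam + ".filtered.vcf"

        runs.append(run(call_variants))
    return runs


print(demo())
# → [['align s1', 'call s1.bam', 'align s2', 'call s2.bam'], [], ['call s1.bam', 'call s2.bam']]
```

After the edit only the changed step reruns, while `align` is served from the cache. Real engines generally fold more into the key (for example the task's execution environment); this sketch covers only code and inputs.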
@dagsterio is a relatively new player, and it comes with a clean declarative API and some neat new features like runtime type checking. Unfortunately it isn't very portable to HPC, and lacks the ability to cache dynamic tasks (e.g. scattering over each line in a file).
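A "dynamic" scatter is one where the number of tasks is only known at runtime, such as one task per line of an input file. The sketch below, assuming a hypothetical per-line task `gc_count` and an in-memory dict standing in for a persistent cache, caches each scattered task by its own input, so adding a line to the file only runs the task for the new line.

```python
# Hypothetical sketch: per-item caching of a runtime-sized scatter.
from typing import Callable, Dict, List, Tuple

# In-memory stand-in for a persistent task cache: (task name, input) -> result.
Cache = Dict[Tuple[str, str], int]


def gc_count(seq: str) -> int:
    """Hypothetical per-line task: count G and C bases in one sequence."""
    return sum(base in "GC" for base in seq.upper())


def run_cached(cache: Cache, log: List[str], task: Callable[[str], int], value: str) -> int:
    """Run task(value) unless a result for the same task and input is cached."""
    key = (task.__name__, value)
    if key not in cache:
        log.append(f"{task.__name__}({value})")
        cache[key] = task(value)
    return cache[key]


def scatter_gather(text: str, cache: Cache, log: List[str]) -> int:
    # Scatter: one task per non-empty line, so the width of the DAG is
    # only known once the file has been read at runtime.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    counts = [run_cached(cache, log, gc_count, line) for line in lines]
    # Gather: combine the per-line results into one value.
    return sum(counts)


cache: Cache = {}
log: List[str] = []
print(scatter_gather("ACGT\nGGCC\n", cache, log))        # → 6 (runs 2 tasks)
log.clear()
print(scatter_gather("ACGT\nGGCC\nATAT\n", cache, log))  # → 6
print(log)                                               # → ['gc_count(ATAT)']
```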
#snakemake relies heavily on the file-dependency idiom from make, which I have never found to suit my workflows. It also makes writing workflows very unintuitive (you have to reason backwards from the goal), and dynamic scatter/gather is possible but very complicated.
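For readers unfamiliar with the make idiom, this is a simplified sketch of backward resolution: start from the requested output file, find a rule that can produce it, then recurse on that rule's input until reaching files that already exist. The `RULES` table, its step names and the suffix matching are hypothetical simplifications; Snakemake itself matches wildcard output patterns such as `{sample}.vcf`.

```python
# Hypothetical sketch of make-style backward resolution from a goal file.
from typing import Dict, List, Set, Tuple

# Hypothetical rules: output suffix -> (input suffix, step name).
RULES: Dict[str, Tuple[str, str]] = {
    ".vcf": (".bam", "call_variants"),
    ".bam": (".fastq", "align"),
}


def plan(target: str, existing: Set[str]) -> List[str]:
    """Work backwards from `target` to the ordered steps that build it."""
    if target in existing:
        return []  # nothing to do: the file is already there
    for out_suffix, (in_suffix, step) in RULES.items():
        if target.endswith(out_suffix):
            source = target[: -len(out_suffix)] + in_suffix
            # Resolve the input first, then append the step that consumes it.
            return plan(source, existing) + [f"{step}: {source} -> {target}"]
    raise ValueError(f"no rule to make {target!r}")


print(plan("s1.vcf", {"s1.fastq"}))
# → ['align: s1.fastq -> s1.bam', 'call_variants: s1.bam -> s1.vcf']
```

The user names only the final file, and the order of steps is derived by walking backwards from it, which is the inversion described above.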
Also while I'm here, the reason I like @nextflowio so much is that it ticks these boxes: supports tasks that produce files but also values, portable to HPC and cloud, backed by a real programming language you can import from, caches every task, and doesn't require a static DAG.
A few more I've looked at just now: @raydistributed actually does seem to have HPC support, which is rare for newer engines, but sadly it doesn't have any mechanism for caching successful tasks between re-runs of a workflow.
@MetaflowOSS has a long-running issue with storing files at all, which is vital for bioinformatics: github.com/Netflix/metafl…. It has built-in support for S3, but if you want any other kind of storage you're out of luck.

Thread by Michael Milton