After a few quiet months, the next release of #PyHMMER is out today with some major new features!
Search methods can now use a target database without loading it entirely into memory. Just pass a `SequenceFile` object directly (s/o @felixmlanger).
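As a rough sketch of what passing a `SequenceFile` to a search could look like (the function name and file paths are hypothetical, and the exact keyword arguments are assumptions that may vary between PyHMMER versions):

```python
def search_streaming(hmm_path, fasta_path, cpus=0):
    """Search HMMs against a sequence file without loading it into memory.

    Hypothetical helper: `hmm_path` and `fasta_path` are placeholders
    for your own files.
    """
    import pyhmmer  # third-party: pip install pyhmmer

    # The queries are usually few, so load them all.
    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        hmms = list(hmm_file)

    results = []
    # Open the targets in digital mode and hand the file object itself
    # to hmmsearch instead of a list of sequences.
    with pyhmmer.easel.SequenceFile(
        fasta_path, digital=True, alphabet=hmms[0].alphabet
    ) as seq_file:
        for hits in pyhmmer.hmmer.hmmsearch(hmms, seq_file, cpus=cpus):
            # One TopHits object is produced per query HMM.
            for hit in hits:
                results.append((hit.name, hit.score, hit.evalue))
    return results
```

The trade-off is that the target file has to be re-read for each query, so this is mostly useful when the database does not fit in RAM.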
Also, there's now an `hmmscan` implementation! Target profiles can either be loaded from a pressed file (like the original) or pre-fetched into memory, which gives massive performance improvements if you can afford the extra RAM, as seen in a benchmark of 4,489 proteins against Pfam v33.1.
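The two ways of loading target profiles for `hmmscan` might look like the sketch below. This is not the benchmark code: the function name, the `prefetch` flag and the paths are hypothetical, and it assumes the HMM file was already pressed (for instance with `hmmpress`).

```python
def scan_proteins(hmm_path, fasta_path, prefetch=False, cpus=0):
    """Scan protein sequences against a pressed HMM database.

    Hypothetical helper: with prefetch=False the profiles are read
    from the pressed file on the fly; with prefetch=True they are
    all loaded into memory first (more RAM, less I/O).
    """
    import pyhmmer  # third-party: pip install pyhmmer

    alphabet = pyhmmer.easel.Alphabet.amino()

    # The query sequences are loaded into memory in digital mode.
    with pyhmmer.easel.SequenceFile(
        fasta_path, digital=True, alphabet=alphabet
    ) as seq_file:
        queries = list(seq_file)

    with pyhmmer.plan7.HMMFile(hmm_path) as hmm_file:
        # Iterator over the optimized profiles of the pressed database.
        profiles = hmm_file.optimized_profiles()
        if prefetch:
            # Assumption: hmmscan accepts an in-memory collection
            # of optimized profiles as targets.
            profiles = list(profiles)
        all_hits = pyhmmer.hmmer.hmmscan(queries, profiles, cpus=cpus)
        # Results come back in query order, one TopHits per sequence.
        return [(query.name, len(hits)) for query, hits in zip(queries, all_hits)]
```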
The API for filtering hits and domains was also improved, with shortcuts for iterating only over the hits and domains under the inclusion or reporting thresholds (s/o @satriaphd).
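A minimal sketch of how these shortcuts could be used, assuming the `included` properties on `TopHits` and on a hit's `domains` (a `reported` counterpart is assumed to work the same way); the helper name is hypothetical:

```python
def collect_included_domains(all_hits):
    """Gather the domains that pass the inclusion thresholds.

    `all_hits` is any iterable of TopHits objects, such as the
    results of an hmmsearch or hmmscan run.
    """
    rows = []
    for hits in all_hits:
        # Only the hits under the inclusion threshold.
        for hit in hits.included:
            # Only the domains under the inclusion threshold.
            for domain in hit.domains.included:
                rows.append((hit.name, domain.score, domain.i_evalue))
    return rows
```

This avoids writing `if hit.included:` checks inside every loop.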
I added a new guide to the documentation (pyhmmer.readthedocs.io/en/stable/exam…) showing how to manage HMM files inside a Python package (for instance, to use custom HMMs in a CLI tool), along with some additional tips for getting top performance.
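One possible way to ship HMMs inside a package is through `importlib.resources` from the standard library (Python 3.9+). This sketch is not taken from the guide: the helper names, the package name and the resource path are placeholders, and it assumes `HMMFile` accepts a binary file object.

```python
import importlib.resources


def open_packaged_file(package, resource):
    """Open a data file bundled inside an installed package.

    This also works when the package is not stored as plain files
    on disk (for instance inside a zip archive).
    """
    return importlib.resources.files(package).joinpath(resource).open("rb")


def load_packaged_hmms(package, resource):
    """Load every HMM from a file shipped with a package."""
    import pyhmmer  # third-party: pip install pyhmmer

    with open_packaged_file(package, resource) as handle:
        # Assumption: HMMFile can read from a binary file object
        # as well as from a filesystem path.
        with pyhmmer.plan7.HMMFile(handle) as hmm_file:
            return list(hmm_file)
```

A CLI tool could then call something like `load_packaged_hmms("mytool", "data/models.hmm")`, where both names are hypothetical.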
This release also has several bugfixes (small memory leaks in `nhmmer`, potential segfaults in `hmmsearch`, etc.) and some changes in the lower-level interface to make the API more Pythonic overall. Full release notes on GitHub: github.com/althonos/pyhmm…
Finally, I am working on a manuscript to submit as an application note. If my main projects don't take too much time, I hope to submit it in the next couple of months!
... (forgot the mandatory #bioinformatics and #Python tags, how are the auto-retweet bots gonna find out) ...
Just create an Aligner object (with additional configuration if you want), then give it some sequences to align:
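A minimal sketch, assuming the library in question is PyFAMSA (`pyfamsa`) with its `Aligner` and `Sequence` classes; the helper name and the `guide_tree` default are illustrative only:

```python
def align_sequences(records, guide_tree="upgma"):
    """Align (name, sequence) string pairs and return the gapped rows.

    Assumes the `pyfamsa` package, where sequence names and
    residues are passed as bytes.
    """
    from pyfamsa import Aligner, Sequence  # third-party: pip install pyfamsa

    sequences = [Sequence(name.encode(), seq.encode()) for name, seq in records]
    # Additional configuration goes into the Aligner constructor.
    aligner = Aligner(guide_tree=guide_tree)
    msa = aligner.align(sequences)
    return [(row.id.decode(), row.sequence.decode()) for row in msa]
```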
It's missing some small things that I'll add over time, but for now the key features are there, and you could use it in your Python or #snakemake workflows in place of the FAMSA binary.