Discover and read the best of Twitter Threads about #Snakemake

Most recent (9)

🧵🧵 THREAD: #Bioinformatics applications - a deep dive into #pipelines and #workflows 🧵🧵
1. #Bioinformatics is the application of #computational techniques to the #analysis of #biological #data, such as #sequences, #structures, and #interactions, and is an essential field of modern biology and medicine.
2. #Bioinformatics has many exciting applications in various fields, such as #genetics, #genomics, #proteomics, and #metabolomics, that can provide new insights into the workings of living systems, and can help to advance science and society.
Read 8 tweets
Just released another #Python package for #bioinformatics: #Cython bindings to FAMSA, a very fast algorithm for #MultipleSequenceAlignment by @sdeorowicz, Agnieszka Debudaj-Grabysz and @AdamGudys! Get it here: github.com/althonos/pyfam…
Just create an Aligner object (with additional configuration if you want), then give it some sequences to align:
It's missing some small things that I'll add over time, but for now the key features are there, and you could use it in your Python or #snakemake workflows in place of the FAMSA binary.
Read 4 tweets
One of the main tasks in data processing in #bioinformatics 🧬💾 is chaining processes together to build what is called a #WorkFlow ⚙️. These processes often fail, which in many cases forces us to restart the whole job. (1/9)
To automate these tasks and carry them out as efficiently as possible, there are workflow managers, which let us streamline and customise our analyses. Here are the most representative ones (2/9) 🧶👇:
#Bpipe, a platform for running large #bioinformatics jobs. Updated in September 2021. It also includes some examples of standardised bioinformatics analysis pipelines that will be very useful (3/9) 👇:
docs.bpipe.org
Read 9 tweets
After yet another tour through a whole stack of Python workflow systems, I still can't find one that beats @nextflowio for #bioinformatics. Here's a short thread on the fatal flaws of each:
@ApacheAirflow: popular and elegant, but it still has very poor (if any) support for HPC execution, and it has no concept of platform-native file storage (S3 on AWS, local filesystem on HPC etc).
@dask_dev: a lovely minimal API with tight integrations for pandas and numpy, but this comes at the cost of explicit output caching (it may or may not decide to re-run any given task) and file handling.
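To make "explicit output caching" concrete, here is a minimal sketch of a Make-style timestamp rule: a step is skipped when its output file already exists and is at least as new as its inputs. This is only an illustration under that assumption, not the actual logic of Nextflow, Snakemake or Dask, and the names `needs_run`, `run_step` and `concatenate` are hypothetical.

```python
import os
import tempfile


def needs_run(inputs, output):
    """Return True if the output is missing or older than any input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def run_step(inputs, output, action):
    """Run action(inputs, output) only when the output is stale.

    Returns True if the action ran, False if the cached output was reused.
    """
    if needs_run(inputs, output):
        action(inputs, output)
        return True
    return False


def concatenate(inputs, output):
    """Toy task (hypothetical): concatenate the input files into the output."""
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                out.write(handle.read())


# Demo in a throwaway directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "reads.txt")
dst = os.path.join(workdir, "merged.txt")
with open(src, "w") as handle:
    handle.write("ACGT\n")

first = run_step([src], dst, concatenate)   # output missing, so the task runs
second = run_step([src], dst, concatenate)  # output is up to date, so it is skipped
```

Because the decision depends only on files on disk, a workflow that fails halfway can be restarted and will redo only the steps whose outputs are missing or stale.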
Read 9 tweets
Germany (and other countries that rely heavily on fossil energy imports today) needs to switch to alternative energy carriers in the future to become climate neutral.

The list of options is long:
⚗️Which energy carrier?
🗺️From where?
🚢By which transport mode?
Highlights below.

You can find our preprint (not yet peer reviewed!) on arxiv.org/abs/2107.01092 . With @Michael_Dueren @nworbmot @jlugiessen @TUBerlin .
We modelled 9 energy supply chains (ESCs) from 8 different exporting countries for optimal investment.

Including
* production side (GIS potential analysis, synthetic hourly RES time-series),
* considered local electricity demand before export
* conversion steps along each ESC
Read 12 tweets
Providing just a single least-cost solution underplays an immense degree of freedom when planning future energy systems.

There are many near-optimal alternatives with attractive properties like social acceptance due to less onshore wind capacity or limited grid reinforcement.
Highlights below, or read full paper at doi.org/10.1016/j.epsr… or last year's preprint arxiv.org/abs/1910.01891.

With @nworbmot @KITKarlsruhe @Helmholtz

Kudos to the pioneers @jfdecarolis and @etrutnevyte!
We systematically explored the decision space of a European power system model based on wind and solar that co-optimises generation, storage and grid infrastructure.

We look at how the capacities of each technology can deviate if the costs are epsilon % away from the optimum.
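One common way to write this down, as a sketch only: the symbols below ($x$, $c$, $A$, $b$, $f^*$, $x_s$, $\varepsilon$) are assumptions for illustration and are not taken from the thread. First solve the least-cost problem, then minimise and maximise the capacity of one technology at a time while allowing total cost to exceed the optimum by at most $\varepsilon$.

```latex
% Step 1: least-cost solution (x: all decision variables, c: cost coefficients)
f^* = \min_{x} \; c^\top x \quad \text{s.t.} \quad A x \le b

% Step 2: for each technology s (e.g. onshore wind capacity x_s),
% find the smallest and largest capacity within the cost budget
\min_{x} \; x_s \quad \text{and} \quad \max_{x} \; x_s
\quad \text{s.t.} \quad A x \le b, \qquad c^\top x \le (1 + \varepsilon) \, f^*
```

With $\varepsilon = 0.05$, for example, the two solves give the range of capacities for technology $s$ that are compatible with a system costing no more than 5% above the optimum.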
Read 14 tweets
A colleague asked me for some advice about 16S rDNA sequencing... So here is what I learned during my PhD, even though I worked more with metagenomics than with amplicon sequencing. Constructive additions are welcome.
Traditionally, 16S amplicon sequences were clustered at 97% to create operational taxonomic units (OTUs). A unit that corresponds more or less to species, so we thought. academic.oup.com/bioinformatics…
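As a rough illustration of what "clustered at 97%" means, here is a minimal sketch of greedy centroid clustering. It assumes equal-length, already aligned sequences and scores identity as the fraction of matching positions; real OTU tools compute identity from pairwise alignments and usually sort by abundance first. The names `identity` and `greedy_otus` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first centroid at or above the threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    """
    centroids = []
    clusters = []
    for seq in sequences:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Toy data: 100-base sequences.
reference = "ACGT" * 25
near = "TT" + reference[2:]       # 2 mismatches, 98% identity to reference
far = "T" * 10 + reference[10:]   # 8 mismatches, 92% identity to reference

otus = greedy_otus([reference, near, far])
```

Here `reference` and `near` end up in the same OTU, while `far` starts a new one.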
Now, everybody uses @bejcal's dada2 or equivalent tools to get ribosomal sequence variants (essentially OTUs at 100%) while controlling for sequencing error. Keep in mind this is not the same as full-length 16S clustered at 100%.
Read 9 tweets
Continental or local? 100% renewable electricity supply in Europe can be both. But cost and especially required infrastructure differ.

Read our new @Joule_CP study: doi.org/10.1016/j.joul….

With @JLilliestam @SteMarelli @stefpf @IASS_Potsdam @ETH.

HIGHLIGHTS BELOW👇
Larger is certainly cheaper: continental supply, which uses only the best resources all over Europe, leads to lowest cost. But it concentrates generation infrastructure (💨and☀️), requires a powerful transmission grid, and creates import dependencies.
Local supply in ~500 European regions distributes generation infrastructure equally and requires less than half the transmission capacity but 50% more generation capacity in total. This is because 💨 and ☀️ have varying quality in the regions.
Read 8 tweets
A true gem among #multiomics preprints: Integrative Network Fusion by @MarcoChierici, @nicole_bussola, @viperale, et al:

✓ 3 TCGA cancers & simulated data
✓ cross-validation described in detail
✓ flow diagram
✓ source code & data shared
✓ packages w/ version, cited

/n
- [the method description & comments follow]
- link: biorxiv.org/content/10.110…
- licence of the above figures/tables: CC BY-NC-ND 4.0
- an earlier version of INF was previously presented in 2018: doi.org/10.1186/s13062…
- this is the first tweet in #SundayMultiOmics series
[[Introduction]]: Similarity network fusion (SNF, doi.org/10.1038/nmeth.…) is a popular technique (600+ citations, a lot for multi-omics!) for getting a sort of consensus signal from multiple omics; it requires the same patients (less commonly - observations) in each omic.
Read 24 tweets
