Michael Hanke
Oct 16 · 25 tweets · 11 min read
The #FAIR4RS principles (for research software) have been published, and frankly, the paper is a disappointment. Not because of what is in it, but because of what is not.

A monster thread... 1/
nature.com/articles/s4159…
The FAIR4RS working group had >200 members who implemented a community process to distill these principles. Yet the paper fails to even hint at efforts that had already been underway for a long time when this process started, and it fails to illustrate HOW to actually do it in practice. 2/
"[..] FAIR4RS WG’s very high levels of success in community engagement is because it brought together a range of efforts to apply aspects of FAIR to research software since 2017"

When I started contributing to @debian in 2005, I learned that much of this was already a thing. 3/
Now you might say: but they do mention "package management systems", once, in parentheses. I think it is worth illustrating how much of an understatement this is.

Let's take Comet, the software the paper uses as an example to illustrate the principles. 4/
Research software like Comet has been integrated by efforts like debian.org/devel/debian-m… or wiki.debian.org/DebianScience since at least 2004. Comet itself was integrated by volunteers in 2014.

metadata.ftp-master.debian.org/changelogs//ma… 5/
But what does system "integration" mean exactly? It refers to the process of making individual (research) software less of a unicorn and turning it into an interoperable component of a larger software system. In the case of @debian, this is a component of a complete operating system. 6/
This has key advantages. It exposes software to conditions that the original authors might never have thought about: different compilers, hardware architectures, major library transitions. It starts a process that leads to more robust, more correct software over time. 7/
But first and foremost, it brings standardization, and with it come findability, accessibility, interoperability, and reusability (#FAIR), plus longevity!

Let's look at the example of Comet and what @debian provides to make it a better fit for #FAIR4RS. 8/
@debian maintains machine-readable copyright and license information for entire software packages and the individual code pieces they are built from, all based on independent audits: metadata.ftp-master.debian.org/changelogs//ma…

Standard specification at debian.org/doc/packaging-… 9/
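What "machine-readable" buys you here: the format (copyright-format 1.0) consists of deb822-style paragraphs, a header paragraph followed by `Files:` paragraphs that each pair file patterns with `Copyright:` and `License:` fields. The following is a minimal sketch, assuming the file contents are already available as a string; it maps each file pattern to its license short name and ignores less common details of the format. The example text is invented for illustration and is not Comet's actual copyright file; for real use, the python-debian library is the more complete option.

```python
def parse_paragraphs(text):
    """Split deb822-style text into a list of dicts (field name -> value)."""
    paragraphs, current, last_field = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
                current, last_field = {}, None
            continue
        if line[0] in " \t" and last_field is not None:
            # Continuation line: append to the previous field's value.
            current[last_field] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def licenses_by_file_pattern(copyright_text):
    """Map each Files: pattern to the license short name declared for it."""
    result = {}
    for para in parse_paragraphs(copyright_text):
        if "Files" not in para:
            continue  # header paragraph or stand-alone License paragraph
        # The first line of License: is the short name; any further lines hold the text.
        short_name = para.get("License", "").split("\n", 1)[0]
        for pattern in para["Files"].split():
            result[pattern] = short_name
    return result


# Invented example, not taken from any real package.
EXAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example-tool

Files: *
Copyright: 2020 Jane Doe
License: Apache-2.0

Files: debian/*
Copyright: 2021 John Roe
License: GPL-2+
"""

print(licenses_by_file_pattern(EXAMPLE))
# → {'*': 'Apache-2.0', 'debian/*': 'GPL-2+'}
```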
@debian maintains machine-queryable category metadata for software packages (wiki.debian.org/Debtags) to aid software discovery.

Example search for "spectrometry" software in @debian
debtags.debian.org/search/?wl=&q=… 10/
@debian keeps the source and binary forms of all(!) historical software versions available (since 2010). This can be used to retroactively rebuild historical software environments, e.g. to test or exercise reproducibility.

snapshot.debian.org/binary/comet-m… 11/
@debian offers automatic and routine verification of the reproducibility of *building* (research) software from source. This is a crucial component of reproducible research wherever source code is considered the true origin of RS:
tests.reproducible-builds.org/debian/rb-pkg/… 12/
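The core idea behind such testing: build the same source twice, under deliberately varied conditions, and check whether the results are bit-for-bit identical. A minimal sketch of only the final comparison step, assuming both builds already exist as files; the paths in the usage line are hypothetical. The real infrastructure does much more, e.g. it shows where two builds differ using diffoscope.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first_build, second_build) -> bool:
    """True if two independently produced build artifacts are bit-for-bit identical."""
    return sha256_of(first_build) == sha256_of(second_build)


# Hypothetical paths: the same source package built twice under varied conditions.
# builds_match("build-a/example_1.0-1_amd64.deb", "build-b/example_1.0-1_amd64.deb")
```

Comparing digests rather than file contents keeps the check cheap to record and publish: a build farm can store one hash per artifact and compare builds made at different times or places.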
This wealth of information, provided to the (scientific) community perpetually and for free, is combined with metadata provided by "upstream" projects (wiki.debian.org/UpstreamMetada…), and is indexed in databases for various services to query (wiki.debian.org/UltimateDebian…). 13/
I could go on... (Yes, @debian is not the only effort of this kind; see also @fedora Scientific, Arch Linux, etc. But Twitter does not want this thread to grow any further.)

Instead, let's tick the #FAIR4RS boxes for Comet, based solely on the fact that it is integrated into @debian. 14/
F1.1 ✅ @debian (RRID:SCR_006638) assigns a globally unique package name and provides a (social) process for resolving name conflicts across all software (research or not) and for splitting software into appropriate sub-packages as needed. 15/
F1.2 ✅ @debian enforces a strict versioning scheme that is "always forward" and can distinguish source from binary versions and target platforms. 16/
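To make "always forward" concrete: Debian Policy defines a total order on versions of the form [epoch:]upstream_version[-debian_revision]. Non-digit parts are compared character by character (letters sort before other symbols, and `~` sorts before everything, even the end of the string), digit parts numerically, and a higher epoch always wins. A minimal Python sketch of that ordering, written from the Policy rules rather than ported from dpkg; for real work, `dpkg --compare-versions` is the authoritative tool.

```python
from functools import cmp_to_key


def _is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def _order(ch: str) -> int:
    """Sort weight of one character in a non-digit chunk ("" = end of string)."""
    if ch == "~":
        return -1              # tilde sorts before everything, even the end
    if ch == "" or _is_digit(ch):
        return 0
    if ch.isascii() and ch.isalpha():
        return ord(ch)         # letters sort before all other symbols
    return ord(ch) + 256


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream versions (or two Debian revisions)."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Non-digit chunk, compared character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Digit chunk, compared numerically (an empty chunk counts as 0).
        m, n = i, j
        while m < len(a) and _is_digit(a[m]):
            m += 1
        while n < len(b) and _is_digit(b[n]):
            n += 1
        na, nb = int(a[i:m] or 0), int(b[j:n] or 0)
        if na != nb:
            return -1 if na < nb else 1
        i, j = m, n
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")   # revision follows the LAST hyphen
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative, zero or positive as version a sorts before, equal to or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


versions = ["1.0-1", "1:0.5-1", "1.0~rc1-1", "1.0-1+b1", "0.9-3"]
print(sorted(versions, key=cmp_to_key(compare_versions)))
# → ['0.9-3', '1.0~rc1-1', '1.0-1', '1.0-1+b1', '1:0.5-1']
```

The example shows the properties the tweet relies on: a pre-release marked with `~` sorts before the final release, a binary-only rebuild (Debian's `+b1`-style suffix) sorts after the source version it was built from, and bumping the epoch moves a package forward even when the upstream version number goes down.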
F2 ✅ As is evident from the thread so far, a wealth of (standardized) software metadata is provided. It covers not only the research aspect but, uniformly, many facets of running software on a computer for any purpose. 17/
F3 ✅ All this information is linked to and available via the globally unique package name, and is versioned as necessary.

F4 ✅ @debian is one giant searchable index. All the information above is provided via queryable endpoints, and so is the software itself. See next. 18/
A1 ✅ There is no difference between installing an office suite and a specialized mass spectrometry research tool:

apt install <name>

The protocol and client software are free and open source; anyone can roll their own repository infrastructure for free, too. 19/
A2 ✅ Metadata records are not contained in the software packages, but even the packages themselves, in all their versions, are unconditionally and indefinitely archived too. 20/
R1.1 ✅ Every piece of software gets a full license audit and a machine-readable license specification, down to the level of individual components, as part of its integration process. This is typically performed by someone other than the authors and is always independently verified. 21/
R1.2 ✅ Any modification is clearly associated with an agent identity and a timestamp. Version control systems are linked and accessible via machine-readable metadata. The reproducibility of build instructions and outcomes is automatically and repeatedly tested. 22/
R2 ✅ Any and all software dependency declarations are machine-readable and verifiably complete. 23/
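Machine-readable here means a documented syntax: a dependency field is a comma-separated list of relations, `|` separates alternatives, and each relation may carry a version constraint using one of the operators `<<`, `<=`, `=`, `>=`, `>>`. A minimal Python sketch, assuming the field value has already been extracted from the package's control data; it deliberately rejects architecture restrictions (`[amd64]`) and build profiles (`<!nocheck>`), which a full parser would also handle. The example field is invented.

```python
import re

# One relation: package name, optional ":arch" qualifier, optional "(op version)".
_RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9.+-]+)"
    r"(?::(?P<archqual>[a-z0-9-]+))?"
    r"\s*(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(field: str):
    """Parse a Depends-style field into a list of alternative groups.

    Each group is a list of (name, operator, version) tuples; operator and
    version are None when unconstrained. The ":arch" qualifier is accepted
    but not kept in this sketch.
    """
    groups = []
    for group in field.split(","):
        if not group.strip():
            continue
        alternatives = []
        for alt in group.split("|"):
            m = _RELATION.match(alt)
            if m is None:
                raise ValueError(f"unsupported relation: {alt.strip()!r}")
            alternatives.append((m["name"], m["op"], m["version"]))
        groups.append(alternatives)
    return groups


print(parse_depends("libc6 (>= 2.34), libgcc-s1 | libgcc1, python3:any"))
# → [[('libc6', '>=', '2.34')], [('libgcc-s1', None, None), ('libgcc1', None, None)], [('python3', None, None)]]
```

Because every package declares its relations in this one syntax, tools can compute complete dependency closures mechanically, which is what makes claims like "verifiably complete" checkable in the first place.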
At @debian you'll find expertise and support to *leap* towards #FAIR4RS!

Obviously and unsurprisingly, it is not enough on its own. I1, I2, and R3 remain, fair and square, the responsibility of the authors and maintainers of a particular piece of research software.

24/
@debian system integration is a proven way to meet many of the #FAIR4RS principles. You'll find a community of experts that had been doing this routinely for ~2 decades before this paper came out.

Sad when academia ignores the world surrounding it. #ivorytower
