The evolution of the #RStats script (a thread).

1. You run an analysis with an R script.
You want to share it with a collaborator so she can explore and review the code and propose modifications, fixes, improvements, etc. You send the script by e-mail, along with the data.
Problem: The script is not portable. She needs to substitute some platform-specific packages and functions and modify all the paths to the data and other files. When she sends back her edits, you need to manually revert those changes so that it works on your machine again.
2. The project directory.

You organise files in a directory with a standard structure (src, data, reports) and only use relative paths. You adopt UTF-8 encoding and cross-platform packages and functions.
Your collaborator sends a document reporting and discussing the results.
Problem: The figures and tables that you get don't match those in the report. Besides, each time you change something, you need to manually update all the results in the report, or at least verify that they didn't change. It is easy to overlook or forget something.
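A minimal sketch of the project-directory step, assuming a hypothetical project called my-project created under a temporary directory so that it is safe to run anywhere. The file name example.csv and its contents are made up for illustration.

```r
# Hypothetical project root; a temporary directory keeps the sketch harmless.
root <- file.path(tempdir(), "my-project")

# The standard structure named in the thread: src, data, reports.
for (d in c("src", "data", "reports")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# Work from the project root and refer to files with relative paths only.
# file.path() builds the path with a separator that works on every platform.
old_wd <- setwd(root)
write.csv(
  data.frame(site = c("A", "B"), value = c(1.5, 2.5)),  # made-up data
  file = file.path("data", "example.csv"),
  row.names = FALSE,
  fileEncoding = "UTF-8"
)
dat <- read.csv(file.path("data", "example.csv"), fileEncoding = "UTF-8")
setwd(old_wd)
```

Because nothing in the script mentions an absolute path, a collaborator can run it unchanged from her own copy of the directory.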
3. R Markdown

You integrate the script and the report into a single source document by replacing every result in the report with the R code that generates it. You now have several reports (descriptive, alternative models, and one for model comparison).
Problem: The document grows and things become difficult to find. Compiling is slow due to some time-consuming steps. Each day, you need to execute all chunks from the beginning. The various reports (especially the last one) use objects from the others, which need to be recomputed.
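As an illustration of the R Markdown step, here is a hedged sketch that writes a tiny report source in which the reported number is computed rather than typed. The file name example-report.Rmd, the chunk label fit and the model on the built-in cars data are all assumptions made for the example.

```r
# Chunk delimiter (three backticks), built here to keep the sketch readable.
fence <- strrep("`", 3)

# A minimal R Markdown source: one chunk fits a model, and the sentence
# below it pulls the slope from that model with inline R code.
rmd <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r fit}"),
  "fit <- lm(dist ~ speed, data = cars)",
  fence,
  "",
  "The estimated slope is `r round(coef(fit)[[\"speed\"]], 2)`."
)

rmd_file <- file.path(tempdir(), "example-report.Rmd")
writeLines(rmd, rmd_file)

# Rendering needs the rmarkdown package and pandoc; skipped if unavailable.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```

If the data or the model change, the sentence in the rendered report changes with them, so there is nothing to update by hand.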
4. The targets package
docs.ropensci.org/targets/
You use the #targets package to separate the computations from the reports. The package keeps track of the dependencies among objects, and you can retrieve any result from inside the Rmarkdown documents.
Problem: Integrating changes from various collaborators and keeping track of modifications.
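A rough sketch of what the targets step might look like, assuming three hypothetical targets (raw_data, model, slope) and a throwaway pipeline directory. The calls tar_target(), tar_make() and tar_read() belong to the targets package; everything else is made up for illustration, and the pipeline only runs if the package is installed.

```r
# Throwaway directory for the pipeline (hypothetical location).
pipeline_dir <- file.path(tempdir(), "targets-sketch")
dir.create(pipeline_dir, showWarnings = FALSE)

# The pipeline definition lives in a file called _targets.R.
# Each target is a named object; dependencies are detected from the code.
writeLines(c(
  "library(targets)",
  "list(",
  "  tar_target(raw_data, cars),",
  "  tar_target(model, lm(dist ~ speed, data = raw_data)),",
  "  tar_target(slope, coef(model)[[\"speed\"]])",
  ")"
), file.path(pipeline_dir, "_targets.R"))

if (requireNamespace("targets", quietly = TRUE)) {
  old_wd <- setwd(pipeline_dir)
  targets::tar_make()                       # builds missing or outdated targets only
  slope_value <- targets::tar_read(slope)   # same call works in an R Markdown chunk
  setwd(old_wd)
}
```

Running tar_make() a second time skips targets that are already up to date, which is what removes the need to re-execute every chunk each day.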
5. The versioned project in a git repository.

You initialise a local git repository and push your changes into a remote repository accessible for your collaborators.
Problem: Discussing results with other collaborators in the project who neither need nor want to set up all the infrastructure. You end up e-mailing megabytes of attachments each time the reports are updated. Also, there is sometimes confusion about which version of the reports is the latest.
6. Publishing reports online using Continuous Integration and Git(La|Hu)b pages.

Now you can work simultaneously on the same or different documents, integrate everything automagically and make sure the reports are up to date and online with the push of a button.
Problem: The project is finalised and you want to share it more widely. Even with full access to the code, it is not trivial for most people to set up a suitable environment. Installing R and the necessary packages at the same or compatible versions may not even be desirable.
7. Docker

You distribute a #docker image containing an operating system, the appropriate versions of R and the R packages, and your full repository, including cached targets objects.

The nirvana of #reproducibility.

Thread by Facundo Muñoz (@famuvie).
