Women in Statistics and Data Science
Ready or not, here comes a thread on making R packages. I want to tweet about this on the Women in Stat account because women are underrepresented as R package maintainers. (1/)

If you find yourself using the same (or similar) code a few times, take a little time now to save time later. This isn't an all-or-nothing thing: if you have the same code written a few times, the first step is to make a function. Then you can call that function. (2/)

Sometimes (like in the situation I just described) you realize after the fact that you should turn code into a function. Other times, you will have the foresight to recognize that a function would be wise. As you program more, you'll recognize this more easily. (3/)

In fact, if no one has ever told you this, I will now: coding is not a strictly keyboard-and-computer affair. I recommend you grab some scratch paper and a pen so you can make a plan before you start coding. This planning will help you know what to make into a function. (4/)

This planning will also help you recognize what the arguments (inputs) and the outputs of your function should be. Write a little pseudo code to decide what should be hard-coded into the function (if anything) and what should be an argument of the function. (5/)
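To make the arguments-versus-hard-coded decision concrete, here is a minimal sketch. The function name rescale and its arguments lower and upper are made up for illustration; the plan from the scratch paper sits in the comments on top.

```r
# Hypothetical example: rescale a numeric vector to the range [lower, upper].
# Plan (from the scratch paper):
#   inputs:     x (numeric vector), lower and upper (the target range)
#   output:     numeric vector the same length as x
#   hard-coded: nothing; the target range is an argument with a default of 0 to 1
rescale <- function(x, lower = 0, upper = 1) {
  stopifnot(is.numeric(x), lower < upper)
  rng <- range(x)
  if (rng[1] == rng[2]) {
    stop("x must contain at least two distinct values")
  }
  (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower) + lower
}

rescale(c(2, 4, 6))          # uses the default range
rescale(c(2, 4, 6), -1, 1)   # same code, different range
```

Had 0 and 1 been hard-coded, the second call would have needed a second copy of the code.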

If you want to output a couple of objects, you can always shove them into a list and return the list. I like to name the items in my list so I can index the output more easily later. So rather than just list(x, y) I'll do list(x=x, y=y) so that the 1st item in the list is named x and the 2nd item in the list is named y. (6/)
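A small sketch of the named-list idea. The function name mean_and_sd is hypothetical; x and y follow the list(x=x, y=y) pattern from the tweet.

```r
# Hypothetical function that returns two objects in one named list.
mean_and_sd <- function(v) {
  x <- mean(v)   # first object to return
  y <- sd(v)     # second object to return
  list(x = x, y = y)
}

out <- mean_and_sd(c(1, 2, 3, 4, 5))
out$x        # index by name
out[["y"]]   # also by name
out[[1]]     # by position still works, but is easier to get wrong
names(out)
```

With an unnamed list(x, y), only the positional form out[[1]] would be available.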

If you have a pretty straightforward, simple bit of code to write, then the planning stage will take maybe 30 seconds and a piece of scratch paper. If you have a bigger project (e.g. an R package), then take plenty of time to plan. (7/)

You'll want to go beyond one piece of scratch paper. I recommend you create a "design document" that lays out all the pre-coding considerations and plans. It's like a blueprint. Here's an example of one I made for my first R pkg: github.com/knudson1/glmm/… (8/)

You'll see the design doc has some big picture things to start. Then, it lays out tons of details for each function: inputs, outputs, pseudo code as appropriate, numerical stability as appropriate, which functions it calls, what functions call it, dims of variables. (9/)

I forgot to list the most important thing: design docs list important equations and the goal of each function!

The goal is to get as much planning as possible out of the way. Let your organizational mind run wild. You'll thank yourself later. (10/)

Seriously, I held onto my design doc like it was a map and I was alone in the Boundary Waters. I did not take steps without that design doc in my figurative hand. If I had had enough $ to have two monitors, then it would have been on my 2nd monitor nonstop. (11/)

This is also useful if you are going to work in a team. Different people can program different functions because you have planned every detail to ensure that the puzzle pieces will fit together later. (12/)

Additionally, when you want to come back a few years later and make a change/addition, you'll be able to use your design doc to figure out which functions need tweaking. (13/)

As you can probably tell, the design doc is an essential piece of a well-organized hunk of code (whether it's a script, a function, or an R package).

If you get one thing from this thread, it better be "MAKE A DESIGN DOC." (14/)

Another important part of your design doc: tests. You don't want to just test the final product; otherwise you'll have no idea how to tackle debugging. Figure out how to test each function. Think about how the functions will be used and test each one's different uses. (15/)

As you code, I recommend you start with the base level functions (functions called by other functions) and test them (make sure they work exactly as expected) before you move on to the next function. (16/)

To help with tests, check out the testthat R package. I didn't use this when I wrote glmm bc I don't know if it existed in 2014, but it looks useful! (17/)
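For a taste of what a testthat test might look like, here is a minimal sketch. It assumes testthat is installed, and the logit function is a made-up base-level function, not something from glmm.

```r
# Assumes the testthat package is installed: install.packages("testthat")
library(testthat)

# Hypothetical base-level function to test.
logit <- function(p) log(p / (1 - p))

test_that("logit works on known values", {
  expect_equal(logit(0.5), 0)               # log(1) is 0
  expect_equal(logit(0.25), -logit(0.75))   # symmetric around 0.5
  expect_error(logit("a"))                  # non-numeric input should fail
})
```

In a package, files like this usually live under tests/testthat/, but check the testthat docs for the current setup.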

The kinds of tests you write will depend on your code. One thing I do a lot is write the function for some general calculation. Then I do the calculation for a concrete example (without using the function) and run the function on the same problem. The output should match. (18/)
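Here is a tiny sketch of that pattern. The function sample_var is hypothetical and stands in for whatever general calculation you wrote.

```r
# Hypothetical general function: unbiased sample variance.
sample_var <- function(x) {
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)
}

# Concrete example worked WITHOUT the function:
# x = 2, 4, 4, 6 has mean 4, so the squared deviations are 4, 0, 0, 4.
x <- c(2, 4, 4, 6)
by_hand <- (4 + 0 + 0 + 4) / (4 - 1)

# The function and the by-hand calculation should match.
all.equal(sample_var(x), by_hand)
```

The by-hand number comes from arithmetic you can check on scratch paper, so a mismatch points at the function.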

all.equal( ) is my good friend. You don't want to use your human eyeballs to compare the output, because you'll make a silly mistake (like thinking -0.123456789 is the same as 0.123456789) because we are silly humans. Make the computer do computer work. You do human work. (19/)

Also, it's nice to have test output that is only TRUE/FALSE. Then you can see at a glance that all is well (TRUE TRUE TRUE nice!) or there's an issue (TRUE TRUE FALSE nooooo!). (20/)
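A quick sketch of both points. One thing to know: when the values differ, all.equal( ) returns a message describing the difference rather than FALSE, so wrapping it in isTRUE( ) is one way to get output that is strictly TRUE/FALSE.

```r
a <- -0.123456789
b <-  0.123456789

all.equal(a, b)           # not TRUE: a message describing the difference
isTRUE(all.equal(a, b))   # FALSE

# all.equal tolerates tiny floating-point noise; == does not.
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# Collect several checks so the result reads at a glance.
checks <- c(
  isTRUE(all.equal(sqrt(16), 4)),
  isTRUE(all.equal(0.1 + 0.2, 0.3)),
  isTRUE(all.equal(a, b))
)
checks   # TRUE TRUE FALSE
```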

If you make a package, then you'll have a whole folder of test code (.R files), which will run every time you do an R CMD check. For each .R file, I highly recommend you have a .Rout.save file (which has the test code and OUTPUT). The checker will compare ... (21/)

...the test results (i.e. the .Rout file it creates during the check) against the .Rout.save file to ensure everything agrees. If the two files disagree, then R CMD check prints the differences. (E.g. then you'd know it said TRUE before and now it says FALSE.) (22/)

You can see an example of .R test files and the corresponding .Rout.save files here:
github.com/knudson1/glmm/…
(23/)

My stomach is the queen of my life, so let's wrap this up.

PLAN. Figure it all out ahead of time so you don't have to think too hard while you're doing the coding.

TEST. Test everything in every way. Make .R test files and their .Rout.save files.

TWEET. It's fun.

(24/24)
