Neil Currie
Sep 27 19 tweets 6 min read
A pillar of good coding is good documentation. And it goes beyond just comments.

How to write great documentation for your code in R.

#rstats #coding #datascience
Documenting your code is essential so that others, or future you, can understand what is going on.

Coding is complicated. What is obvious to you may not be obvious to someone else. Good documentation can help them out.

The first thing we will look at is comments.

1/18
In R, one of the first things you probably learned was writing a comment with a hashtag #

When writing comments focus on the why - for instance, giving some business context is more useful than explaining what is going on in a simple calculation.

2/18
[Image: code example, truncated in the source: "# Bad ---  # Multiply dollars by exchange rate to get pounds"]
If something is complicated though, don't be afraid to explain the how. Over-documenting isn't ideal but I'd argue it's better to overdo it than underdo it.
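A minimal sketch of the difference between a "what" comment and a "why" comment. The figures, the exchange rate and the business context are all made up for illustration.

```r
# Hypothetical figures, purely for illustration
revenue_usd <- c(1200, 850, 430)
usd_to_gbp  <- 0.80  # assumed exchange rate, not a real quote

# A "what" comment only restates the code:
# Multiply dollars by exchange rate to get pounds
revenue_gbp <- revenue_usd * usd_to_gbp

# A "why" comment adds context the code cannot express:
# Finance reports in pounds, so convert before these figures are
# combined with the UK numbers (invented context).
revenue_gbp <- revenue_usd * usd_to_gbp
```

Both lines of code are identical. Only the second comment tells a reader something they couldn't work out from the calculation itself.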

Documentation is not just comments though.

Well named objects and functions will improve understanding.

3/18
In the attached example, I find the one without comments is easier to interpret.

4/18
[Image: code example, truncated in the source: "# Bad--- # x are the actual values # y are the predicted val"]
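A rough sketch of the same idea, not a copy of the attached image. The values are invented and the metric (root mean squared error) is an assumption chosen just to give the names something to describe.

```r
# Version that leans on comments to explain cryptic names
# x are the actual values
# y are the predicted values
x <- c(3.0, 5.0, 2.5)
y <- c(2.5, 5.0, 3.0)
z <- sqrt(mean((x - y)^2))

# Version where the names carry the meaning, so no comments are needed
actual    <- c(3.0, 5.0, 2.5)
predicted <- c(2.5, 5.0, 3.0)

root_mean_squared_error <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

rmse <- root_mean_squared_error(actual, predicted)
```

Both versions compute the same number, but the second one can be read without looking anything up.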
Documentation goes beyond just well written code and comments though.

How we choose to organise our work can improve documentation.

This is where packages and roxygen2 come in.

5/18
Last week I wrote a thread about how to write your own analysis package.

If you aren't familiar with package building, give it a read first.

A benefit of organising your projects in packages is they make creating great documentation easy.

Especially if you use roxygen2.

6/18
The roxygen2 package lets you write function documentation, with great syntax, right next to your function code.

It then processes this, automatically creating everything needed to generate professional help files.

7/18
roxygen2 comments start with #'.

We can use keywords with the @ symbol to create the rest of the documentation.

Twitter isn't great for describing this, since typing @ mentions accounts, so read on for the attached examples, which add some clarity.

8/18
To add a title or description use the title and description keywords respectively.

In fact, for these parts you don't even need the keywords: the first paragraph (usually a single line) becomes the title and the second paragraph becomes the description.

9/18
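A small sketch of the keyword-free form. The function name and wording are hypothetical. The roxygen2 lines are ordinary R comments, so this runs as a plain script too.

```r
#' Convert dollars to pounds
#'
#' Multiplies an amount in US dollars by an exchange rate. This second
#' paragraph is used as the description without any keyword.
usd_to_gbp <- function(amount_usd, rate) {
  amount_usd * rate
}
```

The blank #' line is what separates the title from the description.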
To add arguments use the param keyword followed by the name of the argument and your description.

Just separate everything with a space.

You can document several arguments in one entry by separating their names with commas (no spaces between them).

10/18
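Here is a sketch of both styles in one function: two arguments documented together and one on its own. The function and its argument names are hypothetical.

```r
#' Percentage change between two values
#'
#' @param old,new Numeric vectors of the same length.
#' @param digits Number of decimal places to round the result to.
percent_change <- function(old, new, digits = 1) {
  round((new - old) / old * 100, digits)
}
```

Sharing an entry works best when the arguments really do have the same description, as old and new do here.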
Information on outputs can be added with the returns and details keywords.

Under returns, typically you detail the shape of the output.

Under details you can write a longer paragraph explaining more about how the function works and what it outputs.

11/18
Always add a worked example using the examples keyword.

Reading worked examples is a great way to improve understanding. So these are important.

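Finally, use the export keyword so the function is added to the package NAMESPACE and is available to anyone who loads your package. Help files are generated for documented functions either way.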

12/18
Putting it all together you will have something like the attached.

13/18
[Image: roxygen2 code, truncated in the source: "#' @title My Dummy Function #' @description A fake dummy fun"]
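As a stand-in for the attached image, here is a minimal sketch that uses every keyword covered so far. The function rescale_01 and its arguments are hypothetical and are not the dummy function from the image.

```r
#' @title Scale values to the range 0 to 1
#'
#' @description Rescales a numeric vector so its smallest value maps
#'   to 0 and its largest to 1.
#'
#' @param x Numeric vector with at least two distinct values.
#' @param na.rm Logical. Should missing values be ignored when finding
#'   the minimum and maximum?
#'
#' @returns A numeric vector the same length as x.
#'
#' @details Missing values in x stay missing in the output. If all
#'   values are equal the result is NaN, because the range is zero.
#'
#' @examples
#' rescale_01(c(2, 4, 6))
#'
#' @export
rescale_01 <- function(x, na.rm = TRUE) {
  # range() returns the minimum and maximum in one call
  value_range <- range(x, na.rm = na.rm)
  (x - value_range[1]) / (value_range[2] - value_range[1])
}
```

In a package this would live in a file under the R/ directory. Outside a package the #' lines are just comments and the function still works.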
To create the documentation run:

devtools::document()

To load (or reload) your package run:

devtools::load_all()

Now you can use ?myfunction to read the help files for that function - the same as pro packages like ggplot2 or lubridate.

14/18
[Image caption, truncated in the source: "An example of what the roxygen2 code in the previous example"]
With packages and projects you can also add a README file.

README files are written using markdown syntax for styling your text and creating things like tables or bulleted lists.

Just create a README.md file in your project directory.

15/18
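A minimal sketch of creating a README from R with base functions only. The project name, file names and descriptions are placeholders, and the file goes to a temporary directory so nothing in your project is touched.

```r
# Hypothetical README content showing a heading, a bulleted list and a table
readme_lines <- c(
  "# myanalysis",
  "",
  "Analysis code for a hypothetical sales project.",
  "",
  "## Contents",
  "",
  "- `R/`: functions used in the analysis",
  "- `data/`: input data",
  "",
  "| Script    | Purpose           |",
  "|-----------|-------------------|",
  "| `clean.R` | Tidy the raw data |",
  "| `model.R` | Fit the model     |"
)

# Written to a temporary directory here; in a real project the file
# would sit in the project's top-level directory.
readme_path <- file.path(tempdir(), "README.md")
writeLines(readme_lines, readme_path)
```

You can just as easily type the markdown straight into a new file in your editor. The point is only that the file is plain text with light markup.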
As an example, my threads GitHub repo has a README file which you can find here if you scroll down:

github.com/neilcuz/threads

This is generated by this underlying code here:

raw.githubusercontent.com/neilcuz/thread…

Written all inside R.

16/18
To recap:

Good documentation is essential for great code.

In comments, focus on the why, using great naming to improve readability.

Organise code in packages and document with roxygen2 to create pro help files easily.

Do your readers a favour and add a README file.

17/18
Thanks for reading. If you liked this thread, follow me @neilgcurrie for mainly R and data tweets.

Code for this one is here:

github.com/neilcuz/thread…

Code and links to my other threads can be found here:

github.com/neilcuz/thread…

18/18

