Neil Currie
Sep 20 · 21 tweets · 7 min read
You've probably installed a bunch of packages in R.

But did you know they are a fantastic way to organise your own analysis?

A thread on how to write your own package in R.

#rstats #coding #datascience
Packages are collections of code and data, combined together into one tidy bundle.

This makes them great for analysis projects, not just for sharing tools like dplyr or data.table.

Writing a package might sound intimidating but I'll show you how.

1/20
Let's say you have code, data and outputs for some analysis in a single folder.

If you aren't already working inside an RStudio project, click File > New Project > Existing Directory > Browse (select the folder) > Create Project.

2/20
You are now working inside an R project.

To double check, look for an .Rproj file in your directory. You should also see the name of your project in the top left-hand corner.

3/20
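If you prefer to check from the console, here is a minimal base R sketch. The helper name has_rproj is made up for this example.

```r
# Hypothetical helper: TRUE if the folder contains an .Rproj file.
has_rproj <- function(dir = getwd()) {
  length(list.files(dir, pattern = "\\.Rproj$")) > 0
}

# Check the current working directory.
has_rproj()
```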
Now we will turn this project into a package.

From inside the project we need to run usethis::create_package(getwd(), check_name = FALSE).

There are some rules around the format of package names for uploading to CRAN. We won't upload to CRAN so check_name = FALSE here.

4/20
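To give a feel for those naming rules, here is a rough sketch based on the rule in "Writing R Extensions". The function name is invented, and this is not the exact check usethis runs.

```r
# Sketch of the naming rule: letters, numbers and dots only, at least
# two characters, starting with a letter and not ending in a dot.
is_cran_style_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

# The underscore makes the last name fail.
is_cran_style_name(c("myanalysis", "sales.2023", "my_analysis"))
# TRUE, TRUE, FALSE
```

So a project folder with a name like my_analysis would likely be rejected under the default check, which is where check_name = FALSE helps.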
A prompt may appear asking you to overwrite the .Rproj file. Answer yes by typing the corresponding number.

Your project is now a package.

But what does that mean practically?

5/20
You should see the following has appeared in your directory:

- a DESCRIPTION file
- a NAMESPACE file
- an 'R' folder

6/20
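A quick way to confirm all three are there, sketched in base R. The helper name check_skeleton is hypothetical.

```r
# Hypothetical helper: which of the expected package pieces exist?
check_skeleton <- function(path = getwd()) {
  c(
    DESCRIPTION = file.exists(file.path(path, "DESCRIPTION")),
    NAMESPACE   = file.exists(file.path(path, "NAMESPACE")),
    R           = dir.exists(file.path(path, "R"))
  )
}

# Run from the project root. All three should be TRUE after create_package().
check_skeleton()
```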
The key thing that identifies this as a package to R is the DESCRIPTION file.

You will want to edit the package details (title, description, author and so on) in here.

Additionally, you will want to add any other packages which your analysis uses in here under Depends.

7/20
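For reference, a rough sketch of what a DESCRIPTION file with a Depends field can look like. The package name, title and dependencies are invented for illustration, and the file is written to a temporary folder so nothing in your project is touched.

```r
# Invented example contents; yours will have your own details.
description <- c(
  "Package: myanalysis",
  "Title: Sales Forecast Analysis",
  "Version: 0.0.0.9000",
  "Description: Functions and scripts for a forecasting analysis.",
  "Depends: dplyr, ggplot2"
)

path <- file.path(tempdir(), "DESCRIPTION")
writeLines(description, path)

# read.dcf() is base R and reads DESCRIPTION-style files.
depends <- as.character(read.dcf(path, fields = "Depends"))
depends_pkgs <- strsplit(depends, ",\\s*")[[1]]
depends_pkgs
```

If you would rather not edit the file by hand, usethis::use_package("dplyr", type = "Depends") should add the entry for you.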
When we come to load our package later this means you don't need to load the other packages you were using manually. They will be loaded automatically.

We don't need to worry about the NAMESPACE file for now. It is updated automatically when you run devtools::document().

8/20
Before creating a package, you might have stored your functions in one file (and called source() to read them into memory), or maybe they lived in your analysis script.

In a package, functions live in the 'R' folder.

You can organise them any way you want really.

9/20
I like having a file for each function unless it makes sense not to.

For instance, I keep helper functions in the same file as their 'parent' function and, where it makes sense, I group 'families' of functions together.

10/20
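As an illustration, a file in the 'R' folder might look something like this. The file name, the helper and the exact function bodies are my own assumptions for the sketch.

```r
# R/mape.R (hypothetical file name)

# Helper: absolute percentage error for each observation.
abs_pct_error <- function(actual, predicted) {
  abs((actual - predicted) / actual) * 100
}

# Parent: mean absolute percentage error, built on the helper above.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean(abs_pct_error(actual, predicted))
}

# Both predictions are 10% off, so the result is about 10.
mape(c(100, 200), c(110, 180))
```

The helper sits next to the only function that uses it, so you never have to hunt for it.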
Packages make documenting your functions super easy - this is an important and often overlooked part of software development.

You can do this with the roxygen2 package. Explaining it in depth would take a while, so I will leave that for a future thread.

11/20
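A small sketch of what roxygen2 comments can look like. The function and the wording of the documentation are illustrative, not a fixed template.

```r
#' Mean absolute percentage error
#'
#' @param actual Numeric vector of observed values.
#' @param predicted Numeric vector of predictions, same length as `actual`.
#' @return A single number: the mean absolute percentage error.
#' @examples
#' mape(c(100, 200), c(110, 180))
#' @export
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}
```

The lines starting with #' sit directly above the function, and each @tag becomes a section of the help page.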
In the meantime, have a look at:

cran.r-project.org/web/packages/r…

When you have written your documentation you can run devtools::document() to update it.

Now when you run:

?functionname

your help files appear in the Help pane, just like the pro packages you already know.

12/20
Packaging your code also makes unit testing easy.

Inside your package you can run all your tests at once using devtools::test() in the console.

I wrote a thread about unit testing last week if you aren't familiar with it or need a refresh.

13/20
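Here is a minimal sketch of a test file using testthat. The mape function and the file paths in the comments are assumptions for the example.

```r
library(testthat)

# In the package this function would live in R/mape.R. It is repeated
# here only so the sketch runs on its own.
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  mean(abs((actual - predicted) / actual)) * 100
}

# In the package these tests would live in tests/testthat/test-mape.R.
test_that("mape returns the mean absolute percentage error", {
  expect_equal(mape(c(100, 200), c(110, 180)), 10)
  expect_equal(mape(c(50, 50), c(50, 50)), 0)
})

test_that("mape rejects inputs of different lengths", {
  expect_error(mape(c(1, 2), c(1, 2, 3)))
})
```

If the tests folder does not exist yet, usethis::use_testthat() should set it up, and usethis::use_test("mape") creates the test file.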
Now it's time to organise our files into something more sensible.

I have created the following folders (though feel free to change this setup):

- analysis, analysis scripts live here
- data, raw data goes here
- output, outputs from our code go here.

14/20
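You can create these folders by hand, or with a few lines of base R. A sketch, with a made-up helper name and a temporary folder standing in for your project:

```r
# Create the three folders if they do not already exist.
create_analysis_folders <- function(root = getwd()) {
  folders <- file.path(root, c("analysis", "data", "output"))
  for (folder in folders) {
    dir.create(folder, showWarnings = FALSE)
  }
  invisible(folders)
}

# Demo in a temporary folder. In your project, call it with no arguments.
demo_root <- tempfile("myanalysis")
dir.create(demo_root)
create_analysis_folders(demo_root)
list.files(demo_root)
```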
Some good practice tips:

- treat the raw data inside the data folder as read-only, i.e. don't modify it

- anything produced in the output folder you should be able to recreate by running your scripts

15/20
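To make those two tips concrete, here is a sketch of an analysis step that only reads from data and only writes to output. The file names, columns and numbers are all made up, and a temporary folder stands in for the project.

```r
# Hypothetical analysis step: read raw data, write a summary to output.
build_summary <- function(data_dir, output_dir) {
  raw <- read.csv(file.path(data_dir, "sales.csv"))  # read, never modified
  by_region <- aggregate(sales ~ region, data = raw, FUN = sum)
  # The output file is rebuilt from scratch on every run.
  write.csv(by_region, file.path(output_dir, "sales-by-region.csv"),
            row.names = FALSE)
  invisible(by_region)
}

# Demo with made-up data in a temporary folder.
root <- tempfile("demo-analysis")
dir.create(file.path(root, "data"), recursive = TRUE)
dir.create(file.path(root, "output"))
write.csv(
  data.frame(region = c("north", "north", "south"), sales = c(10, 5, 7)),
  file.path(root, "data", "sales.csv"),
  row.names = FALSE
)

build_summary(file.path(root, "data"), file.path(root, "output"))
```

Delete everything in output, rerun the script, and you should get the same files back.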
When working on your analysis script, use devtools::load_all() or the keyboard shortcut (Ctrl/Cmd + Shift + L) to load all your functions into memory.

This will also load any packages you specified in the DESCRIPTION file under Depends.

16/20
Putting it all together, your workflow will look something like this:

1. Edit functions
2. Write tests
3. devtools::test
4. Write documentation
5. devtools::document
6. Ctrl/Cmd + Shift + L or devtools::load_all
7. Edit analysis scripts.

And repeat. The order will vary.

17/20
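In the console, one pass through that loop might look roughly like this. It assumes devtools is installed and that you are in the package's root folder. The script name in the last comment is hypothetical.

```r
# Only run the package steps when we are actually inside a package.
in_package <- file.exists("DESCRIPTION")

if (in_package && requireNamespace("devtools", quietly = TRUE)) {
  devtools::test()      # step 3: run the unit tests
  devtools::document()  # step 5: rebuild the help files
  devtools::load_all()  # step 6: load functions and Depends packages
  # step 7: then work on a script, e.g. source("analysis/01-analysis.R")
}
```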
I appreciate that was a very quick introduction to building a package.

Some of the tips here might change if you were writing a package for CRAN.

I hope that is enough to get you up and running, and shows you how achievable creating a package is.

18/20
To recap:

- Packages are a great way to organise analysis projects
- You can turn existing analysis into packages
- Testing and documenting functions is easy with a package
- Packages improve your workflow
- Writing a package sounds intimidating but it's not too tricky

19/20
Thanks for reading. If you liked this thread, follow me @neilgcurrie for mainly R and data tweets.

No code for this one, but code and links to my other threads can be found here:

github.com/neilcuz/thread…

20/20
