Neil Currie
Oct 7 17 tweets 6 min read
Writing functions in R troubling you?

How and when to write your own functions (including the mysteries of non-standard evaluation and the ellipsis ...)

#rstats #coding #datascience
Functions are blocks of code organised together which perform a specific task.

R comes with many built-in functions which we can add to by downloading fantastic packages.

However, writing your own functions is where the true power of R lies - though it can be confusing.

1/16
In this thread I will cover:

1. When to write functions
2. How to write functions
3. Tidy evaluation
4. Passing multiple arguments with ellipsis ...
5. Error handling arguments
6. Side effects

2/16
A question I hear from newer coders is: when should I actually write a function?

I follow some basic rules of thumb to decide when to turn code into a function, though it isn't an exact science.

1. The rule of 3
2. The rule of organisation
3. The rule of testing

3/16
1. The rule of 3

If I repeat some code 3 or more times, or I expect to, then I write a function.

Many errors enter code through copy and paste or retyping.

By calling one function instead of repeating the code, you reduce the chance of these errors, and to update the code you only need to change it in one place.

4/16
[Code image, truncated in the source: # Bad  sample1 <- sample(c(TRUE, FALSE), size = 10, replace ...]
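A minimal sketch of the rule of 3, assuming the repeated code is a sample() call like the one in the thread's example. The helper name sample_logical and the sizes are made up for illustration; this is not the thread's original code.

```r
# Bad: the same call retyped three times; a typo in any copy is easy to miss
sample1 <- sample(c(TRUE, FALSE), size = 10, replace = TRUE)
sample2 <- sample(c(TRUE, FALSE), size = 20, replace = TRUE)
sample3 <- sample(c(TRUE, FALSE), size = 30, replace = TRUE)

# Better: one function (hypothetical name), called three times
sample_logical <- function(size) {
  sample(c(TRUE, FALSE), size = size, replace = TRUE)
}

sample1 <- sample_logical(10)
sample2 <- sample_logical(20)
sample3 <- sample_logical(30)
```

If the sampling logic ever needs to change, only the body of sample_logical has to be edited.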
2. The rule of organisation

Often code can become long and complex.

In these cases I will split code into functions where I think this will aid readability and manageability.

Often the inside of a big for loop is a good candidate.

5/16
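As a rough illustration of the for loop case, the sketch below moves a loop body into its own function. The data and the name summarise_column are hypothetical.

```r
# Hypothetical helper: everything that used to sit inside the loop body
summarise_column <- function(values) {
  cleaned <- values[!is.na(values)]
  c(mean = mean(cleaned), sd = sd(cleaned), n = length(cleaned))
}

# Made-up data for illustration
measurements <- list(a = c(1, 2, NA, 4), b = c(10, 20, 30), c = c(5, NA, 7))

# Pre-allocate a named list to hold one result per column
results <- vector("list", length(measurements))
names(results) <- names(measurements)

# The loop now reads as a single step per column
for (name in names(measurements)) {
  results[[name]] <- summarise_column(measurements[[name]])
}
```

The loop stays short, and the details of the summary live in one named place.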
3. The rule of testing

Sometimes code contains calculations or small chunks that are crucial to the program, or logically complex, and I want to test them individually.

In these cases I write functions.

To learn more about unit testing have a look here:

6/16
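A small sketch of the idea, using only base R. The function percent_change is a hypothetical example of a crucial calculation; a dedicated package such as testthat offers richer tools, but stopifnot() is enough to show the principle.

```r
# Hypothetical example: a small but crucial calculation pulled into a function
percent_change <- function(old, new) {
  (new - old) / old * 100
}

# Base R checks: stopifnot() raises an error if any condition is not TRUE
stopifnot(
  isTRUE(all.equal(percent_change(100, 110), 10)),
  isTRUE(all.equal(percent_change(50, 25), -50)),
  is.infinite(percent_change(0, 5))  # dividing by zero gives Inf, not an error
)
```

Because the calculation is a function, it can be checked on its own, without running the rest of the program.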
Arguments are the variables which are passed into the function.

They can be matched positionally or by name.

You can give arguments default values using = in the function definition.

7/16
[Code image, truncated in the source: # If you cast your mind back to school this is Pythagoras' T ...]
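A sketch of argument matching and defaults using Pythagoras' theorem. The signature calculate_c(side_a, side_b, squared = FALSE) appears later in the thread; the function body here is an assumption, since the original code is truncated.

```r
# Assumed body: length of the hypotenuse of a right-angled triangle.
# squared = FALSE is a default value, set with = in the definition.
calculate_c <- function(side_a, side_b, squared = FALSE) {
  c_squared <- side_a^2 + side_b^2
  if (squared) {
    return(c_squared)
  }
  sqrt(c_squared)
}

by_position <- calculate_c(3, 4)                    # matched by position
by_name     <- calculate_c(side_b = 4, side_a = 3)  # matched by name, any order
overridden  <- calculate_c(3, 4, squared = TRUE)    # default overridden
```

Here by_position and by_name are both 5, and overridden is 25.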
If you have used the tidyverse you will have come across tidy evaluation.

Tidy eval allows us to use code like select(data, x) instead of select(data, "x").

But how can we take advantage of that behaviour with our own functions?

8/16
To use tidy eval in our own code we wrap the argument in double curly brackets: {{arg}}.

If the argument supplies the name on the left-hand side of an expression in a tidyverse function like mutate, we also need to use := instead of =.

9/16
[Code images, truncated in the source: dummy <- tibble(id = 1:20, i = runif(20), j = runif(20), k = ... and # Will work with the curly brackets  transform_data2 <- func ...]
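A minimal sketch of both ideas, assuming dplyr is installed. The data loosely follows the truncated tibble in the thread; the function double_column and its argument names are hypothetical and are not the thread's transform_data2.

```r
library(dplyr)

# Made-up data, loosely following the truncated tibble in the source
dummy <- tibble(id = 1:20, i = runif(20), j = runif(20))

# Hypothetical function: {{ }} forwards unquoted column names to mutate(),
# and := is needed because the new column's name is itself an argument
double_column <- function(data, new_col, old_col) {
  mutate(data, {{ new_col }} := {{ old_col }} * 2)
}

# Column names are passed unquoted, just like in dplyr's own functions
result <- double_column(dummy, i_doubled, i)
```

Without the curly brackets, mutate would look for a column literally called old_col and fail.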
Arguments are a common source of errors.

Checking for these errors and telling the user what they should do is great practice.

Good things to check are the type, structure and validity of the values supplied.

Be proportionate and try to anticipate likely errors.

10/16
[Code image, truncated in the source: calculate_c <- function (side_a, side_b, squared = FALSE) { ...]
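One way the checks might look for calculate_c. The signature comes from the thread; the specific checks, the error messages and the body are assumptions, since the original code is truncated.

```r
calculate_c <- function(side_a, side_b, squared = FALSE) {
  # Type check: both sides must be numbers
  if (!is.numeric(side_a) || !is.numeric(side_b)) {
    stop("side_a and side_b must be numeric.", call. = FALSE)
  }
  # Structure check: exactly one value each
  if (length(side_a) != 1 || length(side_b) != 1) {
    stop("side_a and side_b must each be a single value.", call. = FALSE)
  }
  # Validity check: side lengths must be known and positive
  if (is.na(side_a) || is.na(side_b) || side_a <= 0 || side_b <= 0) {
    stop("side_a and side_b must be positive numbers.", call. = FALSE)
  }
  if (!is.logical(squared) || length(squared) != 1 || is.na(squared)) {
    stop("squared must be TRUE or FALSE.", call. = FALSE)
  }

  c_squared <- side_a^2 + side_b^2
  if (squared) c_squared else sqrt(c_squared)
}

# A bad call now fails early with a message that says what to fix
message_text <- tryCatch(
  calculate_c("3", 4),
  error = function(e) conditionMessage(e)
)
```

Each message tells the user what the function expected, which is usually more helpful than the error R would raise further down.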
The ellipsis allows functions to take any number of arguments.

Some good examples are the paste and print functions.

In our own functions it is especially handy when we write wrapper-style functions.

11/16
When using ... in functions it is good practice to place it after your main arguments but before any additional arguments with default values.

This makes supplying the ellipsis arguments easier when using the function.

12/16
[Code image, truncated in the source: # Maybe you want to use some of the aesthetics for tweaking ...]
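A small base R sketch of a wrapper that uses the ellipsis. The thread's own example appears to wrap a plotting function; this one wraps paste() instead to stay self-contained, and the name label_values and its arguments are hypothetical.

```r
# Hypothetical wrapper around paste(): main argument first, then ...,
# then the arguments that have default values
label_values <- function(prefix, ..., sep = "-", upper = FALSE) {
  label <- paste(prefix, ..., sep = sep)  # ... is passed straight through
  if (upper) toupper(label) else label
}

label_values("id", 1:3)                  # "id-1" "id-2" "id-3"
label_values("id", "a", "b", sep = "_")  # "id_a_b"
label_values("id", "a", upper = TRUE)    # "ID-A"
```

Arguments that come after ... can only be matched by their full name, so every unnamed value after prefix flows into the ellipsis, and sep and upper never capture one by accident.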
We return values from functions using the return function. If it is missing, R returns the value of the last evaluated expression.

But sometimes when we write functions we aren't interested in the return value but in the side effects, e.g. when creating a plot or outputting data.

13/16
In these cases it is good practice to return the first argument unmodified.

Wrap it in the invisible function so it doesn't clutter your console by printing.

You could easily put this in a pipeline now.

14/16
[Code image, truncated in the source: sample_and_write <- function (sizes) {      output_path <- f ...]
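A sketch of a function called for its side effect, assuming the side effect is writing a CSV file. The name write_and_return and the data are hypothetical; this is not the thread's sample_and_write.

```r
# Hypothetical function: the side effect is writing a CSV file
write_and_return <- function(data, path) {
  write.csv(data, path, row.names = FALSE)
  invisible(data)  # first argument, unmodified, and not auto-printed
}

output_path <- tempfile(fileext = ".csv")
dummy <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))

# Nothing is printed to the console here, but the file is written
write_and_return(dummy, output_path)

# Because the data comes back, the call can sit in the middle of a chain
n_rows <- nrow(write_and_return(dummy, output_path))
```

The same chaining works with a pipe, since the function takes the data as its first argument and hands it on unchanged.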
To recap:

There are several rules of thumb to decide when a function is needed.

Some tricks like tidy eval and ... make your functions more powerful.

ALWAYS write error handling.

The return value isn't always important. Sometimes the side effect is the main event.

15/16
Thanks for reading. If you liked this thread, follow me @neilgcurrie for mainly R and data tweets.

Code: github.com/neilcuz/thread…

Visit:
torchdata.io

16/16


