Neil Currie
Data scientist | Freelancer
Feb 9, 2023 17 tweets 0 min read
Jan 6, 2023 17 tweets 6 min read
Struggling with functions in R?

You're not alone!

In this thread, I'll show you some tips and tricks for upping your function-writing game.

Let's dive in...

#rstats #coding #datascience

Functions are blocks of code organised together which perform a specific task.

R comes with many built-in functions which we can add to by downloading fantastic packages.

However, writing your own functions is where the true power of R lies - though it can be confusing.

1/16
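As a rough illustration of writing your own function, here is a minimal base R sketch. The function name `rescale01` and its arguments are hypothetical, chosen only for this example, and are not taken from the thread.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
rescale01 <- function(x, na.rm = TRUE) {
  # Fail early with a clear error if the input is the wrong type.
  stopifnot(is.numeric(x))

  # range() returns c(min, max); na.rm has a default so callers can skip it.
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- rescale01(c(2, 4, 6, 10))
print(scaled)
```

Giving `na.rm` a default value keeps the common call short while still letting the caller override it.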
Dec 16, 2022 16 tweets 9 min read
How to create dumbbell plots with ggplot2

Impress clients, make the complex simple and up your data vis game.

#rstats #datavis #datascience

[Image: A dumbbell chart showing fl...]

Dumbbell plots are a great way of communicating information that can be grouped then split into two points.

I used one for TidyTuesday earlier this year:

github.com/neilcuz/tidytu…

But making them can be a little tricky.

I can show you how step-by-step.
Nov 17, 2022 12 tweets 6 min read
How to map with purrr in R...

...so you can ditch the for loops, tidy your code and handle errors with ease.

#rstats #datascience #coding

The map family of functions is a powerful weapon in your coding toolkit.

Maybe you've used the apply family of functions before?

These are similar but the syntax is much more straightforward and consistent across the functions.

Let's dive in.

1/11
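To make the comparison with the apply family concrete, here is a small sketch. It assumes the purrr package may or may not be installed, so the purrr calls are wrapped in a `requireNamespace()` check; the inputs are made up for illustration.

```r
# Base R: vapply() needs a template (numeric(1)) describing each result.
squares_base <- vapply(1:3, function(x) x^2, numeric(1))

# The purrr part only runs if the package is available.
if (requireNamespace("purrr", quietly = TRUE)) {
  # map_dbl() states the return type in the function name instead.
  squares_purrr <- purrr::map_dbl(1:3, function(x) x^2)

  # possibly() wraps a function so errors become a fallback value.
  safe_log <- purrr::possibly(log, otherwise = NA_real_)
  logs <- purrr::map_dbl(list(1, "a", 10), safe_log)

  print(squares_purrr)
  print(logs)
}

print(squares_base)
```

The second element of the input list is a string, so `log()` would normally stop with an error; wrapping it in `possibly()` lets the map carry on and return `NA` for that element.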
Oct 28, 2022 12 tweets 5 min read
Ever struggled with error handling in R?

tryCatch can be a little tricky to learn.

But there's a simpler, lesser known alternative.

#rstats #coding #datascience

Errors in R can be annoying.

There's nothing worse than running a long piece of code then, part way through, it crashes.

You need to rewrite your code and start again.

That is where error handling comes in.

1/11
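For context, here is a minimal base R sketch of what error handling looks like with `tryCatch`. The preview above does not name the simpler alternative, so `try()` is shown only as one base R option and may not be the one the thread has in mind. The function names `risky_log` and `safe_log` are hypothetical.

```r
# Hypothetical function that fails on bad input.
risky_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  log(x)
}

# tryCatch: run the expression and return a fallback value on error.
safe_log <- function(x) {
  tryCatch(
    risky_log(x),
    error = function(e) {
      message("Caught an error: ", conditionMessage(e))
      NA_real_
    }
  )
}

# The loop keeps going even though the second input fails.
inputs <- list(10, "a", 100)
results <- vapply(inputs, safe_log, numeric(1))
print(results)

# try(): a shorter form that returns a "try-error" object on failure.
attempt <- try(risky_log("a"), silent = TRUE)
failed <- inherits(attempt, "try-error")
```

With either approach a single bad input no longer crashes the whole run, so there is no need to start again from scratch.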
Oct 20, 2022 21 tweets 7 min read
How to use R with GitHub

...so you can share your projects, version control code and stand out from the crowd with a data portfolio

#rstats #coding #datascience

GitHub is an online platform used for hosting, tracking and collaborating on software projects, built on a free version control system, Git.

GitHub knowledge is a key skill for any data professional.

And a portfolio can make you seriously stand out from the crowd.

1/20
Oct 7, 2022 17 tweets 6 min read
Writing functions in R troubling you?

How and when to write your own functions (including the mysteries of non-standard evaluation and the ellipsis ...)

#rstats #coding #datascience

Functions are blocks of code organised together which perform a specific task.

R comes with many built-in functions which we can add to by downloading fantastic packages.

However, writing your own functions is where the true power of R lies - though it can be confusing.

1/16
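Since the ellipsis and non-standard evaluation are the two mysteries mentioned, here is a small base R sketch of each. The function names `summarise_values` and `show_expression` are hypothetical and used only for illustration.

```r
# Ellipsis: extra arguments are passed straight through to mean().
summarise_values <- function(x, ...) {
  c(mean = mean(x, ...), n = length(x))
}

values <- c(1, 2, 3, NA)

without_extra <- summarise_values(values)             # mean is NA
with_extra <- summarise_values(values, na.rm = TRUE)  # na.rm reaches mean()

# Non-standard evaluation: capture the expression the caller typed
# instead of its value. `height` never needs to exist here.
show_expression <- function(x) {
  deparse(substitute(x))
}

label <- show_expression(height + 1)

print(with_extra)
print(label)
```

The ellipsis saves you from listing every argument of the function you wrap, and `substitute()` is the base R building block behind functions that let you type bare column names.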
Sep 27, 2022 19 tweets 6 min read
A pillar of good coding is good documentation. And it goes beyond just comments.

How to write great documentation for your code in R.

#rstats #coding #datascience

Documenting your code is essential so that others, or future you, can understand what is going on.

Coding is complicated. What is obvious to you may not be obvious to someone else. Good documentation can help them out.

The first thing we will look at are comments.

1/18
Sep 20, 2022 21 tweets 7 min read
You've probably installed a bunch of packages in R.

But did you know they are a fantastic way to organise your own analysis?

A thread on how to write your own package in R.

#rstats #coding #datascience

Packages are collections of code and data, combined together into one tidy bundle.

This makes them great for analysis projects - not just for downloading packages like dplyr or data.table.

Writing a package might sound intimidating but I'll show you how.

1/20
Sep 13, 2022 21 tweets 8 min read
If you want to write robust code, unit testing is a must. A thread on how to write unit tests in R.

#rstats #coding #datascience

A fundamental principle of writing strong code is the DRY principle - don't repeat yourself.

In R, functions allow us to write chunks of code that can be used again and again.

But how do you know your function is working correctly?

This is where unit testing comes in.

1/20
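To show the idea of checking that a function works, here is a minimal sketch. The preview does not say which testing tool the thread uses, so this sticks to base R's `stopifnot()` as an assumption; a dedicated testing package would give nicer reporting. The function `celsius_to_fahrenheit` is hypothetical.

```r
# Hypothetical function under test: convert Celsius to Fahrenheit.
celsius_to_fahrenheit <- function(celsius) {
  stopifnot(is.numeric(celsius))
  celsius * 9 / 5 + 32
}

# A tiny test function: each check targets one expected behaviour.
test_celsius_to_fahrenheit <- function() {
  stopifnot(
    celsius_to_fahrenheit(0) == 32,
    celsius_to_fahrenheit(100) == 212,
    # all.equal() allows for floating point rounding.
    isTRUE(all.equal(celsius_to_fahrenheit(c(-40, 37)), c(-40, 98.6)))
  )

  # Bad input should raise an error rather than return nonsense.
  bad_input <- try(celsius_to_fahrenheit("hot"), silent = TRUE)
  stopifnot(inherits(bad_input, "try-error"))

  invisible(TRUE)
}

tests_passed <- test_celsius_to_fahrenheit()
```

If any check fails, `stopifnot()` stops with an error naming the failed expression, so a silent run means every check passed.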
Sep 7, 2022 10 tweets 5 min read
How to use R with DuckDB (so you can turbo charge your data queries and handle big data effortlessly)

#rstats #bigdata #duckdb

1. DuckDB is a fantastic tool currently seeing a rapid rise in the data world.

It is designed for rapid analytical queries and works brilliantly with big data.

Best of all you can use it directly in tools like #rstats, #python, #java and others.

Let's see how it works in R.

2.
Aug 3, 2022 19 tweets 5 min read
Ever wondered how to wrangle big data with R?

A thread on using Apache Arrow in R👇🧵

#rstats #bigdata #datascience

Big data is any data which is too large to be loaded into memory all in one go.

Ever tried to read a large dataset into R and everything fell over or slowed waaay down?

This is where Arrow comes in.

1/18
Jul 21, 2022 16 tweets 9 min read
How to create dumbbell plots with ggplot2

Impress clients, communicate complex insights simply and up your data vis game.

👇🧵

#rstats #datavis

Dumbbell plots are a great way of communicating information that can be grouped then split into two points.

I used one for last week's TidyTuesday:

github.com/neilcuz/tidytu…

But making them can be a little tricky.

I can show you how step-by-step.

1/15
Jul 15, 2022 13 tweets 6 min read
Fed up of using spreadsheets but can't escape them?

How to build spreadsheets directly from R (so you can save time and reduce errors) 👇🧵

#rstats #datascience #excel

In this thread I will cover how to:

- create a workbook with openxlsx
- add data
- add formulas
- format cells
- format the sheet
- apply conditional formatting.

All in an R script.

Let's go.

1/13
Jul 13, 2022 8 tweets 4 min read
If you want to speed up your code, learning R's timing functions is essential.

A short thread on timing your code in R 👇🧵

#rstats #datascience

1. Sys.time

Sys.time is a base R function which returns the current time.

You can save the current time to a variable and, with some simple maths, figure out how much time has passed.

library(tidyverse)
library(tictoc)
library(microbenchmark)
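The Sys.time approach described above can be sketched in base R as follows. `Sys.sleep(0.2)` is just a stand-in for the code you actually want to time, and the variable names are made up for this example.

```r
# Save the time before and after the code you want to measure.
start_time <- Sys.time()

Sys.sleep(0.2)  # placeholder for the code being timed

end_time <- Sys.time()

# Subtracting two times gives a difftime object.
elapsed <- end_time - start_time
print(elapsed)

# Convert to plain seconds if you want a number to work with.
elapsed_secs <- as.numeric(difftime(end_time, start_time, units = "secs"))
```

Asking `difftime()` for a fixed unit avoids surprises, because plain subtraction picks the unit automatically and may report minutes for longer runs.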
Jul 5, 2022 13 tweets 4 min read
Ever heard of parallel processing but not known where to start? Want your code to run faster with a simple trick?

A short thread introducing the furrr package in R👇🧵

#rstats #datascience

The furrr package is based on the mapping functions in the purrr package.

In lots of cases these functions can replace the use of for loops in R, simplifying your code.

R for Data Science has a great chapter on the purrr package:

r4ds.had.co.nz/iteration.html
Jun 30, 2022 18 tweets 7 min read
Ever wondered how to join big data in R?

A thread on using Spark in R👇🧵

#rstats #spark #datascience

This is thread # 3 in a series exploring using Spark in R with the sparklyr package. You can find the others here:

# 1:


# 2:

1/17
Jun 15, 2022 19 tweets 5 min read
Ever wondered how to manipulate big data with R?

A thread on using Spark in R👇🧵

#rstats #spark #datascience

Big data is any data which is too large to be loaded into memory all in one go. Ever tried to read a large dataset into R and everything fell over or slowed waaaaaay down? This is where Spark comes in.

1/18