Neil Currie
Dec 16 16 tweets 9 min read
How to create dumbbell plots with ggplot2

Impress clients, make the complex simple and up your data vis game.

#rstats #datavis #datascience

[Image: A dumbbell chart showing fl...]
Dumbbell plots are a great way of communicating information that can be grouped then split into two points.

I used one for TidyTuesday earlier this year:

github.com/neilcuz/tidytu…

But making them can be a little tricky.

I can show you how step-by-step.
Dumbbell plots can be created using 2 ggplot2 building blocks: geom_point and geom_line.

So let's build from the ground up with some examples of each before combining them to make a dumbbell plot.
1. geom_point

Plots in ggplot2 start with the ggplot function. We then add successive layers of geometry functions using +, before customising the appearance further.

geom_point is used to create scatter plots and other point charts.

[Code screenshot: library(ggplot2) library(dp...]
[Images: A scatterplot of value1 aga... / A point plot of time_period... / A point plot of group1 agai...]
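The original code was posted as a screenshot and is cut off here, so this is only a minimal sketch of the idea. The column names value1, time_period and group1 come from the image captions; value2 and all of the data values are assumptions made up for illustration.

```r
library(ggplot2)

# Hypothetical dummy data: value1, time_period and group1 echo the image
# captions, value2 and every number below are invented for this sketch.
dummy <- data.frame(
  group1      = rep(c("a", "b", "c"), each = 4),
  time_period = rep(1:4, times = 3),
  value1      = c(2, 4, 5, 7, 1, 3, 6, 8, 3, 3, 4, 9),
  value2      = c(1, 3, 4, 8, 2, 2, 5, 9, 4, 5, 5, 7)
)

# Scatter plot: one continuous variable against another
p_scatter <- ggplot(dummy, aes(x = value1, y = value2)) +
  geom_point()

# Point plot over time
p_time <- ggplot(dummy, aes(x = time_period, y = value1)) +
  geom_point()

# Point plot with a categorical variable on the x axis
p_group <- ggplot(dummy, aes(x = group1, y = value1)) +
  geom_point()
```

Printing any of the three objects (for example p_scatter) draws the chart.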
Notice I placed the x and y variables inside the aes function.

I found aesthetics hard to understand at first - you read a lot about mapping variables to visual cues, which didn't make sense to me then.

But I think there is an easier way to understand them.
Anything on a chart (a line, point, colour, shape, size etc) which changes with the data, place inside aes().

Anything which doesn't vary with data (a colour, shape, size, etc), place outside aes().

It's easier to see with an example.
In plot 1 colour is placed inside aes so the colour varies with the data. Here it corresponds to the group1 value since colour = group1.

In the second plot colour is placed outside aes. The colour is red regardless of the data values.

[Code screenshot: # Colour inside aes - it va...]
[Images: A scatterplot of value1 aga... / A scatterplot of value1 aga...]
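A rough sketch of the two plots described above. The dummy data is invented for illustration (only the names value1 and group1 appear in the thread; value2 is an assumption).

```r
library(ggplot2)

# Hypothetical dummy data, values invented for this sketch
dummy <- data.frame(
  group1 = rep(c("a", "b", "c"), each = 4),
  value1 = c(2, 4, 5, 7, 1, 3, 6, 8, 3, 3, 4, 9),
  value2 = c(1, 3, 4, 8, 2, 2, 5, 9, 4, 5, 5, 7)
)

# Colour inside aes: it varies with the data (one colour per group1 value)
p_inside <- ggplot(dummy, aes(x = value1, y = value2, colour = group1)) +
  geom_point()

# Colour outside aes: every point is red, whatever the data says
p_outside <- ggplot(dummy, aes(x = value1, y = value2)) +
  geom_point(colour = "red")
```

The first version also gets a legend automatically, because ggplot2 needs to explain which colour belongs to which group. The second has no legend since there is nothing to explain.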
2. geom_line

geom_line works similarly to geom_point.

You can add 1 or more lines and easily combine with geom_point.

[Code screenshot: # Some more dummy data cels...]
[Images: A line chart of high temper... / A line chart of high and lo... / A line chart of high and lo...]
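The screenshot is truncated, so here is a sketch of what the three line charts could look like. The celsius name comes from the thread; the column names (month, month_name, high, low), the temperatures and the red/blue colours are assumptions.

```r
library(ggplot2)

# Hypothetical wide-format data: one row per month, with separate columns
# for the high and low temperature. All values are invented.
celsius <- data.frame(
  month      = 1:12,
  month_name = month.name,
  high       = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low        = c(2, 2, 3, 5, 8, 11, 13, 13, 10, 7, 4, 2)
)

# One line: high temperature by month
p_high <- ggplot(celsius, aes(x = month, y = high)) +
  geom_line()

# Two lines: each geom_line gets its own y variable and a fixed colour
p_both <- ggplot(celsius, aes(x = month)) +
  geom_line(aes(y = high), colour = "red") +
  geom_line(aes(y = low), colour = "blue")

# Lines combined with points
p_both_points <- p_both +
  geom_point(aes(y = high), colour = "red") +
  geom_point(aes(y = low), colour = "blue")
```

Because the colours sit outside aes here, there is no legend. That is one of the things long format fixes later on.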
The dummy celsius dataset was in what is known as wide format.

We can put this into long format using pivot_longer from tidyr. Long format will be important for our dumbbell plot. There is now a row for high and for low for each month.

[Code screenshot: celsius_long <- pivot_longe...]
[Images: A snapshot of the celsius d... / A snapshot of the celsius d...]
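A sketch of the reshaping step. Only celsius_long and pivot_longer appear in the (truncated) original; the input columns and the new column names temp_type and temperature are assumptions.

```r
library(tidyr)

# Hypothetical wide-format data (values invented)
celsius <- data.frame(
  month      = 1:12,
  month_name = month.name,
  high       = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low        = c(2, 2, 3, 5, 8, 11, 13, 13, 10, 7, 4, 2)
)

# Wide to long: the high and low columns collapse into two new columns,
# one holding the old column name and one holding the value.
celsius_long <- pivot_longer(
  celsius,
  cols      = c(high, low),
  names_to  = "temp_type",    # assumed name
  values_to = "temperature"   # assumed name
)
```

The 12-row wide table becomes a 24-row long table: two rows per month, one for high and one for low.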
From long format we can replicate the last plot with similar code.

Notice that to get the correct colours I have created a named vector called chart_colours and passed it to scale_colour_manual. This approach even throws in a legend.

Now we are ready for the dumbbell plot.

[Code screenshot: chart_colours <- c("hi...]
[Image: The previous line chart rec...]
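A sketch of the long-format version of the line chart. The chart_colours name is from the thread, but the screenshot cuts off after c("hi..., so the specific colours are assumptions, as are the column names and data.

```r
library(ggplot2)

# Hypothetical long-format data, built directly so the example stands alone
celsius_long <- data.frame(
  month       = rep(1:12, times = 2),
  temp_type   = rep(c("high", "low"), each = 12),
  temperature = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7,
                  2, 2, 3, 5, 8, 11, 13, 13, 10, 7, 4, 2)
)

# Named vector: the names must match the values in temp_type
chart_colours <- c("high" = "red", "low" = "blue")

# Colour is now inside aes, so one geom_line and one geom_point draw both series
p_long <- ggplot(celsius_long,
                 aes(x = month, y = temperature, colour = temp_type)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = chart_colours)
```

Mapping colour to temp_type also groups the lines, so each series gets its own line without any extra work.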
3. Dumbbell plots

We know about geom_point, geom_line, aes and long format. We can combine these to make a dumbbell plot.

Let's start with the points.
We will group our data by high and low temperature with a point for the value. The y axis will have the months.

The first stab at it is not bad but the months go from December to January.

We can sort this by reordering the month name factor level.

[Code screenshot: # First stab at the points ...]
[Images: The first go at doing the p... / (uncaptioned image)]
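A sketch of both attempts at the points. It assumes month_name is a factor in calendar order, which would explain the December to January ordering: ggplot2 puts the first factor level at the bottom of a discrete y axis. Column names, values and colours are assumptions.

```r
library(ggplot2)

# Hypothetical long-format data with month_name as a calendar-ordered factor
celsius_long <- data.frame(
  month_name  = factor(rep(month.name, times = 2), levels = month.name),
  temp_type   = rep(c("high", "low"), each = 12),
  temperature = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7,
                  2, 2, 3, 5, 8, 11, 13, 13, 10, 7, 4, 2)
)
chart_colours <- c("high" = "red", "low" = "blue")

# First stab: January ends up at the bottom of the y axis
p_first <- ggplot(celsius_long,
                  aes(x = temperature, y = month_name, colour = temp_type)) +
  geom_point(size = 3) +
  scale_colour_manual(values = chart_colours)

# Reverse the factor levels so January is drawn at the top
celsius_long$month_name <- factor(celsius_long$month_name,
                                  levels = rev(month.name))

p_points <- ggplot(celsius_long,
                   aes(x = temperature, y = month_name, colour = temp_type)) +
  geom_point(size = 3) +
  scale_colour_manual(values = chart_colours)
```

An alternative that leaves the data alone would be adding scale_y_discrete(limits = rev) to the plot, which flips the axis order only for that chart.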
Next we add the lines.

Here we use the group argument and place it inside aes. This will group the lines by month_name. The other aes arguments are the same.

The first attempt is ok but the lines overlap the points. Not to worry, we can swap them.

[Code screenshot: # Add the lines  ggplot(dat...]
[Images: The dumbbell plot now has l... / By reversing the order of t...]
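A sketch of adding the lines and then swapping the layer order. The group = month_name mapping is from the thread. Putting x and y in ggplot() so both layers share them, and giving the line a fixed grey colour, are choices made for this sketch; the data and column names are assumptions.

```r
library(ggplot2)

# Hypothetical long-format data, months reversed so January is at the top
celsius_long <- data.frame(
  month_name  = factor(rep(month.name, times = 2), levels = rev(month.name)),
  temp_type   = rep(c("high", "low"), each = 12),
  temperature = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7,
                  2, 2, 3, 5, 8, 11, 13, 13, 10, 7, 4, 2)
)
chart_colours <- c("high" = "red", "low" = "blue")

# First attempt: points first, then lines. Layers are drawn in the order
# they are added, so the lines sit on top of the points.
p_lines_on_top <- ggplot(celsius_long, aes(x = temperature, y = month_name)) +
  geom_point(aes(colour = temp_type), size = 3) +
  geom_line(aes(group = month_name), colour = "grey60") +
  scale_colour_manual(values = chart_colours)

# Swapped: lines first, points second, so the points cover the line ends
p_dumbbell <- ggplot(celsius_long, aes(x = temperature, y = month_name)) +
  geom_line(aes(group = month_name), colour = "grey60") +
  geom_point(aes(colour = temp_type), size = 3) +
  scale_colour_manual(values = chart_colours)
```

With group = month_name each month's two rows (high and low) form their own group, so geom_line draws 12 short bars rather than one line zig-zagging through all 24 points.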
Finally we can make some tweaks to make the chart look better.

We could do plenty more but the objective here was to learn about dumbbell plots, not the ins and outs of good design.

[Code screenshot: # Some tweaks to make it lo...]
[Image: The final dumbbell plot.]
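The original tweaks are hidden behind a truncated screenshot, so the ones below are only illustrative guesses at the kind of polish you might apply: labels, a minimal theme and a repositioned legend. Data, column names, colours and label text are all assumptions.

```r
library(ggplot2)

# Hypothetical long-format data, months reversed so January is at the top
celsius_long <- data.frame(
  month_name  = factor(rep(month.name, times = 2), levels = rev(month.name)),
  temp_type   = rep(c("high", "low"), each = 12),
  temperature = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7,
                  2, 2, 3, 5, 8, 11, 13, 13, 10, 7, 4, 2)
)
chart_colours <- c("high" = "red", "low" = "blue")

p_final <- ggplot(celsius_long, aes(x = temperature, y = month_name)) +
  # Thicker, lighter bars with bigger points on top
  geom_line(aes(group = month_name), colour = "grey70", linewidth = 1.5) +
  geom_point(aes(colour = temp_type), size = 4) +
  scale_colour_manual(values = chart_colours) +
  # Illustrative labels; y and legend titles are dropped as self-explanatory
  labs(
    title  = "Monthly high and low temperatures",
    x      = "Temperature (Celsius)",
    y      = NULL,
    colour = NULL
  ) +
  theme_minimal() +
  theme(
    legend.position  = "top",
    panel.grid.minor = element_blank()
  )
```

The linewidth argument needs ggplot2 3.4.0 or later; on older versions use size for the line thickness instead.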
To recap:

1. Dumbbell plots are a great way to communicate insights.

2. These can be created with ggplot2 by combining geom_line and geom_point.

3. You need to know a bit about aesthetics and long format data.
Thanks for reading. If you liked this thread, follow me @neilgcurrie for R and data tweets.

Website:

shoogle.co

YouTube:

youtube.com/@shoogle

Code:

github.com/neilcuz/thread…
