Neil Currie
Data scientist | Freelancer

Dec 16, 2022, 16 tweets

How to create dumbbell plots with ggplot2

Impress clients, make the complex simple and up your data vis game.

#rstats #datavis #datascience

Dumbbell plots are a great way of communicating data that can be grouped, with each group split into two points to compare.

I used one for TidyTuesday earlier this year:

github.com/neilcuz/tidytu…

But making them can be a little tricky.

I can show you how step-by-step.

Dumbbell plots can be created using 2 ggplot2 building blocks: geom_point and geom_line.

So let's start building from the ground up with some examples in each before combining to make a dumbbell plot.

1. geom_point

Plots in ggplot2 start with the ggplot function. We then use + to add successive layers with geometry functions, before customising the appearance further.

geom_point is used to create scatter plots and other point charts.
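A minimal sketch of a scatter plot, assuming a made-up data frame called dummy with columns x and y (neither the name nor the values come from the original thread):

```r
library(ggplot2)

# Made-up data for illustration only
dummy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, 3, 6, 5)
)

# ggplot() sets up the plot; geom_point() adds a layer of points.
# x and y change with the data, so they go inside aes().
p <- ggplot(dummy, aes(x = x, y = y)) +
  geom_point()

p
```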

Notice I placed the x and y variables inside the aes function.

I found aesthetics hard to understand at first - you read lots of talk of mapping variables and visual cues which didn't make sense to me then.

But I think there is an easier way to understand them.

Anything on a chart (a line, point, colour, shape, size etc) which changes with the data, place inside aes().

Anything which doesn't vary with data (a colour, shape, size, etc), place outside aes().

It's easier to see with an example.

In the first plot colour is placed inside aes, so the colour varies with the data. Here it corresponds to the group1 value since colour = group1.

In the second plot colour is placed outside aes. The colour is red regardless of the data values.
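The two plots might look something like this in code. This is a sketch with invented data; only the column name group1 is taken from the text above.

```r
library(ggplot2)

# Made-up data; group1 is the grouping column mentioned in the text
dummy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 3, 6, 5, 7),
  group1 = c("a", "a", "b", "b", "c", "c")
)

# Plot 1: colour is inside aes(), so it varies with group1
plot_1 <- ggplot(dummy, aes(x = x, y = y, colour = group1)) +
  geom_point()

# Plot 2: colour is outside aes(), so every point is red
plot_2 <- ggplot(dummy, aes(x = x, y = y)) +
  geom_point(colour = "red")
```

Plot 1 also gets a legend for group1, because ggplot2 builds legends from mapped aesthetics. Plot 2 has nothing to explain, so no legend appears.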

2. geom_line

geom_line works similarly to geom_point.

You can add 1 or more lines and easily combine with geom_point.
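A rough sketch of two lines combined with points, assuming a dummy celsius data frame in wide format with one column for high and one for low. The temperature values are invented for illustration.

```r
library(ggplot2)

# Made-up monthly temperatures in wide format:
# one row per month, one column each for high and low
celsius <- data.frame(
  month = 1:12,
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

# In wide format each line and each set of points needs its own layer.
# The colours are fixed values, so they sit outside aes().
p <- ggplot(celsius, aes(x = month)) +
  geom_line(aes(y = high), colour = "red") +
  geom_point(aes(y = high), colour = "red") +
  geom_line(aes(y = low), colour = "blue") +
  geom_point(aes(y = low), colour = "blue")

p
```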

The dummy celsius dataset was in what is known as wide format.

We can put this into long format using pivot_longer from tidyr. Long format will be important for our dumbbell plot. There is now a row for high and for low for each month.
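The reshaping step could look like this. It is a sketch on the same kind of made-up celsius data; the new column names temperature and degrees are my own choices, not taken from the thread.

```r
library(tidyr)

# Made-up monthly temperatures in wide format
celsius <- data.frame(
  month = 1:12,
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

# Stack the high and low columns: the old column names go into
# "temperature" and the values go into "degrees"
celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

head(celsius_long, 4)
```

The 12 wide rows become 24 long rows, two per month.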

From long format we can replicate the last plot with similar code.

Notice to get the correct colours I have created a named vector called chart colours and passed it to scale_colour_manual. This approach even throws in a legend.
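A sketch of the long format version, assuming the named vector is spelled chart_colours and the long columns are called temperature and degrees (hypothetical names, with invented data):

```r
library(ggplot2)
library(tidyr)

# Made-up monthly temperatures, reshaped to long format
celsius <- data.frame(
  month = 1:12,
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

# Named vector: the names match the values in the temperature column
chart_colours <- c(high = "red", low = "blue")

# One geom_line and one geom_point now cover both series,
# because colour is mapped to temperature inside aes()
p <- ggplot(celsius_long, aes(x = month, y = degrees, colour = temperature)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = chart_colours)

p
```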

Now we are ready for the dumbbell plot.

3. Dumbbell plots

We know about geom_point, geom_line, aes and long format. We can combine these to make a dumbbell plot.

Let's start with the points.

We will group our data by high and low temperature with a point for the value. The y axis will have the months.
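A first pass at the points might look like the sketch below. The data is made up, and I am assuming month_name is a factor in calendar order, which is what puts December at the top of the y axis.

```r
library(ggplot2)
library(tidyr)

# Made-up monthly temperatures; month_name is a factor in calendar order
celsius <- data.frame(
  month_name = factor(month.name, levels = month.name),
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

# Temperature on the x axis, months on the y axis,
# with colour separating the high and low points
p <- ggplot(celsius_long, aes(x = degrees, y = month_name, colour = temperature)) +
  geom_point(size = 3)

p
```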

The first stab at it is not bad but, reading from the top down, the months go from December to January.

We can sort this by reordering the levels of the month_name factor.
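One way to do the reordering is with base R's factor(), reversing the existing levels. This is a sketch on made-up data; the original thread may well have used a different function, such as one from forcats.

```r
library(ggplot2)
library(tidyr)

# Made-up monthly temperatures in long format
celsius <- data.frame(
  month_name = factor(month.name, levels = month.name),
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

# ggplot2 draws the first factor level at the bottom of a discrete y axis,
# so reversing the levels puts January at the top
celsius_long$month_name <- factor(
  celsius_long$month_name,
  levels = rev(levels(celsius_long$month_name))
)

p <- ggplot(celsius_long, aes(x = degrees, y = month_name, colour = temperature)) +
  geom_point(size = 3)

p
```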

Next we add the lines.

Here we use the group argument and place it inside aes. This will group the lines by month_name. The other aes arguments are the same.
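Adding the lines could look like this sketch on made-up data. One liberty taken here: colour is mapped only in geom_point and the line gets a fixed grey, so each line is a single neutral colour. The thread's own code may map colour differently.

```r
library(ggplot2)
library(tidyr)

# Made-up monthly temperatures; levels reversed so January sits at the top
celsius <- data.frame(
  month_name = factor(month.name, levels = rev(month.name)),
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

# group = month_name tells geom_line to draw one line per month,
# joining that month's low and high points
p <- ggplot(celsius_long, aes(x = degrees, y = month_name)) +
  geom_point(aes(colour = temperature), size = 3) +
  geom_line(aes(group = month_name), colour = "grey60")

p
```

Without the group aesthetic, geom_line would not know that each month's two rows belong together.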

The first attempt is ok but the lines overlap the points. Not to worry, we can swap the order of the layers so the points are drawn on top.
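ggplot2 draws layers in the order they are added, so putting geom_line before geom_point places the points on top. A sketch on made-up data, with a fixed grey line colour as my own choice:

```r
library(ggplot2)
library(tidyr)

# Made-up monthly temperatures; levels reversed so January sits at the top
celsius <- data.frame(
  month_name = factor(month.name, levels = rev(month.name)),
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

# Lines first, points second: later layers are drawn over earlier ones
p <- ggplot(celsius_long, aes(x = degrees, y = month_name)) +
  geom_line(aes(group = month_name), colour = "grey60") +
  geom_point(aes(colour = temperature), size = 3)

p
```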

Finally we can make some tweaks to make the chart look better.
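The thread does not spell out the exact tweaks, so the ones in this sketch (manual colours, thicker lines, labels, a minimal theme, legend on top) are illustrative guesses at the kind of polish meant. The data and names such as chart_colours are made up.

```r
library(ggplot2)
library(tidyr)

# Made-up monthly temperatures; levels reversed so January sits at the top
celsius <- data.frame(
  month_name = factor(month.name, levels = rev(month.name)),
  high = c(7, 8, 10, 13, 16, 19, 21, 21, 18, 14, 10, 7),
  low = c(1, 1, 2, 4, 7, 10, 12, 12, 9, 6, 3, 1)
)

celsius_long <- pivot_longer(
  celsius,
  cols = c(high, low),
  names_to = "temperature",
  values_to = "degrees"
)

# Named vector matching the values in the temperature column
chart_colours <- c(high = "red", low = "blue")

p <- ggplot(celsius_long, aes(x = degrees, y = month_name)) +
  geom_line(aes(group = month_name), colour = "grey60", linewidth = 1) +
  geom_point(aes(colour = temperature), size = 3) +
  scale_colour_manual(values = chart_colours) +
  labs(
    title = "Monthly high and low temperatures",
    x = "Temperature (celsius)",
    y = NULL,
    colour = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "top")

p
```

The linewidth argument needs ggplot2 3.4.0 or later; on older versions, size does the same job for lines.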

We could do plenty more but the objective here was to learn about dumbbell plots, not the ins and outs of good design.

To recap:

1. Dumbbell plots are a great way to communicate insights.

2. These can be created with ggplot2 by combining geom_line and geom_point.

3. You need to know a bit about aesthetics and long format data.

Thanks for reading. If you liked this thread, follow me @neilgcurrie for R and data tweets.

Website:

shoogle.co

YouTube:

youtube.com/@shoogle

Code:

github.com/neilcuz/thread…
