Let's do one more: the connected scatterplot. It shows two time series simultaneously, one along the horizontal axis and one along the vertical axis, with the points connected by a line to show how the relationship changes over time. It's a great possible alternative to the dreaded dual-axis chart.
In general, I find that the connected scatterplot is 🎇awesome🎇 about 2/10 times--the rest of the time, I either get a straight line (e.g., spending and participation) or a hairball mess. But, there are lovely cases where it just works out.
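If you want to see the mechanics, here's a minimal sketch of how the points get built. The data and the function names (`connected_scatter_points`, `segments`) are made up for illustration; the only real idea is that each time period becomes one (x, y) point and the line follows time order, not x order.

```python
# Hypothetical yearly values for two series; invented for illustration only.
years = [2015, 2016, 2017, 2018, 2019]
unemployment = [5.3, 4.9, 4.4, 3.9, 3.7]   # goes on the horizontal axis
inflation = [0.1, 1.3, 2.1, 2.4, 1.8]      # goes on the vertical axis


def connected_scatter_points(times, xs, ys):
    """Return (time, x, y) tuples sorted by time, ready to be joined by a line."""
    if not (len(times) == len(xs) == len(ys)):
        raise ValueError("all three sequences must have the same length")
    return sorted(zip(times, xs, ys))


def segments(points):
    """Pair each point with the next one: these are the line pieces to draw."""
    return [(a[1:], b[1:]) for a, b in zip(points, points[1:])]


points = connected_scatter_points(years, unemployment, inflation)
path = segments(points)
```

Each entry in `path` is one piece of the line, from one year's (x, y) position to the next year's. Labeling a few of the points with their year is usually what makes the chart readable.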
So far today, we've talked about graphs that show distributions and uncertainty that in some ways *summarize* the data. But what about showing specific data points in the data set? There are a few:
-Strip charts 🥓
-Beeswarm charts 🐝
-Wheatplots 🌾
-Raincloud plots 🌧
In the strip plot, the data points are plotted along a single horizontal or vertical axis. You might get some overlapping here, but you can use color transparency to show the individual points, if that's important. Here are a few from the NYT.
The thing about the strip chart (sometimes called the stripe chart) is that you can use dots, points, or lines. And sometimes the important thing is to just let your reader know that there are lots of points in some part of the distribution.
There are (at least) two graphs that can be used to show distributions in your data that don't show specific percentile values.
-The violin chart 🎻 uses kernel density estimates to generate a shape of the entire distribution. Here's one I made of earnings in industries.
-The ridgeline plot is a series of histograms or density plots shown for different groups aligned along the same horizontal axis and presented with a slight overlap along the vertical axis. It's kind of a 'small multiples' histogram. This one is from @hrbrmstr.
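Both of these lean on density estimates, so here's a rough sketch of what a kernel density estimate actually computes. It assumes a Gaussian kernel and a hand-picked bandwidth; the earnings numbers and the `gaussian_kde` helper are hypothetical, and real plotting libraries choose the bandwidth for you.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function estimating the density of `sample` at any point x."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one small Gaussian bump per observation, centered on that value.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample
        )

    return density


# Hypothetical weekly earnings (in dollars) for one industry; not real data.
earnings = [620, 640, 700, 710, 730, 800, 950, 1200]
density = gaussian_kde(earnings, bandwidth=80.0)

# Evaluate on a grid; a violin mirrors these widths around a central axis.
grid = [400 + 25 * i for i in range(45)]   # 400 .. 1500
widths = [density(x) for x in grid]
```

A violin draws `widths` on both sides of a center line; a ridgeline draws one such curve per group and offsets them vertically. A smaller bandwidth gives a bumpier shape, a larger one a smoother shape.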
For those who are interested, here's a list of some papers relating to uncertainty in data visualization:
S. Belia, F. Fidler, J. Williams, and G. Cumming, “Researchers misunderstand confidence intervals and standard error bars,” Psychological Methods, vol. 10, no. 4, p. 389, 2005.
Brodlie K, Osorio RA, Lopes A. A review of uncertainty in data visualization. Expanding the frontiers of visual analytics and visualization. 2012:81-109.
One of the most common ways to visualize the distribution in your data is the histogram. It's basically a bar chart where the data are divided into bins. I like this one from @JustinWolfers about finishing times in the NY Marathon. | nytimes.com/2014/04/23/ups…
Again, while I think many people don't quite understand concepts like variance and percentiles, the histogram resembles a bar chart, so it may be a graph type folks can easily understand. You can also find histograms in Google when you look for restaurants or stores.
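For the curious, the "divided into bins" step is simple enough to sketch by hand. This assumes equal-width bins that include their left edge; the finishing times are invented (not the NYT data) and `histogram_counts` is just an illustrative name.

```python
from collections import Counter


def histogram_counts(values, bin_width, start=0.0):
    """Count how many values fall in each half-open bin [left, left + bin_width)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical marathon finishing times in minutes; not the NYT data.
finish_minutes = [178, 185, 199, 212, 214, 229, 231, 238, 239, 241, 256, 270]
bins = histogram_counts(finish_minutes, bin_width=30, start=150)

# Crude text histogram: one '#' per runner in each 30-minute bin.
for left, count in bins.items():
    print(f"{left:>5.0f}-{left + 30:<5.0f} {'#' * count}")
```

The bin width is the main design choice: too wide and you hide the shape, too narrow and you get noise.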
One variation on the area chart is the streamgraph. Bear with me here, as it's kind of a weird-looking graph. A streamgraph stacks the data series, but the central horizontal axis does not necessarily signal a zero value. Instead, data can be positive on both sides of the axis.
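Here's one way to see why the axis isn't zero. This sketch assumes the simplest layout, a baseline placed symmetrically around the axis, and uses invented numbers; `streamgraph_layers` is a hypothetical helper, and other streamgraph layouts pick the baseline differently.

```python
def streamgraph_layers(series):
    """Stack series around a symmetric baseline.

    `series` is a list of equal-length lists of non-negative values.
    Returns one (lower, upper) pair of boundary lists per series.
    """
    n_points = len(series[0])
    if any(len(s) != n_points for s in series):
        raise ValueError("all series must have the same length")

    # Start half of the total below zero so the stack is centered on the axis.
    lower = [-0.5 * sum(s[t] for s in series) for t in range(n_points)]
    layers = []
    for s in series:
        upper = [lo + v for lo, v in zip(lower, s)]
        layers.append((lower, upper))
        lower = upper
    return layers


# Hypothetical counts for three categories over four periods; invented data.
series = [
    [2, 4, 6, 4],
    [1, 2, 2, 1],
    [3, 2, 4, 5],
]
layers = streamgraph_layers(series)
```

Every value is positive, but the first layer sits entirely at or below the axis because the stack starts at minus half the total. The thickness of each band, not its position, carries the value. If you use matplotlib, `stackplot` with `baseline="sym"` produces this kind of layout.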
Another way to plot changes over time is to break things up over multiple graphs instead of packing every line into a single graph. There are a couple of options here.
1. Sparklines. Named by Edward Tufte, sparklines are “small, intense, simple, word-sized graphics with typographic resolution.” They are typically embedded within tables, which can help make tables easier to read. Here's a basic example: grapecity.com/blogs/visualiz…
2. Cycle graphs. Cycle graphs typically compare small units of time, such as weeks or months, across a multiyear time frame. They were introduced by Bell Labs in a 1982 paper. Here's an example from @kennelliott in @PostGraphics.
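To make the "word-sized" idea behind sparklines concrete, here's a small sketch that draws one with Unicode block characters so it can sit in a table row. The data and the `sparkline` function are made up for illustration; it scales each row to its own min and max, which is one choice among several.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the data range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BLOCKS[0] * len(values)   # flat series: draw a flat line
    scale = (len(BLOCKS) - 1) / (hi - lo)
    return "".join(BLOCKS[round((v - lo) * scale)] for v in values)


# Hypothetical monthly values for two table rows; invented for illustration.
rows = {
    "Region A": [3, 4, 6, 5, 8, 9, 7, 10, 12, 11, 13, 15],
    "Region B": [9, 8, 8, 7, 6, 6, 5, 5, 4, 4, 3, 2],
}
for name, values in rows.items():
    print(f"{name}  {sparkline(values)}  latest: {values[-1]}")
```

Because each row is scaled separately, the sparklines show the shape of each trend but can't be compared for level across rows. Passing a shared min and max instead would make them comparable.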