Well, it's time for me to close out my week hosting @iamscicomm. Thanks for all of your questions and comments! I had so much fun chatting with folks from different fields and walks of life!
I'll leave you with my 5 general guidelines for creating more effective dataviz.
1. Show the Data.
Your reader can only grasp your point, argument, or story if they see the data. This doesn’t mean that all the data must be shown, but it does mean that you should highlight the values that are important to your argument.
2. Reduce the Clutter.
Unnecessary visual elements distract your reader from the central data and clutter the page. Reduce or eliminate heavy tick marks, gridlines, textured gradients, and excess text and labels. Focus on the data.
I close out my week hosting the @iamscicomm account by sharing just a select few examples of #dataviz that don't follow the rules or templates or tried-and-true approaches. But they are beautiful and engaging and enlightening.
As you go forth and create your visualizations, continue to explore. Draw inspiration from all around you and from the amazing work these and other creators are generating.
Before we get to the ten guidelines, recognize that just like in graphs and charts, there are a lot of pieces to tables. And, just like graphs and charts, we can control the look and design of all of these elements.
Rule 1. Offset the Heads from the Body
Make your column titles clear. Try using boldface type or lines to offset them from the numbers and text in the body of the table.
We kick off today's subject of #dataviz for part-to-whole relationships and qualitative data with some of my favorite fun pie charts. I did not originally create these, and the original creators are lost to history.
Flow maps are another way to visualize your data. Maybe the most famous flow map is this one from Charles Joseph Minard in 1869. Tufte always touts this one as being the "best statistical chart ever made."
A quick 🧵 on the Minard map.
The famous Minard map shows 6 data values in a single view: 1. Number of troops (line thickness) 2. Distance traveled (scale) 3. Temperature (line at bottom) 4. Time (line at bottom) 5. Direction of travel (color) 6. Geography (cities, etc.)
But Tufte left out the fact that the Minard Napoleon map was only one panel in a full spread. It also included the lesser-known map of Hannibal’s 218 BC march through the Alps to Rome. (This image is from the Ecole nationale des ponts et chaussées, and I include it in my book.)
So far today, we've talked about graphs that show distributions and uncertainty that in some ways *summarize* the data. But what about showing specific data points in the data set? There are a few:
-Strip charts 🥓
-Beeswarm charts 🐝
-Raincloud plots 🌧
In the strip plot, the data points are plotted along a single horizontal or vertical axis. You might get some overlapping here, but you can use color transparency to show the individual points, if that's important. Here are a few from the NYT.
The thing about the strip chart (sometimes called the stripe chart) is that you can use dots, points, or lines. And sometimes the important thing is to just let your reader know that there are lots of points in some part of the distribution.
There are (at least) two graphs that can be used to show distributions in your data that don't show specific percentile values.
-The violin chart 🎻 uses kernel density estimates to generate a shape of the entire distribution. Here's one I made of earnings in industries.
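To make "kernel density estimate" a bit more concrete, here is a minimal sketch in plain Python. It assumes a Gaussian kernel and a hand-picked bandwidth, and the earnings numbers are made up for illustration (they are not the data behind the chart). The function name `gaussian_kde` is just a label for this sketch, not a library call.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each sample contributes a small bell curve centred on itself
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Made-up earnings (in $1,000s), purely illustrative
earnings = [32, 35, 36, 40, 41, 41, 45, 52, 60, 75]
density = gaussian_kde(earnings, bandwidth=5.0)  # bandwidth is an assumption

# Evaluate on a grid; a violin mirrors these widths around a centre line
grid = list(range(20, 91, 5))
for x in grid:
    print(f"{x:>3} {'#' * round(density(x) * 400)}")
```

A smaller bandwidth gives a bumpier violin and a larger one gives a smoother shape, so the choice changes how the distribution reads.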
The ridgeline plot is a series of histograms or density plots shown for different groups aligned along the same horizontal axis and presented with a slight overlap along the vertical axis. It's kind of a 'small multiples' histogram. This one from @hrbrmstr.
One of the most common ways to visualize the distribution in your data is the histogram. It's basically a bar chart where the data are divided into bins. I like this one from @JustinWolfers about finishing times in the NY Marathon. | nytimes.com/2014/04/23/ups…
Again, while I think many people don't quite understand concepts like variance and percentiles, the histogram resembles a bar chart, so it may be a graph type folks can easily understand. You can also find histograms in Google when you look for restaurants or stores.
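Dividing the data into bins can be sketched in a few lines of plain Python. The finishing times below are invented for illustration (they are not the NY Marathon data), and the 30-minute bin width and the starting point are arbitrary assumptions.

```python
def histogram(values, bin_width, start):
    """Count how many values fall into each [lo, lo + bin_width) bin."""
    counts = {}
    for v in values:
        lo = start + ((v - start) // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))

# Made-up marathon finishing times in minutes
times = [178, 185, 199, 205, 212, 214, 225, 229, 231, 238, 239, 244, 251, 263, 290]
bins = histogram(times, bin_width=30, start=150)

# Draw one text bar per bin, just like a bar chart turned on its side
for lo, count in bins.items():
    print(f"{lo}-{lo + 29} min | {'█' * count}")
```

Note that this sketch only reports bins that contain at least one value. Changing the bin width changes the shape, so it is worth trying a few.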
Let's do one more: the connected scatterplot. The CS shows two time series simultaneously (one each along the horizontal and vertical axes), and the points are connected by a line to show how the relationship changes over time. It's a great possible alternative to the dreaded dual axis chart.
In general, I find that the connected scatterplot is 🎇awesome🎇 about 2/10 times--the rest of the time, I either get a straight line (e.g., spending and participation) or a hairball mess. But, there are lovely cases where it just works out.
One variation on the area chart is the streamgraph. Bear with me here, as it's kind of a weird looking graph. A streamgraph stacks the data series, but the central horizontal axis does not necessarily signal a zero value. Instead, data can be positive on both sides of the axis.
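Here is a rough sketch of how a streamgraph's stacking could work, assuming the simplest symmetric baseline (the whole stack is centred on the axis at every time step). Real streamgraph implementations often use other baselines, so treat `streamgraph_layers` and the numbers as illustrative only.

```python
def streamgraph_layers(series):
    """Stack series symmetrically around a central axis.

    series: list of equal-length lists of non-negative values.
    Returns, for each series, a list of (lower, upper) edges per time step.
    """
    n_steps = len(series[0])
    layers = [[] for _ in series]
    for t in range(n_steps):
        total = sum(s[t] for s in series)
        lower = -total / 2  # shift the stack so it is centred on the axis
        for i, s in enumerate(series):
            upper = lower + s[t]
            layers[i].append((lower, upper))
            lower = upper
    return layers

# Three made-up data series over three time steps
data = [[1, 2, 4], [3, 2, 2], [2, 4, 0]]
layers = streamgraph_layers(data)
for i, layer in enumerate(layers):
    print(f"series {i}: {layer}")
```

The edges fall below and above zero even though every data value is positive, which is why the central axis does not signal a zero value.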
Another way to plot changes over time is to not put everything in a single graph and instead break things up over multiple graphs. There are a couple of options here.
1. Sparklines. Named by Edward Tufte, sparklines are “small intense, simple, word-sized graphics with typographic resolution.” They are typically embedded within tables, which can help make tables easier to read. Here's a basic example: grapecity.com/blogs/visualiz…
2. Cycle graphs. Cycle graphs typically compare small units of time, such as weeks or months, across a multiyear time frame. They were introduced by Bell Labs in a 1982 paper. Here's an example from @kennelliott in @PostGraphics.
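For the sparkline idea in item 1, here is a minimal text-only sketch. It assumes Unicode block characters as a stand-in for a real drawn line, and the `sparkline` helper and the numbers are hypothetical, just to show the "word-sized" scale.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest

def sparkline(values):
    """Map each value to one of eight block characters, scaled to the data range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Flat series: draw everything at the lowest height
        return BLOCKS[0] * len(values)
    return "".join(
        BLOCKS[round((v - lo) / span * (len(BLOCKS) - 1))] for v in values
    )

# Made-up monthly values that could sit in a table cell next to a label
monthly = [3, 5, 4, 8, 12, 9, 6]
print("Visits", sparkline(monthly))
```

Because each sparkline is scaled to its own minimum and maximum, it shows the shape of a series rather than its level, which is worth keeping in mind when putting several side by side in a table.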
We start with the LINE chart. Why? Because, just like the bar chart, it's familiar and easy to create. As simple as they are, there are a number of considerations to take into account, some of which are aesthetic, and some of which are substantive.
1. There is no limit to the number of lines you plot. This @FiveThirtyEight chart has 3,282 lines, but it's still easy to read, right? The key is not to worry about the sheer amount of data, but instead about the purpose of the graph and how you can direct your reader's focus.
2. You don't need to start the vertical axis at zero. There is still some debate about this, but the basic rule-of-thumb is that the axis dimensions in a line chart depend on the data and your communication goal.
This podcast series aims to create visible role models for the younger generation and guests have shared some great stories. One of the most memorable stories was with @TapokaM: buzzsprout.com/809081/4312232…
To talk about #scicomm in Africa I need to bring it back to me.
I only found out about this field in late 2019, at the end of my MSc year, right here on Twitter = social media.
Similar to many of you in this poll. 1/n
Despite the recent development in scientific output from Africa, public understanding of science and researchers in many parts of the continent remains low. This has been so obvious during this pandemic, with mass misinformation flying all over social media.
An opinion piece by Karikarl (2016) links this to the following: 1. Lack of awareness 2. Literacy rates 3. Multiplicity of language (this is so NB!!)
Yesterday I reminded u Africa has 54 countries with different languages. Which can make #scicomm difficult but not impossible.
I asked which field you think contributes the most in terms of scientific research output by #AfricansInSTEM?
Reveal time 😁
This poll is right: it's actually a close tie btwn life sciences and earth & environmental science!
I hope you didn't cheat 👀😅
This comes from a report, The Next Generation of Scientists in Africa (2018), on a 4-year study. The authors surveyed 5,700 African researchers in 2016-2017 & analysed papers listed in Web of Science that had African authors & were published btwn 2005-2016.
Here's the list. 2/n
The question is why these topics?
Well, @Aliens68 pointed out something important: FUNDING
Big grants tend to be in fields favoured by foreign funders, such as agriculture and health sciences.
Here's a visual of some major funders of #AfricansInSTEM. 3/n
Let's talk about scientific output & #AfricansInSTEM.
Quick disclaimer: this info I report is from research from books, sites etc. I don't claim to know it all. So if I am wrong I am very willing to correct my mistakes & learn. Let's converse. That's exactly what today is about
Here are some stats for you. According to an article by Elsevier (bitly.com), Africa accounts for <1% of global research output, despite having 16.72% of the global population.
I love this visual made by @Tasia1409 which gives us a visual understanding.
When some people think of Africa, they think it's a monolith. There are 54 countries. Let's see where this 1% comes from. A @nature country research outputs report covering 1 Dec '19 - 30 Nov '20 shows South Africa is leading. Here's the top 20.
🔗: shorturl.at/hjtHK 3/n