I'll start with one of my own bc I assume that you like my style (otherwise why ask me?)
Beware though: YARDS is a graduate-level course that I taught for math students w/ a bit of programming experience.
May 31, 2023 • 8 tweets • 4 min read
Data cleaning is tedious.
But it's much easier with the {janitor} package. Especially if you work with Excel files.
Here are 5 underrated features from {janitor}. #rstats
1 // Create clean names
This is absolutely the best function. It transforms column names such that they are easier to use for programming.
Left: Bad for programming
Right: Good for programming
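To make the left/right comparison concrete, here's a minimal sketch with janitor's clean_names(). The data frame and its column names are made up for illustration; they are not the ones from the original screenshots.

```r
library(janitor)

# Hypothetical data frame with messy, Excel-style column names
# (check.names = FALSE keeps the spaces and parentheses as they are)
messy <- data.frame(
  `First Name` = c("Ada", "Grace"),
  `Sales (USD)` = c(120, 95),
  `Order ID` = c(1L, 2L),
  check.names = FALSE
)

# clean_names() converts the column names to snake_case by default
clean <- clean_names(messy)
names(clean)
```

After cleaning, the names contain only lowercase letters, digits and underscores, so you can type them without backticks.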
May 19, 2023 • 26 tweets • 10 min read
Ever heard of logistic regression? Or Poisson regression? Both are generalized linear models (GLMs).
They're versatile statistical models. And by now, they've probably been reframed as super hot #MachineLearning.
Brush up on their math with this thread. #rstats
Let's start with logistic regression. Assume you want to classify a penguin as male or female based on its
* weight,
* species and
* bill length
Better yet, let's make this specific. Here's a dataviz for this exact scenario. It is based on the {palmerpenguins} data set.
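If you want to try this scenario yourself, here's a rough sketch that fits a logistic regression with base R's glm(). The choice of columns (body_mass_g for weight, bill_length_mm for bill length) and the 0.5 cutoff are my assumptions for illustration, not necessarily what the thread's plots use.

```r
library(palmerpenguins)

# Keep only the rows where every variable used in the model is present
penguins_complete <- na.omit(
  penguins[, c("sex", "body_mass_g", "species", "bill_length_mm")]
)

# Logistic regression is a GLM with a binomial family and a logit link
fit <- glm(
  sex ~ body_mass_g + species + bill_length_mm,
  data = penguins_complete,
  family = binomial(link = "logit")
)

# For a factor response, glm() models the probability of the second level,
# which is "male" in this data set
prob_male <- predict(fit, type = "response")

# Assumed cutoff of 0.5 to turn probabilities into a classification
predicted_sex <- ifelse(prob_male > 0.5, "male", "female")

# Share of penguins classified correctly (on the training data)
mean(predicted_sex == penguins_complete$sex)
```

Swapping family = binomial() for family = poisson() would turn the same call into a Poisson regression, which is why both models live under the GLM umbrella.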
May 12, 2023 • 24 tweets • 11 min read
The best way to learn data analysis is to actually practice it.
Each week, the #tidyTuesday challenge gives you plenty of opportunity for this.
Don't know how to get started with the challenge? In case you missed it, I put together an #rstats guide in January.
First, get the data.
Tired of lackluster visualizations that don't tell you anything?
Discover how storytelling and nuanced color use can
- transform your bar charts
- inform readers on key insights & actions
Here's a step-by-step guide (with full code at the end). #rstats
Here's our starting point.
Note that this tutorial is a ggplot2 recreation of
(And once you've mastered the technique you can enhance this visual with advanced stats beyond comparing error rates to average.)
Mar 29, 2023 • 9 tweets • 5 min read
Data visualization doesn't have to be complicated. 🤯
In fact, ggplot makes it dead-simple to implement some of the most effective dataviz principles.
Here are six dataviz principles that are so easy that any beginner’s course should teach them. #rstats
1 // Make sure your labels are legible
This one is super easy to fix. Any beginner can do it.
Img 1: Way too small fonts & unclear labels
Img 2: Fixed with labs() and theme_gray(base_size = 20)
Img 3: Full code
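Since the images don't survive in text form, here's a small sketch of the fix. It uses the built-in mtcars data as a stand-in (an assumption on my part; the original images may show a different data set).

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # Replace the raw column names with labels a reader can understand
  labs(
    title = "Heavier cars travel fewer miles per gallon",
    x = "Weight (1000 lbs)",
    y = "Fuel economy (miles per gallon)"
  ) +
  # Scale up all text elements at once
  theme_gray(base_size = 20)

p
```

The same two lines work with any other complete theme, e.g. theme_minimal(base_size = 20).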
Mar 25, 2023 • 13 tweets • 7 min read
Manually sifting through mountains of data is annoying. 🥱
But with the point & click interface of analytics dashboards, data exploration is more fun.
And building a dashboard is simple too, especially with R & Shiny. Here's how to get started now. #rstats
1 // Data
First, you need data.
It's always fun to work with your personal data, so I will use the last three months of my Twitter analytics data. You can download yours at analytics.twitter.com.
Mar 11, 2023 • 14 tweets • 6 min read
Ever found yourself stuck trying to visualize data that's only available as PDF?
I faced this exact issue when recreating an interactive plot on the democracy index.
Here's how I circumvented that issue with #rstats. (plus code at the end)
0 // Find the data
This will give you access to a PDF that contains multiple pages of data.
Mar 4, 2023 • 9 tweets • 4 min read
Text manipulation is an essential data cleaning skill. Often, this is step 1 before you can get any work done.
But with the right functions you can speed up that process. Here's how. #rstats
(With full code at the end)
0 // Get the data
First, we need an example data set. Here's one from TidyTuesday.
Take a look at the company names. They contain words like "Inc." and "Corporation". That's not something I'd use in a dataviz (too much clutter).
So, let us fix the names.
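One way to do that is with {stringr}. This is only a sketch: the company names below are invented stand-ins for the TidyTuesday data, and the regular expression is my own assumption about which suffixes to strip.

```r
library(stringr)

# Hypothetical company names (not the actual TidyTuesday data)
company <- c("Acme Inc.", "Globex Corporation", "Initech, Inc.", "Umbrella")

# Match a trailing "Inc", "Inc." or "Corporation",
# optionally preceded by a comma
suffix_pattern <- regex(",?\\s+(Inc\\.?|Corporation)$", ignore_case = TRUE)

# Remove the suffix and tidy up any leftover whitespace
short_name <- str_squish(str_remove(company, suffix_pattern))
short_name
```

Names without a matching suffix pass through unchanged, so it's safe to run this over the whole column.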
Feb 26, 2023 • 26 tweets • 13 min read
Over the past year, I've shared hundreds of dataviz tricks.
This thread compiles the best tricks and will help you to
A) build clear visualizations
B) choose better chart types
C) use colors much more efficiently
D) create a dynamic experience with interactive elements #rstats
A1 // Two things every {ggplot2} course should teach:
1️⃣ Use proper labels and create a title with labs()
2️⃣ Increase the text size with theme_grey(base_size = ...)
Just two lines of code. But a considerable amount of respect for your audience (which has to read your graph).
Feb 9, 2023 • 11 tweets • 5 min read
Sometimes people ask me if I can do one-on-one R tutoring.
Sure I can. But then my hourly rate applies. And there are many amazing *free* resources. Want to try them first?
Here are a few that I recommend. #rstats
1 // Yet Again: R + Data Science