Albert Rapp
Jan 15 25 tweets 11 min read
The best way to learn data analysis is to actually practice it.

Each week, the #tidyTuesday challenge gives you plenty of opportunity for this.

Don't know how to get started with the challenge? Here's an #rstats guide using this week's data.
First, get the data.

Head over to the tidyTuesday GitHub repo at github.com/rfordatascienc…

Just copy the code from the "Get the data" section.
Next, I suggest that you use the clean_names() function from the {janitor} package.

This will format the column names of your data set so that it's easier to work with.

Huge time saver!
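Here's a minimal sketch of what `clean_names()` does. The tiny `raw` tibble and its messy column names are made up for illustration and are not the real tidyTuesday data.

```r
library(tibble)
library(janitor)

# Hypothetical toy data with messy column names (not the real tidyTuesday data)
raw <- tibble(
  `Loc ID` = c("L1", "L2"),
  `Fed In Jan` = c(1, 0),
  PROJ_PERIOD_ID = c("PFW_2020", "PFW_2021")
)

# clean_names() converts every column name to snake_case by default
cleaned <- clean_names(raw)
names(cleaned)
```

After this, every column can be typed without backticks or the shift key.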
Now, take a look at the column names of your cleaned data set.

Together with the data dictionary from the GitHub repo, you can find out what the data means.
It looks like there are many 'fed_in' variable names in the 'site_data' data set.
Let's take a look at all of them.

select() + tidyselect helpers will target the right columns.
This looks weird.
It's only zeroes, ones and NAs.
Probably a true/false kind of thing.

Let's bring more columns into this.
There's `loc_id` and `proj_period_id` as well.
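The two selections could look roughly like this. The `site_data` tibble below is a made-up stand-in with only two month columns, and `housing_density` is just a hypothetical extra column to show that it gets dropped.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the cleaned site_data (all values are made up)
site_data <- tibble(
  loc_id = c("L1", "L1", "L2"),
  proj_period_id = c("PFW_2020", "PFW_2021", "PFW_2021"),
  fed_yr_round = c(1, NA, 0),
  fed_in_jan = c(1, 1, 0),
  fed_in_feb = c(1, NA, 1),
  housing_density = c(2, 2, 3)
)

# Only the monthly columns: starts_with() matches on the column name prefix
months_only <- site_data |>
  select(starts_with("fed_in"))

# The same columns plus the two identifier columns for context
with_ids <- site_data |>
  select(loc_id, proj_period_id, starts_with("fed_in"))
```

Note that `starts_with("fed_in")` does not pick up `fed_yr_round`, because that name has a different prefix.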
This is starting to make sense.

Each feeding site has a unique location and a project id that contains what looks like a year.

Let's check how many project IDs there are.
All project IDs contain the same prefix.

Let's remove it and transform the character vector into an actual numeric vector.

`parse_number()` can take care of that.
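A small sketch of both steps, counting the IDs and then stripping the prefix. The `PFW_` prefix and the years in this toy tibble are assumptions for illustration; check what the prefix looks like in the real data.

```r
library(dplyr)
library(readr)
library(tibble)

# Hypothetical toy data; the "PFW_" prefix is an assumption
site_data <- tibble(
  loc_id = c("L1", "L1", "L2"),
  proj_period_id = c("PFW_2020", "PFW_2021", "PFW_2021")
)

# How often does each project ID occur?
id_counts <- site_data |>
  count(proj_period_id)

# parse_number() drops the non-numeric characters around the first number
site_data <- site_data |>
  mutate(proj_period_id = parse_number(proj_period_id))
```

`parse_number()` returns a double, so the column can be used directly on a numeric axis later on.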
Next, we're going to take care of missing values.

Let's have a look at how many missing values there are.

Here are two ways to do that:
1️⃣ summarise() + across()
2️⃣ for-loop
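Both ways could be sketched like this on a made-up `site_data` tibble (hypothetical values, two month columns only):

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with a few missing values
site_data <- tibble(
  loc_id = c("L1", "L2", "L3"),
  fed_yr_round = c(1, NA, NA),
  fed_in_jan = c(1, NA, 0),
  fed_in_feb = c(1, 1, 0)
)

# Way 1: summarise() + across() counts the NAs in every column at once
na_counts <- site_data |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Way 2: a plain for-loop over the column names
na_counts_loop <- setNames(integer(ncol(site_data)), names(site_data))
for (col in names(site_data)) {
  na_counts_loop[[col]] <- sum(is.na(site_data[[col]]))
}
```

The first version returns a one-row tibble, the second a named integer vector. The numbers are the same.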
There is missing data. Let’s filter out the rows that have missing data in any of the month columns.

The `fed_yr_round` column can be filled by us later on.

Once again, here are two possible ways:
1️⃣ pmap() from {purrr}
2️⃣ rowSums() (treating TRUE as 1)
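Here's one way both approaches might look. This is a sketch on made-up data, and the helper names `has_na_pmap` and `has_na_rowsums` are my own, not from the original code.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical toy data with a few missing values
site_data <- tibble(
  loc_id = c("L1", "L2", "L3"),
  fed_yr_round = c(1, NA, NA),
  fed_in_jan = c(1, NA, 0),
  fed_in_feb = c(1, 1, 0)
)

month_cols <- site_data |>
  select(starts_with("fed_in"))

# Way 1: pmap_lgl() walks over the rows and checks each one for NAs
has_na_pmap <- pmap_lgl(month_cols, function(...) anyNA(c(...)))

# Way 2: is.na() gives a logical matrix; rowSums() counts each TRUE as 1
has_na_rowsums <- unname(rowSums(is.na(month_cols)) > 0)

# Keep only the rows with complete monthly data
complete_sites <- site_data |>
  filter(!has_na_rowsums)
```

The missing `fed_yr_round` value for `L3` stays in the data, since only the month columns decide which rows are dropped.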
Now, let us bring our data into a tidy format.

That’s what `pivot_longer()` will do for us.
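A minimal sketch of the reshaping step. The toy data and the new column names `month` and `fed` are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data: one row per site and year, one column per month
site_data <- tibble(
  loc_id = c("L1", "L2"),
  year = c(2020, 2021),
  fed_in_jan = c(1, 0),
  fed_in_feb = c(1, 1)
)

# One row per site, year and month; the "fed_in_" prefix is stripped from the names
site_long <- site_data |>
  pivot_longer(
    cols = starts_with("fed_in"),
    names_to = "month",
    names_prefix = "fed_in_",
    values_to = "fed"
  )
```

Two rows with two month columns become four rows, each holding a single observation.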
Next, we’re able to do a little bit of counting.

This is always an easy but valuable thing to do. Just throw count() at the data to see what (and how much) is in it.
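For example, counting month and feeding status on a made-up long data set might look like this (which columns you count is up to you; these are just an assumption):

```r
library(dplyr)
library(tibble)

# Hypothetical long-format toy data
site_long <- tibble(
  loc_id = c("L1", "L1", "L2", "L2"),
  month = c("jan", "feb", "jan", "feb"),
  fed = c(1, 1, 0, 1)
)

# count() groups by the given columns and adds the group size as column n
fed_counts <- site_long |>
  count(month, fed)
```

Only combinations that actually occur show up in the result.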
Using these counts we can check how many sites there are in each year.

Looks like overall the number of sites increased over the years.

This plot was just something we did for ourselves. No need to customize it further.
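A quick-and-dirty version of such a plot, sketched on made-up data with one row per site and year. The column name `n_sites` is my own choice.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical toy data: one row per site and year
site_data <- tibble(
  loc_id = c("L1", "L2", "L1", "L2", "L3"),
  year = c(2020, 2020, 2021, 2021, 2021)
)

# Number of sites per year
sites_per_year <- site_data |>
  count(year, name = "n_sites")

# A bare-bones bar chart is enough for a quick look
p <- ggplot(sites_per_year, aes(x = year, y = n_sites)) +
  geom_col()
```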
Finally, let’s have a look at how many feeding sites feed all year.

Maybe more bird sites are active every month as time goes on, or maybe fewer.

As it happens, it looks like there is a trend that more and more bird sites are active every month. Let’s make this viz a bit prettier.
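One way to fill in `fed_yr_round` ourselves and get the yearly share is sketched below. It assumes a site feeds all year when every month column equals 1. The toy data has only two month columns, so adjust `month_cols` for the real data.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical toy data: one row per site and year, two month columns only
site_data <- tibble(
  loc_id = c("L1", "L2", "L1", "L2", "L3"),
  year = c(2020, 2020, 2021, 2021, 2021),
  fed_in_jan = c(1, 1, 1, 1, 0),
  fed_in_feb = c(1, 0, 1, 1, 1)
)

month_cols <- c("fed_in_jan", "fed_in_feb")

# A site feeds all year if every month column equals 1
site_data <- site_data |>
  mutate(
    fed_yr_round = rowSums(across(all_of(month_cols)) == 1) == length(month_cols)
  )

# Share of all-year sites per year
share <- site_data |>
  group_by(year) |>
  summarise(share_all_year = mean(fed_yr_round))

# Stacked bars scaled to 100 % show the same share visually
p <- ggplot(site_data, aes(x = year, fill = fed_yr_round)) +
  geom_bar(position = "fill")
```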
First, let’s apply `theme_minimal()` and make the bars wider. Also, black borders for the bars could be nice.
Second, add labels. Add a descriptive title and don’t forget to put your Twitter handle into the caption.
Third, let us format the y-axis as percent.
Fourth, pick better colors manually.
Fifth, get rid of the extra spacing surrounding the bars.
Finally, move the legend and title.
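Put together, the six steps could look roughly like this. It's a sketch on made-up data; the colors, the title, the `@your_handle` caption and the exact legend position are placeholders, not the choices from the original plot.

```r
library(ggplot2)
library(tibble)

# Hypothetical toy data: one row per site and year
sites <- tibble(
  year = c(2020, 2020, 2021, 2021, 2021),
  all_year = c("Yes", "No", "Yes", "Yes", "No")
)

p <- ggplot(sites, aes(x = factor(year), fill = all_year)) +
  # Step 1: wider bars with black borders, plus theme_minimal()
  geom_bar(position = "fill", width = 0.95, colour = "black") +
  theme_minimal() +
  # Step 2: labels, title and caption (placeholder texts)
  labs(
    x = NULL,
    y = NULL,
    fill = "Feeds all year?",
    title = "Share of sites that feed birds every month",
    caption = "Viz: @your_handle"
  ) +
  # Step 3: y-axis as percent
  scale_y_continuous(labels = scales::label_percent()) +
  # Step 4: manual colors (placeholder hex codes)
  scale_fill_manual(values = c(Yes = "#0072B2", No = "#E69F00")) +
  # Step 5: no extra spacing around the bars
  coord_cartesian(expand = FALSE) +
  # Step 6: move the legend and align the title with the whole plot
  theme(legend.position = "top", plot.title.position = "plot")
```

Each step is its own layer, so you can add them one at a time and rerun the plot to see what changes.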
There’s lots more one can do with the data or the plot. But this is probably okay as a start.

At this point, you can share your plot on Twitter using the #tidyTuesday hashtag.
If you share your plot, think about sharing your code as well.

Common practices for sharing the code:
- A dedicated tidyTuesday repo on GitHub.
OR
- Upload the code to gist.github.com.

This thread's code is available at github.com/AlbertRapp/Pub…
I hope this helps you to get started with the tidyTuesday challenge.

If you want more help, check out the R screencasts at rscreencasts.com.

They're a great resource for learning data wrangling using tidyTuesday data sets.
That's a wrap. I hope you've enjoyed this thread.

If you want to see more content like this, follow @rappa753.

See you next time and don't forget to hit the like button at the start of this thread 👋