Thread by @WomenInStat on Thread Reader App

I'm curious about the most tweeted about #rstats packages in the past ~week, let's explore this using R! 🧵

I'm going to use these packages:
*️⃣ tidyverse
*️⃣ rtweet
*️⃣ rvest
*️⃣ tidytext

you can install them with:
install.packages(c("tidyverse", "rtweet", "rvest", "tidytext"))

First, let's get a vector of all #rstats packages on CRAN!

library(rvest)

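# The URL below is truncated in this unrolled thread; swap in the full
# address of the CRAN package listing before running.
r_pkgs <- read_html('cloud.r-project.org/web/packages/a…') %>%
  html_nodes('td:nth-child(1)') %>%  # first cell of each table row
  html_text()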
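
To see what that selector picks out without hitting the network, here is a small sketch on a made-up table. The HTML and the names `toy_page` and `toy_pkgs` are invented for illustration; the real CRAN page may be laid out differently.

```r
library(rvest)

# Toy stand-in for the CRAN listing: one row per package, with the
# package name in the first cell (contents invented for illustration).
toy_page <- read_html('
  <table>
    <tr><td>ggplot2</td><td>Create Elegant Data Visualisations</td></tr>
    <tr><td>shiny</td><td>Web Application Framework for R</td></tr>
  </table>')

# Same pipeline as in the thread: keep the first <td> of every row.
toy_pkgs <- toy_page %>%
  html_nodes("td:nth-child(1)") %>%
  html_text()

toy_pkgs
```

`td:nth-child(1)` matches only the first cell in each row, so the titles in the second column are dropped.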

Then let's pull in all tweets in the past week or so that use the #rstats hashtag

library(rtweet)

df <- search_tweets(q = "#rstats", n = 7000, include_rts = FALSE)

Alright! If you're following along live, we've got 4,948 tweets (you may get a different number if you run this in a few minutes! In fact, I expect you'd at least pick up one more tweet: this one! #rstats)

🧹 Let's tidy this up! I'm going to build up step-by-step here.

First let's unnest the text into words.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text)
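
If you want to see what unnesting does before running it on thousands of tweets, here is a tiny sketch on a made-up one-row data frame (`toy_df` and its text are invented, not real tweets).

```r
library(tidyverse)
library(tidytext)

# One invented "tweet" standing in for the rtweet data frame.
toy_df <- tibble(text = "Loving ggplot2 and Shiny today!")

# One row per word; by default tokens are lower-cased and
# punctuation is dropped.
toy_tokens <- toy_df %>%
  unnest_tokens(word, text)

toy_tokens$word
```

Note that "Shiny" comes back as "shiny" and the "!" is gone.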

Now let's remove any "stop words"; these are things like "not", "do", etc.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
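
Here is a quick sketch of the stop-word step on a made-up vector of words (`toy_words` is invented for illustration). Passing `by = "word"` is optional; it just names the join column explicitly, which also silences the joining message.

```r
library(tidyverse)
library(tidytext)

# Invented tokens: two stop words and two package names.
toy_words <- tibble(word = c("do", "not", "ggplot2", "shiny"))

# anti_join() keeps only the rows of toy_words with no match
# in tidytext's stop_words table.
toy_kept <- toy_words %>%
  anti_join(stop_words, by = "word")

toy_kept$word
```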

Now I'm going to keep only the "words" that are found in that character vector of packages we created before (I called it r_pkgs)

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  filter(word %in% r_pkgs)
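
One thing to keep in mind: unnest_tokens() lower-cases tokens by default, so a package whose name has capitals won't match exactly. The sketch below shows this on made-up data and one possible workaround, lower-casing the package vector first. This is not what the thread does, and `toy_r_pkgs` and `toy_df` are invented stand-ins.

```r
library(tidyverse)
library(tidytext)

# Invented stand-ins for r_pkgs and the tweet data frame.
toy_r_pkgs <- c("Rcpp", "ggplot2")
toy_df <- tibble(text = "Rcpp and ggplot2 are great")

toy_tokens <- toy_df %>%
  unnest_tokens(word, text)

# Exact match: misses "Rcpp" because the token is "rcpp".
exact <- toy_tokens %>%
  filter(word %in% toy_r_pkgs)

# Case-insensitive match: lower-case the package names first.
loose <- toy_tokens %>%
  filter(word %in% tolower(toy_r_pkgs))

exact$word
loose$word
```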

Let's count up the words and keep the top 5 to plot.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  filter(word %in% r_pkgs) %>%
  count(word) %>%
  slice_max(n, n = 5)

OK, final tidying step: let's reorder so it plots nicely.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  filter(word %in% r_pkgs) %>%
  count(word) %>%
  slice_max(n, n = 5) %>%
  mutate(word = reorder(word, n))
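
To see what the counting and reordering steps do, here is a small sketch on made-up words (`toy_counts` and its input are invented). It keeps the top 2 instead of the top 5 so the result is easy to check by eye.

```r
library(tidyverse)

# Invented tokens: shiny x3, ggplot2 x2, gt x1.
toy_counts <- tibble(word = c("shiny", "ggplot2", "shiny",
                              "gt", "ggplot2", "shiny")) %>%
  count(word) %>%                    # one row per word, count in column n
  slice_max(n, n = 2) %>%            # keep the 2 most frequent words
  mutate(word = reorder(word, n))    # factor levels sorted by count

# Levels run from least to most frequent; ggplot2 draws the first
# level at the bottom of the y axis, so the biggest bar ends up on top.
levels(toy_counts$word)
```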

Let's plot it!

ggplot(pkgs, aes(n, word)) +
  geom_col(fill = "light blue") +
  labs(y = "package")

And there we have it! Top 5 recently tweeted about #rstats packages:

🥇 tensorflow
🥈 tidyverse
🥉 ggplot2
🏅 shiny
🏅 gt

Thanks for following along!
