I'm curious which #rstats packages were tweeted about the most in the past ~week. Let's explore this using R! 🧵

I'm going to use these packages:
*️⃣ tidyverse
*️⃣ rtweet
*️⃣ rvest
*️⃣ tidytext

You can install them with:
install.packages(c("tidyverse", "rtweet", "rvest", "tidytext"))

First, let's get a vector of all #rstats packages on CRAN!

library(rvest)

# package names sit in the first column of CRAN's package table
r_pkgs <- read_html('https://cloud.r-project.org/web/packages/a…') %>%
  html_nodes('td:nth-child(1)') %>%
  html_text()
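
Curious what td:nth-child(1) actually grabs? Here's a minimal offline sketch on a tiny hand-written table. The table is a hypothetical stand-in for the CRAN listing (placeholder titles, not real page content), assuming, as the selector implies, that package names sit in the first column.

```r
library(rvest)  # rvest also re-exports the %>% pipe

# Hypothetical two-row stand-in for CRAN's package table:
# column 1 = package name, column 2 = placeholder title
toy_html <- "
<table>
  <tr><td>abc</td><td>Placeholder title</td></tr>
  <tr><td>ggplot2</td><td>Placeholder title</td></tr>
</table>"

toy_pkgs <- read_html(toy_html) %>%  # a string containing markup is parsed as literal HTML
  html_nodes("td:nth-child(1)") %>%  # the first <td> cell in every row
  html_text()                        # the text inside each of those cells

toy_pkgs
```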

Then let's pull in all tweets from the past week or so that use the #rstats hashtag.

library(rtweet)

# include_rts = FALSE leaves out retweets; the search only reaches back roughly 6-9 days
df <- search_tweets(q = "#rstats", n = 7000, include_rts = FALSE)

Alright! If you're following along live, we've got 4,948 tweets. (You may get a different number if you run this in a few minutes! In fact, I expect you'd pick up at least one more tweet: this one! #rstats)

🧹 Let's tidy this up! I'm going to build this up step by step.

First let's unnest the text into words.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text)
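
Here's a small sketch of what unnest_tokens() does, using a made-up two-tweet data frame (toy_df, its status_id values and the tweet text are all invented): you get one row per word, lowercased by default, with punctuation stripped, so #rstats comes out as plain "rstats".

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the search_tweets() results: one row per tweet
toy_df <- tibble(
  status_id = c("1", "2"),
  text = c("Loving #rstats and ggplot2!", "Shiny apps are fun")
)

toy_words <- toy_df %>%
  unnest_tokens(word, text)  # new column `word` built from `text`; `text` is dropped

toy_words
```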

Now let's remove any "stop words": very common words like "not", "do", etc.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)
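
stop_words is a data frame that ships with tidytext, and anti_join() drops every row whose word appears in its word column. My reading of the code (not something the thread says): since the next step keeps only package names, this step also stops packages that are ordinary English words, like here, from being counted every time someone writes "here". A toy check with made-up words (by = "word" just makes the join column explicit):

```r
library(dplyr)
library(tidytext)

# Made-up tokens, shaped like unnest_tokens() output
toy_words <- tibble(word = c("i", "love", "the", "here", "ggplot2"))

kept <- toy_words %>%
  anti_join(stop_words, by = "word")  # keep only rows whose word is NOT in stop_words

kept$word
```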

Now I'm going to keep only the "words" that appear in the character vector of CRAN packages we created earlier (I called it r_pkgs).

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  filter(word %in% r_pkgs)
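
One thing to watch (an observation about the code, not from the thread): unnest_tokens() lowercases words by default, but CRAN package names keep their capitals, so a package like Rcpp can never match r_pkgs as written. A minimal sketch with hypothetical toy vectors, comparing the original filter with one possible fix, matching against tolower(r_pkgs):

```r
library(dplyr)

toy_pkgs  <- c("ggplot2", "Rcpp", "shiny")                       # hypothetical stand-in for r_pkgs
toy_words <- tibble(word = c("rcpp", "ggplot2", "shiny", "plot")) # lowercased tokens

as_written <- toy_words %>% filter(word %in% toy_pkgs)           # exact, case-sensitive match
lowercased <- toy_words %>% filter(word %in% tolower(toy_pkgs))  # match against lowercased names

as_written$word  # "ggplot2" "shiny"
lowercased$word  # "rcpp" "ggplot2" "shiny"
```

The trade-off is that matched names are lowercase in the plot (rcpp rather than Rcpp).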

Let's count up the words and keep the top 5 to plot.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  filter(word %in% r_pkgs) %>%
  count(word) %>%
  slice_max(n, n = 5)
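
A quick toy check of count() and slice_max() (the words are invented): count(word) adds a column n, and slice_max() keeps ties by default, so a "top 5" can come back with more than five rows when counts tie at the cut-off. with_ties = FALSE forces exactly n rows.

```r
library(dplyr)

toy_words <- tibble(word = c("tidyverse", "shiny", "gt", "tidyverse",
                             "shiny", "gt", "tidyverse", "dplyr"))

counts <- toy_words %>% count(word)  # one row per word, with its count in `n`

top2        <- counts %>% slice_max(n, n = 2)                     # shiny and gt tie for 2nd, so 3 rows
top2_strict <- counts %>% slice_max(n, n = 2, with_ties = FALSE)  # exactly 2 rows

top2
top2_strict
```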

OK, final tidying step: let's reorder the words so the bars plot in order of count.

library(tidyverse)
library(tidytext)

pkgs <- df %>%
  unnest_tokens(word, text) %>%    # one row per word
  anti_join(stop_words) %>%        # drop common words like "the"
  filter(word %in% r_pkgs) %>%     # keep CRAN package names only
  count(word) %>%                  # mentions per package
  slice_max(n, n = 5) %>%          # top 5 (ties are kept)
  mutate(word = reorder(word, n))  # order the bars by count
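
reorder() turns word into a factor whose levels are sorted by n (ascending, using the mean per word). ggplot2 draws the first level of a discrete y axis at the bottom, so the most-mentioned package ends up on top. A toy check with made-up counts:

```r
library(dplyr)

toy_counts <- tibble(word = c("gt", "tidyverse", "shiny"), n = c(5, 10, 1))

toy_counts <- toy_counts %>%
  mutate(word = reorder(word, n))  # factor levels ordered from smallest to largest n

levels(toy_counts$word)  # "shiny" "gt" "tidyverse"
```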

Let's plot it!

ggplot(pkgs, aes(n, word)) +
  geom_col(fill = "light blue") +  # one bar per package; bar length is the count n
  labs(y = "package")
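
Want to try the whole thing without a Twitter API token or the CRAN page? Here's an offline sketch of the full pipeline. toy_df and toy_pkgs are hypothetical stand-ins for df and r_pkgs. The steps follow the thread, except that by = "word" is spelled out in the join.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Hypothetical inputs standing in for the rtweet results and the CRAN scrape
toy_pkgs <- c("ggplot2", "shiny", "gt", "here")
toy_df <- tibble(text = c(
  "New #rstats post on ggplot2 themes",
  "Building tables with gt and shiny",
  "ggplot2 tip: use here to find files"
))

toy_top <- toy_df %>%
  unnest_tokens(word, text) %>%           # one row per lowercased word
  anti_join(stop_words, by = "word") %>%  # drop common English words (including "here")
  filter(word %in% toy_pkgs) %>%          # keep only package names
  count(word) %>%                         # mentions per package
  slice_max(n, n = 5) %>%                 # top 5 (ties kept)
  mutate(word = reorder(word, n))         # order bars by count

p <- ggplot(toy_top, aes(n, word)) +
  geom_col(fill = "light blue") +
  labs(y = "package")

p
```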

And there we have it! Top 5 recently tweeted about #rstats packages:

🥇 tensorflow
🥈 tidyverse
🥉 ggplot2
🏅 shiny
🏅 gt
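
A note on reading these counts: count(word) tallies word occurrences, so one tweet that names a package several times adds to its count each time. If you'd rather rank by the number of tweets mentioning each package, one option (my variant, not the thread's) is to drop duplicate tweet/word pairs first. This assumes a per-tweet ID column; status_id here is a hypothetical name, so check what your version of rtweet returns.

```r
library(dplyr)

# Hypothetical tokens: tweet "1" names tensorflow three times, tweet "2" once
toy_words <- tibble(
  status_id = c("1", "1", "1", "2", "2"),
  word      = c("tensorflow", "tensorflow", "tensorflow", "tensorflow", "gt")
)

mentions <- toy_words %>% count(word)                                # occurrences
tweets   <- toy_words %>% distinct(status_id, word) %>% count(word)  # tweets mentioning each word

mentions  # gt 1, tensorflow 4
tweets    # gt 1, tensorflow 2
```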

Thanks for following along!

Thread by Women in Statistics and Data Science (@WomenInStat)