Let's start with session 1:
"Introduction to #rstats and #rstudio" ®️
The fundamentals of R include:
* values
* assignments and objects
* functions
* data types
* unknown values
* vectors
* factors
* packages
* tabular data
* data generation
* data import
We also covered, among other topics, naming conventions, coercion, name conflicts, ...
... tibbles as a modern implementation of data frames, retrieving basic summaries of data sets, and potential problems. We also discussed resources for finding help.
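A minimal base R sketch of a few of these fundamentals (assignment, unknown values, coercion, factors, tabular data). The object names and values are made up for illustration and are not taken from the session materials:

```r
# Assignment creates an object; c() combines values into a vector
heights <- c(1.62, 1.75, NA, 1.80)  # NA marks an unknown value

# Summaries propagate NA unless told to drop unknown values
mean_with_na <- mean(heights)                # NA
mean_no_na   <- mean(heights, na.rm = TRUE)  # mean of the three known values

# Coercion: mixing types converts everything to the most flexible type
mixed <- c(1, "two", TRUE)
class(mixed)  # "character"

# Factors store categorical data with an explicit set of levels
size <- factor(c("small", "large", "small"), levels = c("small", "large"))
levels(size)  # "small" "large"

# Tabular data: a data frame, plus a basic summary
df <- data.frame(id = seq_along(heights), height = heights)
summary(df)

# Optional: view the same data as a tibble if the package is installed
if (requireNamespace("tibble", quietly = TRUE)) {
  print(tibble::as_tibble(df))
}
```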
Ay, just realized the crappy image resolution... Sorry, going to prepare better ones for the other sessions.
Time for session 2:
"Data Wrangling with the {tidyverse}*"
This time with slides in better quality.
* I know it's a bit too broad, but as we use multiple packages such as dplyr, tidyr, forcats, and stringr (and, strictly speaking, tibble as well), I went for this session name.
Some analysis and #dataviz might be possible without (re)shaping and/or summarizing your data, especially thanks to #ggplot2's powerful stat functionality, but often we need to prepare our data for the next steps. You can do it in #Excel but we, of course, use #rstats.
Of course, we start with THE main package for data wrangling in the #tidyverse collection: the #dplyr 📦 and its main verbs
(Credit to @allison_horst for her lovely illustrations that are featured across all sessions 🙌)
I always share the equivalent #baseR code (not everyone loves the #tidyverse 😱) and show the basic and a bit more advanced usage of the main verbs--and of course group_by and how it gives you SUPERPOWER!! 🦹♀️🦸♂️
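A rough sketch of what such side-by-side comparisons can look like, using the built-in `mtcars` data. The choice of data set, columns, and the pound-to-kilogram conversion are my own assumptions, not the examples from the slides:

```r
library(dplyr)

# filter(): keep rows that match a condition
four_cyl_dplyr <- filter(mtcars, cyl == 4)
four_cyl_base  <- mtcars[mtcars$cyl == 4, ]

# select(): keep columns by name
cols_dplyr <- select(mtcars, mpg, wt)
cols_base  <- mtcars[, c("mpg", "wt")]

# mutate(): add a new column (wt is given in 1000 lbs)
kg_dplyr <- mutate(mtcars, wt_kg = wt * 453.592)
kg_base  <- transform(mtcars, wt_kg = wt * 453.592)

# group_by() + summarize(): one summary row per group
by_cyl_dplyr <- summarize(group_by(mtcars, cyl), mean_mpg = mean(mpg))
by_cyl_base  <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```

Both columns of each pair should give the same values; the dplyr versions return the grouped summary as a tibble, while `aggregate()` returns a plain data frame.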
How to bring it all together? Pipe it!
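As a sketch, the verbs can be chained into one readable pipeline. This uses the native pipe `|>` (R 4.1 or newer); the session may well have used the magrittr pipe `%>%` instead, and the `mtcars` example is an assumption on my side:

```r
library(dplyr)

result <- mtcars |>
  filter(hp > 100) |>                      # keep the stronger cars
  mutate(wt_kg = wt * 453.592) |>          # wt is given in 1000 lbs
  group_by(cyl) |>                         # everything below runs per group
  summarize(n = n(), mean_wt_kg = mean(wt_kg)) |>
  arrange(desc(mean_wt_kg))                # heaviest group first

result
```

Each step receives the output of the previous one as its first argument, so the code reads top to bottom in the order the operations happen.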
In the following, a few more functions (and #tidyverse packages) that help when cleaning data (feel free to share your favorites, those are the ones I am using regularly)
#tidyr: pivoting is tough but so important and powerful
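A small pivoting sketch with a made-up data set (the `wide` table and its values are hypothetical): `pivot_longer()` stacks the year columns into name/value pairs, and `pivot_wider()` reverses that.

```r
library(tidyr)

# Hypothetical wide table: one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2020`  = c(10, 20),
  `2021`  = c(12, 18),
  check.names = FALSE  # keep "2020" and "2021" as column names
)

# Wide to long: one row per country-year combination
long <- pivot_longer(
  wide,
  cols      = !country,
  names_to  = "year",
  values_to = "value"
)

# Long back to wide
back <- pivot_wider(long, names_from = year, values_from = value)
```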
#forcats: suddenly working with factors became one of my favorite tasks in R! 🤯
And it's so important in combination with #ggplot2 as well:
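One way this plays out in practice, as a sketch: the level order of a factor determines the order of bars or categories in a plot, so reordering the levels reorders the chart. The small `df` below is invented for illustration:

```r
library(forcats)
library(ggplot2)

# Order levels by how often they occur (most frequent first)
cyl_f <- factor(mtcars$cyl)
levels(fct_infreq(cyl_f))  # "8" "4" "6"

# Order levels by another variable, here the bar length
df <- data.frame(group = c("a", "b", "c"), value = c(2, 5, 1))  # hypothetical data
df$group <- fct_reorder(factor(df$group), df$value)
levels(df$group)  # "c" "a" "b"

# The bars now follow the factor levels instead of the alphabet
p <- ggplot(df, aes(x = value, y = group)) +
  geom_col()
p
```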
#stringr: well, working with strings. Consistent and simple (well, except nasty #regex formulas)
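A few of the `str_*()` helpers as a sketch; the input strings are invented. The consistent part is that every function takes the string vector as its first argument:

```r
library(stringr)

x <- c("  Apple pie", "banana split ", "Cherry tart")  # hypothetical input

clean    <- str_trim(x)                       # drop leading/trailing whitespace
lower    <- str_to_lower(clean)               # consistent case
has_an   <- str_detect(clean, "an")           # FALSE TRUE FALSE
snake    <- str_replace(clean, " ", "_")      # replace the first space only
first_wd <- str_extract(clean, "^[A-Za-z]+")  # regex: letters at the start
```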
#lubridate: working with dates became so simple as well. I've never been a fan of POSIXct/lt and Co.
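A short sketch of why lubridate feels simpler: parsers are named after the order of the date parts, and components can be pulled out or shifted with small helper functions. The dates are arbitrary examples:

```r
library(lubridate)

# Parsers are named after the order of year, month, and day
d     <- ymd("2023-03-15")
same  <- dmy("15.03.2023") == d  # TRUE

# Extract components
yr <- year(d)   # 2023
mo <- month(d)  # 3
wday(d, label = TRUE)  # weekday as a labelled factor (locale dependent)

# Date arithmetic and rounding
next_month  <- d + months(1)            # 2023-04-15
month_start <- floor_date(d, "month")   # 2023-03-01
```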
@rstudio The session pages contain not only the slides but also:
🔵 hands-on #rstats codes
🔵 recap notes
🔵 exercises incl. prepared scripts (either as #quarto or #rmarkdown) and step-by-step solutions
📊🧵 Collection of tweets featuring open-access materials that I have shared over the last years:
Talks, seminars, blog posts, hands-on notebooks, codes, and more! #rstats #ggplot2 #tidyverse #dataviz 🧙♂️
The tutorial now contains 188 plots and is generated with ~3000 lines of code.
Added topics (1/5):
- several alternative ways to solve things
- short explanation of geoms and theme in the intro
- more on theme elements
- in general a bit more text + explanations
- highlighting the difference between `scale_x|y_continuous()` and `coord_cartesian(x|ylim)`
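To illustrate the last point with a sketch (my own example with `mtcars`, not necessarily the one used in the tutorial): limits set on the scale discard out-of-range data before statistics are computed, while `coord_cartesian()` only zooms the view.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Scale limits: values outside 15-25 become NA, so the boxes are
# computed from the remaining data only (ggplot2 warns about this)
p_scale <- p + scale_y_continuous(limits = c(15, 25))

# Coordinate limits: boxes are computed from all data, then the view is zoomed
p_zoom <- p + coord_cartesian(ylim = c(15, 25))

# Compare the medians that each version actually draws
med_scale <- suppressWarnings(layer_data(p_scale)$middle)
med_zoom  <- layer_data(p_zoom)$middle
rbind(med_scale, med_zoom)
```

For the 4-cylinder cars the two medians differ, because many of their `mpg` values lie above 25 and are dropped by the scale version.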