messy, realistic datasets are **extremely** useful for learning, teaching, exploring, and practicing your #rstats skills. so where do you find datasets like this?!
here are some practical tips you can use to find messy datasets! 🧵
#1. you may be familiar with the #TidyTuesday project as a fantastic community effort to practice data visualization.
BUT did you know it's an incredible resource for finding datasets on topics ranging from pollution to board games to tech adoption?
did you know some of the weekly #TidyTuesday dataset contributions actually include cleaning scripts? that means you can practice either on the original/raw data OR the slightly-transformed ready-to-visualize data.
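a minimal sketch of pulling one week of #TidyTuesday data from R, assuming the {tidytuesdayR} package is installed and you are online. `load_tt_week()` is a hypothetical helper name and the default date is only an illustrative pick, not one this thread points to:

```r
# sketch: load one week of #TidyTuesday data with {tidytuesdayR}
# assumptions: the package is installed and GitHub is reachable
load_tt_week <- function(date = "2020-07-07") {  # illustrative date, swap in any Tuesday
  if (!requireNamespace("tidytuesdayR", quietly = TRUE)) {
    message("run install.packages('tidytuesdayR') first")
    return(invisible(NULL))
  }
  tidytuesdayR::tt_load(date)
}

# tt <- load_tt_week()
# names(tt)  # the data frames shipped for that week
```

from there you can check the week's readme for a cleaning script and decide whether to start from the raw or the cleaned version.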
#2. if you are an undergraduate or graduate student, your institution likely has a data librarian! you can reach out and ask them about datasets in your topics of choice!
when i worked as a data librarian in the social sciences, I LOVED when a student asked for help finding data
#3. open data portals! you can likely find governmental data portals wherever you are in the world at some geographic level! in the US, i often gravitate to cities (because i find them fascinating and complex). take a look at food inspection data in NYC!
#4. site-limited searching on google. this is honestly a general-use superpower for finding what you need online: use the "site:" operator, like "climate data .csv site:github.com".
I particularly like trawling through github in this way to find unexpected data
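if you want to script these searches, here is a small base R sketch that builds the search URL for the example query above. `site_search_url()` is a made-up helper name, and the URL pattern is an assumption about google's usual `search?q=` form:

```r
# sketch: build a site-limited google search URL in base R
# site_search_url() is a hypothetical helper, not part of any package
site_search_url <- function(terms, site) {
  query <- paste0(terms, " site:", site)
  paste0(
    "https://www.google.com/search?q=",
    utils::URLencode(query, reserved = TRUE)  # percent-encode spaces and ":"
  )
}

url <- site_search_url("climate data .csv", "github.com")
print(url)
# utils::browseURL(url)  # uncomment to open the search in your browser
```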
and re: github, are you familiar with Awesome Lists? here is the awesomelist of Public Datasets on github - truly a fantastic place to do some digging: github.com/awesomedata/aw…
one more note on #TidyTuesday - if you want some inspiration about how to make decisions about data cleaning for these datasets, there's a whole world of TidyTuesday videos/streams!
here's a very sweet example from the talented @lisalendway
"Project MOSAIC is a community of educators working to develop a new way to introduce mathematics, statistics, computation and modeling to students in colleges and universities."
One great thing about working in #DataScience is that some of the most satisfying work involves creating images to express ideas or results 💡
This applies to R packages, which can be made expressive with fantastic hex logos like the one below, by @allison_horst... 1/5
Creating a hex logo may seem difficult at first, but there are many tools to help 💁‍♀️ 🛠️
The R package hexSticker places an image of your choice inside a hexagon and lets you add text and choose the background and border colors: 📷 + 🔡 + 🔶 2/5
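A rough sketch of that 📷 + 🔡 + 🔶 recipe, assuming {hexSticker} and {ggplot2} are installed. The package name "mypkg", the colors, and the sizes are placeholders chosen for illustration, not the settings used for any real logo:

```r
# sketch: a hex logo built from a ggplot with {hexSticker}
# assumptions: both packages are installed; "mypkg" and all colors are placeholders
if (requireNamespace("hexSticker", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) +
    ggplot2::geom_point(color = "white") +
    ggplot2::theme_void()

  out_file <- file.path(tempdir(), "mypkg.png")
  hexSticker::sticker(
    subplot  = p,          # the image (a ggplot here; an image file path also works)
    package  = "mypkg",    # the text
    p_size   = 20,
    s_x = 1, s_y = 0.8, s_width = 1.2, s_height = 0.9,
    h_fill   = "#1b4f72",  # background color
    h_color  = "#f39c12",  # border color
    filename = out_file    # where the PNG is written
  )
}
```

Expect to tweak the `s_*` and `p_*` values a few times until the image and text sit where you want them.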
For the package that @CatalinaMMedina, @MineDogucu and I (@FedeZoeRicci) created this year, I made the drawing on a tablet and used free online software to cut it into a hexagon shape and add a border: 3/5
After that inspiring panel of women in Data Science, I want to give a spotlight and appreciation round to all those #RLadies that are active on #rspatial and that have definitely inspired me to also try and be part of the community 🤩
I invite you all to give a shoutout to those amazing #RSpatialLadies that have crossed your path!
Are you getting started with #rspatial, or have you learned it from tutorials/courses that use packages such as {rgdal}, {rgeos}, {sp} or {raster}? Are you already used to them? You might want to reconsider 🤔
🧵 1/n
For the past few months the R Spatial community of developers and active users has been dealing with the news of the retirement of {rgdal} & {rgeos}
🧵 2/n
For {sp} it is a similar story: although it is still maintained, migrating to {sf} is the better choice for any new development or implementation
🧵 3/
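A minimal sketch of what the move can look like in practice, assuming {sf} is installed. It reads the demo shapefile bundled with {sf} where older tutorials would call `rgdal::readOGR()`:

```r
# sketch: reading vector data with {sf} instead of rgdal::readOGR()
# assumption: {sf} is installed; nc.shp is the demo file bundled with the package
if (requireNamespace("sf", quietly = TRUE)) {
  shp <- system.file("shape/nc.shp", package = "sf")
  nc  <- sf::st_read(shp, quiet = TRUE)
  print(class(nc))

  # existing {sp} objects can be converted with sf::st_as_sf(sp_object),
  # and sf::as_Spatial(nc) goes back if an older package still needs {sp} classes
}
```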
Hi all 👋 today I would like to talk about #CV tips. Did you know you can create your CV in #rstats? There are so many cool 📦 out there!
I personally use {vitae}, which has a good range of eye-catching templates to choose from. I highly recommend it!
What I like most about building my CV in R is that I can organize everything in an R project, push it to GitHub to track changes, and use the great advantages of #rmarkdown and #latex. Here is the repo of my #vitae CV: github.com/loreabad6/R-CV
I had to modify the .csl file a bit to adapt certain details to my taste, and I included a 🗺️ of my journey, which has received some nice feedback, as it serves as a visual business card 💃