messy, realistic datasets are **extremely** useful for learning, teaching, exploring, and practicing your #rstats skills. so where do you find datasets like this?!

here are some practical tips you can use to find messy datasets! 🧵

#1. you may be familiar with the #TidyTuesday project as a fantastic community effort to practice data visualization.

BUT did you know it's an incredible resource for finding datasets on topics ranging from pollution to board games to tech adoption?

github.com/rfordatascienc…

did you know some of the weekly #TidyTuesday dataset contributions actually include cleaning scripts? that means you can practice on either the original/raw data OR the slightly transformed, ready-to-visualize data.

Take a look at the SF rent data, for example: github.com/rfordatascienc…
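
if you want to read a raw file straight from the repo, here's a minimal base-R sketch. it assumes the repo stores files at `data/<year>/<date>/<file>` on a branch named `main`. the helper names `tt_raw_url()` and `read_tt_file()` are hypothetical, and the date and file name in the usage line are assumed to be the SF rent week (check that week's readme). the {tidytuesdayR} package's `tt_load()` is the packaged alternative.

```r
# Minimal sketch: build the raw-file URL for a TidyTuesday dataset and read it.
# ASSUMPTION: files live at data/<year>/<date>/<file> in the
# rfordatascience/tidytuesday repo. The branch name is a parameter because it
# is an assumption too. Both helpers are hypothetical, not a package API.
tt_raw_url <- function(date, file, branch = "main") {
  year <- substr(date, 1, 4)  # e.g. "2022" from "2022-07-05"
  paste0(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
    branch, "/data/", year, "/", date, "/", file
  )
}

# Reading needs an internet connection, so it is wrapped in a function
# instead of running when this block is sourced.
read_tt_file <- function(date, file, branch = "main") {
  utils::read.csv(tt_raw_url(date, file, branch), stringsAsFactors = FALSE)
}

# ASSUMPTION: "2022-07-05" / "rent.csv" is the SF rent week and file.
url <- tt_raw_url("2022-07-05", "rent.csv")
print(url)
# rent_raw <- read_tt_file("2022-07-05", "rent.csv")  # raw data, ready to clean
```

building the URL separately from reading it keeps the network call out of the way until you actually want the data, and makes the path easy to check by eye.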

#2. if you are an undergraduate or graduate student, your institution likely has a data librarian! you can reach out and ask them about datasets on your topics of choice!

when i worked as a data librarian in the social sciences, I LOVED it when a student asked for help finding data!

#3. open data portals! wherever you are in the world, you can likely find government data portals at some geographic level! in the US, i often gravitate to cities (because i find them fascinating and complex). take a look at the food inspection data for NYC!

data.cityofnewyork.us/Health/DOHMH-N…

#4. site-limited searching on Google. this is honestly a general-purpose superpower for finding what you need online: use the "site:" operator, like "climate data .csv site:github.com".

I particularly like trawling through GitHub this way to find unexpected data!
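
you can also build these searches in R and keep them in a script. a minimal sketch: `google_search_url()` is a hypothetical helper, and it assumes Google accepts the query through the `q` parameter of `https://www.google.com/search`.

```r
# Hypothetical helper: turn search terms plus a site restriction into a
# Google search URL. utils::URLencode(reserved = TRUE) percent-encodes spaces
# and reserved characters such as ":".
google_search_url <- function(terms, site) {
  query <- paste(terms, paste0("site:", site))
  paste0("https://www.google.com/search?q=",
         utils::URLencode(query, reserved = TRUE))
}

url <- google_search_url("climate data .csv", "github.com")
print(url)
# utils::browseURL(url)  # opens the search in your default browser
```

swap in your own terms and site (e.g. a city's open data portal) to reuse the same pattern.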

and re: GitHub, are you familiar with Awesome Lists? here is the Awesome List of Public Datasets on GitHub, truly a fantastic place to do some digging: github.com/awesomedata/aw…

one more note on #TidyTuesday: if you want some inspiration for making data cleaning decisions with these datasets, there's a whole world of TidyTuesday videos/streams!

here's a very sweet example from the talented @lisalendway

more messy dataset ideas! here are two great suggestions from @alittlestats:

* The Data is Plural newsletter: data-is-plural.com

"a weekly newsletter of useful/curious datasets, published by @jsvine"
* The {mosaicData} package: github.com/ProjectMOSAIC/…

"Project MOSAIC is a community of educators working to develop a new way to introduce mathematics, statistics, computation and modeling to students in colleges and universities."

thanks @alittlestats!
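
to see what a data package like {mosaicData} actually contains, `utils::data(package = ...)` lists its datasets. a minimal sketch: `list_package_datasets()` is a hypothetical helper, and base R's {datasets} stands in so it runs without installing anything.

```r
# Hypothetical helper: list the dataset names shipped with an installed
# package. utils::data(package = ...) returns an object whose "results"
# element is a matrix with an "Item" column.
list_package_datasets <- function(pkg) {
  info <- utils::data(package = pkg)
  unname(info$results[, "Item"])
}

# Base R's {datasets} package is used so the example needs no installs.
base_sets <- list_package_datasets("datasets")
print(head(base_sets))

# install.packages("mosaicData")
# list_package_datasets("mosaicData")
```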

More from @WeAreRLadies

Jul 14
Have you tried using R markdown for a collaborative project?
💻 🧔 👱‍♀️ 🤠 👨‍🦱 👩

Here are some things I have learned 😺 and some things I have loved 😻 about it…

🧵 1/10

(Examples from a paper I co-authored sciencedirect.com/science/articl…)
😺 Dividing the work

Each member can work on a part of the project on a separate Rmd file 🧑‍💻

All files can be incorporated into the “main” Rmd by using the “child” option of an R chunk 👩‍👧‍👧

A comment in the R chunk can be used to remind the team who is assigned to what 🙆‍♀️

2/10
😻 Numeric results: no need to copy/paste!

Numeric summaries of analysis (e.g. sample size) can be computed and printed directly in the text using "inline code" 🧙‍♀️

➡️ Fewer accidents due to copy/paste 🤩

➡️ No need to find & change numbers if the data or analysis changes 🥲

3/10
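
a tiny base-R illustration of the same idea: compute the number from the data instead of typing it. in an R Markdown paragraph you'd write it as inline code, `r nrow(mtcars)`, and knitr fills in the value when the document is knit. `mtcars` is just a stand-in for your own data, and `sprintf()` plays the role of the inline code here.

```r
# Compute a summary from the data instead of hard-coding it in the text.
# mtcars (built into R) stands in for your own data set.
n_cars <- nrow(mtcars)

# In an .Rmd file the sentence would contain inline code instead of sprintf():
#   The sample includes `r nrow(mtcars)` cars.
sentence <- sprintf("The sample includes %d cars.", n_cars)
print(sentence)
```

if the data change, re-running (or re-knitting) updates the number with no copy/paste step.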
Jul 11
👩‍🎨📊

One great thing about working in #DataScience is that some of the most satisfying work involves creating images to express ideas or results 💡

This applies to R packages, which can be made expressive with fantastic hex logos like this one by @allison_horst... 1/5

Creating a hex logo may seem difficult at first, but there are many tools to help 💁‍♀️ 🛠️

The R package hexSticker places an image of your choice inside a hexagon and lets you add text and choose the background and border colors: 📷 + 🔡 + 🔶 2/5

github.com/GuangchuangYu/…

For the package that @CatalinaMMedina, @MineDogucu and I (@FedeZoeRicci) created this year, I made the drawing on a tablet and used free online software to cut it into a hexagon and add a border. 3/5
Dec 10, 2021
After that inspiring panel of women in Data Science, I want to give a spotlight and appreciation round to all the #RLadies who are active in #rspatial and who have inspired me to try to be part of the community too 🤩
I invite you all to give a shoutout to the amazing #RSpatialLadies who have crossed your path!
Dec 10, 2021
Loved this line from @McconnellKyla:
"I realized I was becoming a copy/paste data scientist and decided that I should invest more time in this"
@TnaniHedia just compared programming to playing during her journey of learning to code! Great way to see it! 😀
Dec 10, 2021
Want to get started with #rspatial, or have you already learned from tutorials/courses using packages such as {rgdal}, {rgeos}, {sp}, and {raster}? Are you getting used to them already? You might want to reconsider 🤔
🧵 1/n
For the past few months, the R Spatial community of developers and active users has been dealing with the news of the retirement of {rgdal} and {rgeos}.
🧵 2/n
For {sp} it is a similar story: although it is still maintained, migrating to {sf} is generally preferred for any new development.
🧵 3/
Dec 9, 2021
Hi all 👋 today I would like to talk about #CV tips. Did you know you can create your CV in #rstats? There are so many cool 📦 out there!
I personally use {vitae}, which has a good range of eye-catching templates to choose from. I highly recommend it!
What I like most about building my CV in R is that I can organize everything in an R project, push it to GitHub to track changes, and take advantage of #rmarkdown and #latex. Here is the repo of my #vitae CV: github.com/loreabad6/R-CV
I had to modify the .csl file a bit to adapt certain details to my taste, and I included a 🗺️ of my journey, which has gotten some nice feedback since it serves as a visual presentation card 💃