Albert Rapp
🎓 Math PhD student & freelancer 👨‍🏫 Bite-sized insights on dataviz, web dev & data science with R at https://t.co/M34b5BzHTD

May 31, 2023, 8 tweets

Data cleaning is tedious.

But it's much easier with the {janitor} package. Especially if you work with Excel files.

Here are 5 underrated features from {janitor}. #rstats

1 // Create clean names

clean_names() is absolutely the best function in the package. It transforms column names so that they are easier to use in code.

Left: Bad for programming
Right: Good for programming
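
Here is a minimal sketch of what clean_names() does. The data frame and its column names are made up for illustration; they are not the ones from the thread's screenshots.

```r
library(janitor)

# Hypothetical data frame with Excel-style column names
# (check.names = FALSE keeps the spaces and symbols as typed)
messy <- data.frame(
  `First Name` = c("Ada", "Grace"),
  `Hire Date` = c(41103, 41104),
  `% Allocated` = c(0.5, 1),
  check.names = FALSE
)

# clean_names() rewrites the names to snake_case
clean <- clean_names(messy)
names(clean)
```

After that, every column can be referenced without backticks, e.g. clean$first_name.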

2 // Remove empty rows and columns (or constant columns) from Excel files

Excel files can be messy to read in R. Lots of weird column names and empty cells.

{janitor} takes care of that for us with remove_empty() and remove_constant().
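
A rough sketch of the cleanup, assuming the Excel file has already been read into a data frame. The data frame below is a made-up stand-in for such an import.

```r
library(janitor)

# Hypothetical stand-in for a messy Excel import:
# row 2 is completely empty, `notes` is completely empty,
# and `team` has the same value in every remaining row
raw <- data.frame(
  name = c("Ada", NA, "Grace"),
  salary = c(100, NA, 120),
  team = c("R", NA, "R"),
  notes = c(NA, NA, NA)
)

# Drop rows and columns that contain only NA
tidy <- remove_empty(raw, which = c("rows", "cols"))

# Drop columns that hold a single constant value
tidy <- remove_constant(tidy)
names(tidy)
```

Only name and salary survive here, because those are the only columns that carry information.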

3 // Convert stupid Excel dates

Our lovely {janitor} can even convert those pesky Excel dates to something we can work with.

Check out how the hire_date column from our previous output is transformed to actual dates.
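
One way to do this conversion is excel_numeric_to_date(). This is a sketch with made-up serial numbers, not necessarily the exact call from the blog post.

```r
library(janitor)

# Hypothetical staff table where Excel dates arrived as serial numbers
staff <- data.frame(
  name = c("Ada", "Grace"),
  hire_date = c(41103, 41104)
)

# Convert Excel serial numbers to proper Date values
# (41103 becomes 2012-07-13 under the default "modern" date system)
staff$hire_date <- excel_numeric_to_date(staff$hire_date)
staff
```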

4 // Rounding

R does not always round up numbers that end with .5

R uses banker's rounding: a .5 goes to the nearest even number, so round(2.5) gives 2 🤦 🤯

Instead of implementing the rounding yourself, just use {janitor} functions such as round_half_up().
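
A quick side-by-side of base R and round_half_up(), using a few values picked for illustration:

```r
library(janitor)

x <- c(0.5, 1.5, 2.5)

# Base R rounds halves to the nearest even number
base_rounded <- round(x)
base_rounded   # 0 2 2

# round_half_up() always rounds halves away from zero
half_up <- round_half_up(x)
half_up        # 1 2 3
```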

5 // Find duplicates

Sometimes you need to find duplicate observations based on a set of characteristics.

In that case, get_dupes() is your friend.
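
A small sketch with a made-up staff table, checking for duplicates on first and last name only:

```r
library(janitor)

# Hypothetical data: Ada Lovelace appears twice, with different teams
staff <- data.frame(
  first_name = c("Ada", "Grace", "Ada", "Linus"),
  last_name = c("Lovelace", "Hopper", "Lovelace", "Torvalds"),
  team = c("R", "R", "Python", "R")
)

# Keep only the rows that share first_name and last_name with another row;
# get_dupes() also adds a dupe_count column
dupes <- get_dupes(staff, first_name, last_name)
dupes
```

Leaving out the column names makes get_dupes() compare entire rows instead.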

That was our short list. I hope you'll give {janitor} a try now.

All code examples can be found in my blog at albert-rapp.de/posts/07_janit…

Let me know if you liked this thread by liking its first post below ☺️

And for more content like this, follow @rappa753. See you next time 👋

If you liked this post, you may enjoy my 3-minute newsletter too.

Every week, my newsletter shares insights on
- R & dataviz,
- Shiny and web dev

Reading time: 3 minutes or less

You can join at
alberts-newsletter.beehiiv.com/subscribe
