Discover and read the best of Twitter Threads about #Rstats

Most recent (24)

Need to extract days, months, years or more from time data?

Don't compute them all manually with {lubridate}. That's way too tedious.

The {timetk} package has a nice function that does all the heavy lifting for you.

LEFT: {lubridate} workflow
RIGHT: {timetk} workflow
#rstats
BONUS: Maybe you don't want to use all of the stuff that {timetk} computes for you.

Here's a simple function that extracts only the parts you want.

All of the code can be found on GitHub at…
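The author's function is written in R and lives in the linked repository, so it is not reproduced here. As a rough sketch of the same idea (compute only the date parts you ask for), here is a small Python version that uses nothing but the standard library. The helper name `extract_date_parts` and its `parts` argument are made up for illustration; they are not part of {timetk} or {lubridate}.

```python
from datetime import date

# Hypothetical lookup table: part name -> function that computes it.
_PART_GETTERS = {
    "year": lambda d: d.year,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "weekday": lambda d: d.isoweekday(),   # 1 = Monday ... 7 = Sunday
    "week": lambda d: d.isocalendar()[1],  # ISO week number
}


def extract_date_parts(dates, parts):
    """Return one dict per date containing only the requested parts."""
    unknown = [p for p in parts if p not in _PART_GETTERS]
    if unknown:
        raise ValueError(f"Unknown date parts: {unknown}")
    return [
        {"date": d, **{p: _PART_GETTERS[p](d) for p in parts}}
        for d in dates
    ]


rows = extract_date_parts([date(2023, 3, 15)], ["year", "quarter", "weekday"])
print(rows)
```

Adding another part only means adding one more entry to the lookup table.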
Also, shoutout to @EatsleepfitJeff for teaching me about this function from {timetk} ☺️
Read 4 tweets
Everybody loves colors but only a few know how to use them well.

With the right guidelines, using colors becomes super easy.

Let me show you how to implement these guidelines with ggplot 🧵
Anyone can create a stacked bar chart with ggplot.

But that can end up in a colorful & messy plot.

Let's implement a couple of guidelines from this datawrapper blog post to level up our color game…
The key is to reduce the number of colors and leverage the `alpha` aesthetic as well.
Read 11 tweets
Sometimes people ask me if I can do one-on-one R tutoring.

Sure I can. But then my hourly rate applies. And there are many amazing *free* resources. Want to try them first?

Here are a few that I recommend. #rstats
1 // Yet Again: R + Data Science

Find it at

I'll start with one of my own bc I assume that you like my style (otherwise why ask me?)

Beware though: YARDS is a graduate-level course that I taught for math students w/ a bit of programming experience.
2 // R for Data Science

To me this book is like the R bible. It introduced me to the so-called tidyverse and taught me much of what I know.

This one starts out slow and is really beginner-friendly
Read 11 tweets
The #rstats datawizard package (from the @easystats4u ecosystem) has two very useful functions to deal with duplicates.

* data_duplicated: Extract all duplicates including the first, unlike duplicated() or dplyr::distinct()

* data_unique: by default selects the "best" duplicate
data_duplicated() also contains an additional column reporting the number of missing values for that row, to help in the decision-making when selecting which duplicates to keep.
data_unique() can keep either the first, last, or "best" duplicate. The "best" duplicate (default) will pick the row with the smallest number of missing values. In case of ties, it picks the first one, as it is the one most likely to be valid and authentic, given practice effects
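To make the "best" rule concrete, here is a minimal Python sketch of the logic as described above: within each group of duplicates, keep the row with the fewest missing values, and on ties keep the earliest one. This is not how {datawizard} implements it. The function names are hypothetical, rows are plain dicts, and `None` stands in for R's `NA`.

```python
def count_missing(row):
    """Number of missing (None) values in a row."""
    return sum(value is None for value in row.values())


def keep_best_duplicates(rows, key):
    """Keep one row per key: fewest missing values, earliest row on ties."""
    best = {}
    for row in rows:
        k = row[key]
        # Strict "<" means a later row only wins if it is strictly better,
        # so ties go to the row that appeared first.
        if k not in best or count_missing(row) < count_missing(best[k]):
            best[k] = row
    return list(best.values())


rows = [
    {"id": 1, "age": None, "score": None},
    {"id": 1, "age": 30, "score": None},
    {"id": 1, "age": 31, "score": None},  # ties with the row above
    {"id": 2, "age": 40, "score": 7},
]
unique_rows = keep_best_duplicates(rows, key="id")
print(unique_rows)
```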
Read 3 tweets
I've been experimenting with #chatgpt for #datascience for 16 weeks.

And I now have a process I'm happy with.

Here are the details. 🧵

#datascience #rstats
Using ChatGPT for data science has been a MASSIVE learning curve.

I began using it for complex workflows.

And I FAILED miserably.
Case in point: my first try was asking it to build me a machine learning model + a shiny app for scoring customers.


My ask was just too complex.
Read 8 tweets
{modelsummary} looks like a pretty slick package for reporting. Definitely keeping that one at the back of my mind 🤔 #rstats
And it looks like it is dangerous. The kind of dangerous I enjoy 🤣
If you liked this post, you may enjoy my 3-minute newsletter too.

Every week, my newsletter shares insights on
- R & dataviz,
- Shiny and web dev

Reading time: 3 minutes or less

You can join at
Read 3 tweets
It took me 5 years to feel confident in data science.

True story.🧵

#datascience #rstats
This is coming from a person who has created two R packages with a combined 1.5 million downloads.

Has trained elite data scientists at Apple, Walmart, Google.

And has built a career teaching students how to become data scientists.

Why did it take so long?
👉 Too many resources.

I thought I had to learn everything:

Deep learning.
Machine learning.

The toughest part was figuring out which tools to learn and which were “red herrings” (a waste of time).
Read 9 tweets
There are over 2,000 AI tools that have hit the market over the last 365 days.

So I condensed them into the best.

Here are the TOP 15 AI TOOLS for Data Scientists. 🧵

#datascience #rstats #python #career #ai
It's hard not to get excited about #AI. The potential is insane. It's also scary.

And the worst thing you can do for your career is ignore AI.

I mean, there are literally 2,000 new tools that have hit the market in 365 days. So where do you start?

I want to help.
Here are the 15 AI tools that, as a data scientist, MUST be on your radar (I'm road-testing ALL of these).


1. ChatGPT: OpenAI's AI Chatbot
Read 19 tweets
Data cleaning is tedious.

But it's much easier with the {janitor} package. Especially if you work with Excel files.

Here are 5 underrated features from {janitor}. #rstats
1 // Create clean names

This is absolutely the best function. It transforms column names such that they are easier to use for programming.

Left: Bad for programming
Right: Good for programming
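If you're wondering what "easier to use for programming" means in practice, here is a simplified Python sketch of the general idea: turn messy headers into lowercase snake_case. It is only an illustration with a made-up `clean_name` helper, not a port of {janitor}. The real function does more, for example handling duplicate names and special symbols, which this sketch ignores.

```python
import re


def clean_name(name):
    """Turn a messy column header into a lowercase snake_case identifier."""
    name = name.strip()
    # Put an underscore at camelCase boundaries: "customerID" -> "customer_ID"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Collapse every run of non-alphanumeric (ASCII) characters into "_"
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


messy = ["First Name", "customerID", "Hire Date (YYYY)", " ZIP-Code # "]
cleaned = [clean_name(column) for column in messy]
print(cleaned)
```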
2 // Remove empty or constant cells from Excel files

Excel files can be messy to read in R. Lots of weird column names and empty cells.

{janitor} takes care of that for us.
Read 8 tweets
3 battle-tested skills that every data scientist should have.

(and how to apply them to a job interview)🧵

#datascience #skills #rstats #python
People don’t realize this but I was a data science consultant and corporate trainer...

That was long before I was a “teacher” and a “6-figure data science mentor”.

That’s where I learned these skills through battle-testing.

And my clients were my test subjects. 🧪🧑‍🔬
1. Focus on results

Outcomes are what move mountains. Not analysis.

Yes, data analysis is incredibly important.

But, what’s more important is what you and your company do with the analysis.
Read 12 tweets
“Why Python vs R?”

“What’s inside your new Python course?”

“What will it do for me (if I’m an #R user)?”

I’ve been getting a ton of questions so I figured Twitter can help me explain.

#python #rstats #datascience
1. R vs Python.

My guess is 90% of my followers use R.

So why am I promoting Python?

Well, I just wrote a full article on why R users should learn **some** Python…

Even if Python is harder.…
2. What’s inside my new Python course?

I just put together a 5-minute video that shows one powerful use case for Python.

And it’s the course project.

Read 4 tweets
ChatGPT just made a Data Science Web App for me in under 15 minutes.

Here's the story. 🧵

#datascience #rstats
1. Machine Learning

ChatGPT created the basic code for an XGBoost Machine Learning model

The model is used for scoring the customer spend
2. Shiny App

ChatGPT produced the code for a basic shiny app that packages the model...

...And the app can make new spend predictions based on tweaking inputs for a new customer.
Read 11 tweets
R users that want to learn Python.

This message is for you. 🧵

#rstats #python
If you don't know me, I'm Matt.

I'm an R user.

Nice to meet you...
And, I'm super pumped for today...

GOOD NEWS: My Python Course Launch is happening today at 2PM EST

BAD NEWS: It's officially over-registered. 4,872 registered. But ONLY 1000 get seats.

Here's what you need to do...
Read 5 tweets
I just revealed the data science project that helped my small business generate $3,500,000 in revenue in the last 2 years.

Here are the juicy details...

#datascience #python #rstats #career
It's true. My company did almost $2,000,000 last year with an email list size of under 100,000.

And a big part of it was the Marketing Analytics that I'm revealing in my new Python course.

What did we do?
We used machine learning to score leads.

We then created nurture vs sales targeting segments.

Then we did this...
Read 7 tweets
R from basics to advanced, for beginners.

Bookmark this Tweet and Follow me

COMMENT PDF - and I will send you the compiled PDF of more than 10 Cheat Sheets

If your inbox is disabled please DM me.

#DataAnalytics #DataScience #rstats #data #BigData
Read 6 tweets
The basics of R for Beginners.

Bookmark this Tweet and Follow for More

COMMENT PDF - and I will send you the compiled PDF of more than 10 Cheat Sheets

#DataAnalytics #DataScience #rstats #data #BigData
If you have not received any DM from me that means your Inbox is disabled.

Follow me so that I can Inbox you.
Don't forget to:

✅ Follow

✅ Like the first tweet in the thread
✅ Retweet this post to share with the world.

Have an amazing day 🙏
Read 4 tweets
Ever heard of logistic regression? Or Poisson regression? Both are generalized linear models (GLMs).

They're versatile statistical models. And by now, they've probably been reframed as super hot #MachineLearning.

Brush up on their math with this thread. #rstats
Let's start with logistic regression. Assume you want to classify a penguin as male or female based on its

* weight,
* species and
* bill length

Better yet, let's make this specific. Here's a dataviz for this exact scenario. It is based on the {palmerpenguins} data set.
As you can see, the male and female penguins form clusters that do not overlap too much.

However, regular linear regression (LR) won't help us to distinguish them. Think about it. Its output is something numerical. Here, we want to find classes.
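Here is a small Python sketch of how logistic regression usually closes that gap: the linear predictor is squashed through the sigmoid function into a probability between 0 and 1, and a threshold turns that probability into a class. The coefficients below are invented for illustration and are not fitted to the {palmerpenguins} data; species is left out to keep the example short.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, intercept):
    """Probability of the 'male' class from a linear predictor."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def classify(features, weights, intercept, threshold=0.5):
    """Turn the probability into a class label."""
    p = predict_proba(features, weights, intercept)
    return "male" if p >= threshold else "female"


# Made-up coefficients for (body mass in kg, bill length in cm)
weights, intercept = [2.0, 1.0], -12.5

heavy_penguin = (4.5, 4.5)
light_penguin = (3.4, 3.9)
print(predict_proba(heavy_penguin, weights, intercept), classify(heavy_penguin, weights, intercept))
print(predict_proba(light_penguin, weights, intercept), classify(light_penguin, weights, intercept))
```

The linear part is still there; the sigmoid is what makes its output readable as a probability of belonging to one class.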
Read 26 tweets
Over the past 3 years, I've been writing Python code daily.

And this coming Thursday, I'm excited to share what I've learned. 🧵

#datascience #career #python #rstats
Most of you know me as an #R guy. And I 1000% love R. ❤️

But I've also grown over the past 10+ years that I've been practicing data science as a data scientist, as a consultant, and as an educator.
What I've come to realize is that no one language is perfect.

Each has its strengths. And weaknesses.

And more often than not we get caught up in debating things that **don't** really matter...

Like R vs Python.
Read 11 tweets
There are more than 10,685,000 people learning to work with data.

Yet most data science jobs now have over 1,000 applicants.

And only 5 actually make it to an interview.

Learn these 4 skills to stand out (free to steal & use to advance your career).

#datascience #rstats
1. Time series analysis

Up to 90% of companies need better forecasting.

And less than 5% of data scientists are competent in time series analysis.
Key Point:

Becoming good at time series is a massive advantage that automatically places you in the 95th percentile of applicants.

Try This:

Demonstrate an application of time series analysis in the interview, and set yourself apart from the crowd.
Read 13 tweets
Becoming a 6-figure data scientist (with #R) in today's economy is different than it was in 2022.

It requires a transformation...

#datascience #career #rstats
And those that successfully make the transformation:

1. Get hired 75% faster (3-6 months vs 12+ months)

2. Earn 17% more ($145K vs $117K)

3. Get promoted 2X faster (1-2 years vs 2-4 years)
Want to learn how?

I have a LIVE Masterclass today where I will reveal the exact steps to take.

Strict limit: 500 Seats (and we are over-capacity)
Read 4 tweets
There are 100,000 data scientists on the market. How can you possibly get a job?

This is one way… 🧵

#datascience #rstats
Imagine this.

You walk into the Central Bank of Canada, the backbone of Canada’s financial system.

You’re applying for a data science position.
On paper the odds are stacked against you.

You use R.

The whole team uses Python.

And you are a complete unknown.
Read 10 tweets
87% of data science projects fail.

One reason is that the work never makes it into production.

Here's an easy solution. 🧵

#datascience #python #R #rstats #career
This is called an #API (Application Programming Interface).
An API allows the user to make requests using a tool they are comfortable with.

The API runs a program that the data scientist has created on the server.

And the API returns a result.
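The request, run, return cycle can be sketched in a few lines of Python. To stay self-contained, this example skips the web server entirely: `handle_request` takes the request body as a JSON string and returns a status code plus a JSON response, which is roughly what a web framework would pass in and send back. The "model" is a placeholder rule and every name here is hypothetical.

```python
import json


def score_customer(features):
    """Placeholder for the data scientist's program (not a real model)."""
    return round(10.0 + 0.5 * features["past_purchases"], 2)


def handle_request(body):
    """Take a JSON request body, run the program, return (status, JSON)."""
    try:
        payload = json.loads(body)
        prediction = score_customer(payload)
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a value of the wrong type
        return 400, json.dumps({"error": "invalid request"})
    return 200, json.dumps({"predicted_spend": prediction})


status, response = handle_request('{"past_purchases": 4}')
print(status, response)
```

The caller never sees the code behind `score_customer`. They only need to know what to send and what comes back, which is why they can use whatever tool they are comfortable with.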
Read 6 tweets
What stresses me out isn’t the same thing that stressed my parents out.

True story. 🧵

#datascience #rstats
It was the day before R/Finance 2018.

I had a presentation for R/Finance where I was going to present a Shiny App that I was working on.

I updated my R software…

Big mistake.
My shiny app suddenly stopped working and I went into straight panic-mode. 😱

This was before I knew how to use GitHub & Docker like I do now.

So I was freaking out.
Read 12 tweets
I continue to be impressed by the ease of doing machine learning with Pycaret.

This is especially great for #R people that want to learn #Python.

Let me explain...

#DataScience #Rstats
As many of you know, my primary data science toolkit is #R. ❤️

I've been doing data science in production at @bizScienc and developing open-source R software for over 10 years.

- Modeltime (+4 ecosystem pkgs)
- Timetk
- Tidyquant
- CorrelationFunnel
But, I interact with team members and interface with clients...

and their language of choice is often Python.

It's refreshing that I can quickly switch to Python when needed, and not need to write 5000 lines of #ScikitLearn code to do basic machine learning.
Read 5 tweets
