Discover and read the best Twitter threads about #rstats

Most recent (24)

Albert Rapp

@rappa753

Need to extract days, months, years or more from time data?

Don't compute them all manually with {lubridate}. That's way too tedious.

The {timetk} package has a nice function that does all the heavy lifting for you.

LEFT: {lubridate} workflow
RIGHT: {timetk} workflow
#rstats

BONUS: Maybe you don't want to use all of the stuff that {timetk} computes for you.

Here's a simple function that extracts only the parts you want.

All of the code can be found on GitHub at gist.github.com/AlbertRapp/2c9…
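The thread's own code is in R (see the gist). As a language-neutral illustration of the BONUS idea (compute many date parts, keep only the ones you ask for), here is a Python sketch. The function name and the part labels (`quarter`, `wday`, `yday`, `week`) are assumptions made for this sketch. They are not the {timetk} API.

```python
from datetime import date


def extract_date_parts(d, parts=("year", "month", "day")):
    """Return only the requested components of a date.

    Hypothetical helper for illustration; not part of {timetk} or {lubridate}.
    """
    available = {
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day": d.day,
        "wday": d.isoweekday(),         # ISO convention: 1 = Monday ... 7 = Sunday
        "yday": d.timetuple().tm_yday,  # day of the year, 1 ... 366
        "week": d.isocalendar()[1],     # ISO 8601 week number
    }
    unknown = set(parts) - set(available)
    if unknown:
        raise ValueError(f"unknown date parts: {sorted(unknown)}")
    # Keep only what the caller asked for, in the order they asked for it
    return {part: available[part] for part in parts}


if __name__ == "__main__":
    print(extract_date_parts(date(2023, 5, 17), parts=("year", "quarter", "wday")))
```

For example, `extract_date_parts(date(2023, 5, 17), ("year", "quarter", "wday"))` returns `{'year': 2023, 'quarter': 2, 'wday': 3}`.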

Also, shoutout to @EatsleepfitJeff for teaching me about this function from {timetk} ☺️

Read 4 tweets

Albert Rapp

@rappa753

Everybody loves colors, but only a few know how to use them well.

With the right guidelines, using colors becomes super easy.

Let me show you how to implement these guidelines with ggplot 🧵
#rstats

Anyone can create a stacked bar chart with ggplot.

But the result can be a colorful, messy plot.

Let's implement a couple of guidelines from this Datawrapper blog post to level up our color game: blog.datawrapper.de/10-ways-to-use…

The key is to reduce the number of colors and to leverage the `alpha` aesthetic as well.
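One way to read "fewer colors, more alpha": keep a single hue and let transparency do the highlighting. Below is a small Python sketch of that mapping. The helper name, the default color, and the 0.3 alpha are assumptions; in ggplot the same idea would be expressed through the fill and `alpha` aesthetics rather than with a function like this.

```python
def alpha_palette(categories, highlight, color="#1f77b4", muted_alpha=0.3):
    """Give every category the same color; only `highlight` stays fully opaque.

    Hypothetical helper: returns {category: (hex_color, alpha)} pairs that a
    plotting library's manual color/alpha scales could consume.
    """
    if highlight not in categories:
        raise ValueError(f"{highlight!r} is not one of {list(categories)}")
    if not 0.0 <= muted_alpha <= 1.0:
        raise ValueError("muted_alpha must be between 0 and 1")
    # One color for everything; only the alpha value distinguishes the highlight
    return {
        category: (color, 1.0 if category == highlight else muted_alpha)
        for category in categories
    }
```

The design choice is that the number of distinct hues stays at one no matter how many categories there are, so adding a category never adds a new color to the legend.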

Read 11 tweets

Albert Rapp

@rappa753

Sometimes people ask me if I can do one-on-one R tutoring.

Sure I can. But then my hourly rate applies. And there are many amazing *free* resources. Want to try them first?

Here are a few that I recommend. #rstats

1 // Yet Again: R + Data Science

Find it at yards.albert-rapp.de

I'll start with one of my own bc I assume that you like my style (otherwise why ask me?)

Beware though: YARDS is a graduate-level course that I taught for math students w/ a bit of programming experience.

2 // R for Data Science

To me this book is like the R bible. It introduced me to the so-called tidyverse and taught me much of what I know.

This one starts out slow and is really beginner-friendly.

r4ds.had.co.nz

Read 11 tweets

Rémi Thériault

@RemPsyc

The #rstats datawizard package (from the @easystats4u ecosystem) has two very useful functions to deal with duplicates.

* data_duplicated(): extracts all duplicates, including the first occurrence (unlike duplicated() or dplyr::distinct())

* data_unique(): by default, keeps the "best" duplicate

data_duplicated() also contains an additional column reporting the number of missing values for that row, to help in the decision-making when selecting which duplicates to keep.

data_unique() can keep either the first, last, or "best" duplicate. The "best" duplicate (the default) is the row with the smallest number of missing values. In case of ties, it keeps the first one, since, given practice effects, the first response is the one most likely to be valid and authentic.
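The selection rule described above is simple enough to sketch outside R. The Python below illustrates that rule; it is not datawizard's implementation. Rows are dicts, missing values are `None`, and the names `all_duplicates` and `keep_best` are hypothetical.

```python
from collections import Counter


def count_missing(row):
    """Number of missing values (None) in a row."""
    return sum(value is None for value in row.values())


def all_duplicates(rows, id_key):
    """Every row whose id occurs more than once, first occurrence included,
    plus an 'n_missing' field to help decide which copy to keep."""
    counts = Counter(row[id_key] for row in rows)
    return [{**row, "n_missing": count_missing(row)}
            for row in rows if counts[row[id_key]] > 1]


def keep_best(rows, id_key):
    """One row per id: the copy with the fewest missing values.
    Ties go to the earliest row; survivors keep their original order."""
    best = {}  # id -> (n_missing, position, row)
    for position, row in enumerate(rows):
        n_missing = count_missing(row)
        key = row[id_key]
        # Strict '<' means a later row only wins if it is strictly better
        if key not in best or n_missing < best[key][0]:
            best[key] = (n_missing, position, row)
    return [row for _, _, row in sorted(best.values(), key=lambda t: t[1])]
```

Using a strict `<` comparison is what implements the "first one wins ties" behavior; switching to `<=` would keep the last of the tied rows instead.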

Read 3 tweets

Matt Dancho (Business Science)

@mdancho84

I've been experimenting with #chatgpt for #datascience for 16 weeks.

And I now have a process I'm happy with.

Here are the details. 🧵

#datascience #rstats

Using ChatGPT for data science has been a MASSIVE learning curve.

I began using it for complex workflows.

And I FAILED miserably.

Case in point: my first try was asking it to build me a machine learning model + a Shiny app for scoring customers.

FAIL.

My ask was just too complex.

Read 8 tweets

Albert Rapp

@rappa753

{modelsummary} looks like a pretty slick package for reporting. Definitely keeping that one at the back of my mind 🤔 #rstats

And it looks like it is dangerous. The kind of dangerous I enjoy 🤣

If you liked this post, you may enjoy my 3-minute newsletter too.

Every week, my newsletter shares insights on
- R & dataviz,
- Shiny and web dev

Reading time: 3 minutes or less

You can join at
alberts-newsletter.beehiiv.com/subscribe

Read 3 tweets

Matt Dancho (Business Science)

@mdancho84

It took me 5 years to feel confident in data science.

True story.🧵

#datascience #rstats

This is coming from someone who has created two R packages with a combined 1.5 million downloads.

Has trained elite data scientists at Apple, Walmart, Google.

And has built a career teaching students how to become data scientists.

Why did it take so long?

👉 Too many resources.

I thought I had to learn everything:

Deep learning.
Machine learning.
Algorithms.

The toughest part was figuring out which tools to learn and which were “red herrings” (a waste of time).

Read 9 tweets

Matt Dancho (Business Science)

@mdancho84

There are over 2,000 AI tools that have hit the market over the last 365 days.

So I narrowed them down to the best.

Here are the TOP 15 AI TOOLS for Data Scientists. 🧵

#datascience #rstats #python #career #ai

It's hard not to get excited about #AI. The potential is insane. It's also scary.

And the worst thing you can do for your career is ignore AI.

I mean, there are literally 2,000 new tools that have hit the market in 365 days. So where do you start?

I want to help.

Here are the 15 AI tools that MUST be on your radar as a data scientist (I'm road-testing ALL of these).

CODE:

1. ChatGPT: OpenAI's AI Chatbot openai.com/blog/chatgpt

Read 19 tweets

Albert Rapp

@rappa753

Data cleaning is tedious.

But it's much easier with the {janitor} package. Especially if you work with Excel files.

Here are 5 underrated features from {janitor}. #rstats

1 // Create clean names

This is absolutely the best function. It transforms column names such that they are easier to use for programming.

Left: Bad for programming
Right: Good for programming
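To show what "good for programming" means in practice, here is a simplified, janitor-inspired Python sketch. It produces lowercase snake_case, ASCII-only, unique names. The specific rules (splitting camelCase, prefixing names that start with a digit with `x`, suffixing repeats with `_2`, `_3`, ...) are assumptions chosen for illustration; janitor's exact output can differ.

```python
import re


def clean_names(names):
    """Turn messy column names into unique snake_case identifiers.

    A simplified, janitor-inspired sketch, not a port of janitor::clean_names().
    """
    cleaned, used = [], set()
    for name in names:
        # Split camelCase boundaries: "lastName" -> "last_Name"
        s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", str(name).strip())
        # Collapse every run of non-alphanumeric characters into one underscore
        s = re.sub(r"[^A-Za-z0-9]+", "_", s).strip("_").lower() or "x"
        # Identifiers should not start with a digit: "2023_sales" -> "x2023_sales"
        if s[0].isdigit():
            s = "x" + s
        # Make names unique: a second "id" becomes "id_2", a third "id_3", ...
        candidate, n = s, 1
        while candidate in used:
            n += 1
            candidate = f"{s}_{n}"
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned
```

For example, `["First Name", "lastName", "% Done", "2023 Sales ($)", "ID", "id"]` becomes `["first_name", "last_name", "done", "x2023_sales", "id", "id_2"]`.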

2 // Remove empty rows and columns (or constant columns) from Excel files

Excel files can be messy to read in R. Lots of weird column names and empty cells.

{janitor} takes care of that for us.
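The same clean-up can be sketched in Python for a column-oriented table: a dict mapping column names to equal-length lists, with `None` for empty cells and hashable cell values. The helpers mirror the intent of janitor's `remove_empty()` and `remove_constant()`, but they are hypothetical simplifications, not their implementation.

```python
def remove_empty(table):
    """Drop columns that are entirely None, then rows that are entirely None.

    `table` maps column names to equal-length lists of cell values.
    """
    columns = {name: values for name, values in table.items()
               if any(v is not None for v in values)}
    if not columns:
        return {}
    n_rows = len(next(iter(columns.values())))
    # A row survives if at least one remaining column has a value in it
    keep = [i for i in range(n_rows)
            if any(values[i] is not None for values in columns.values())]
    return {name: [values[i] for i in keep] for name, values in columns.items()}


def remove_constant(table):
    """Drop columns whose cells all hold the same value (including all-None)."""
    return {name: values for name, values in table.items()
            if len(set(values)) > 1}
```

Running `remove_empty` first and `remove_constant` second matters: a column can become constant only after its empty rows are gone, as the `source` column in the test data does.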

Read 8 tweets

Matt Dancho (Business Science)

@mdancho84

3 battle-tested skills that every data scientist should have.

(and how to apply them to a job interview)🧵

#datascience #skills #rstats #python

People don’t realize this but I was a data science consultant and corporate trainer...

That was long before I was a “teacher” and a “6-figure data science mentor”.

That’s where I learned these skills through battle-testing.

And my clients were my test subjects. 🧪🧑‍🔬

1. Focus on results

Outcomes are what moves mountains. Not analysis.

Yes, data analysis is incredibly important.

But, what’s more important is what you and your company do with the analysis.

Read 12 tweets

Matt Dancho (Business Science)

@mdancho84

“Why Python vs R?”

“What’s inside your new Python course?”

“What will it do for me (if I’m an #R user)?”

I’ve been getting a ton of questions so I figured Twitter can help me explain.

#python #rstats #datascience

1. R vs Python.

My guess is 90% of my followers use R.

So why am I promoting Python?

Well, I just wrote a full article on why R users should learn **some** Python…

Even if Python is harder.

business-science.io/code-tools/202…

2. What’s inside my new Python course?

I just put together a 5-minute video that shows one powerful use case for Python.

And it’s the course project.

Read 4 tweets

Matt Dancho (Business Science)

@mdancho84

ChatGPT just made a Data Science Web App for me in under 15 minutes.

Here's the story. 🧵

#datascience #rstats

1. Machine Learning

ChatGPT created the basic code for an XGBoost machine learning model.

The model scores customer spend.

2. Shiny App

ChatGPT produced the code for a basic Shiny app that packages the model...

...And the app can make new spend predictions based on tweaking inputs for a new customer.

Read 11 tweets

Matt Dancho (Business Science)

@mdancho84

R users that want to learn Python.

This message is for you. 🧵

#rstats #python

If you don't know me, I'm Matt.

I'm an R user.

Nice to meet you...

And, I'm super pumped for today...

GOOD NEWS: My Python Course Launch is happening today at 2PM EST

BAD NEWS: It's officially over-registered: 4,872 registered, but ONLY 1,000 get seats.

Here's what you need to do...

Read 5 tweets

Matt Dancho (Business Science)

@mdancho84

I just revealed the data science project that helped my small business generate $3,500,000 in revenue in the last 2 years.

Here are the juicy details...

#datascience #python #rstats #career

It's true. My company made almost $2,000,000 in revenue last year with an email list of under 100,000.

And a big part of it was the Marketing Analytics that I'm revealing in my new Python course.

What did we do?

We used machine learning to score leads.

We then created nurture vs. sales targeting segments.

Then we did this...

Read 7 tweets

Dr Abdul Ali

@dr_abdulali125

R from the basics to advanced, for beginners.

Bookmark this Tweet and Follow me

COMMENT PDF - and I will send you the compiled PDF of more than 10 Cheat Sheets

If your inbox is disabled please DM me.

#DataAnalytics #DataScience #rstats #data #BigData

Read 6 tweets

Dr Abdul Ali

@dr_abdulali125

The basics of R for Beginners.

Bookmark this Tweet and Follow for More

COMMENT PDF - and I will send you the compiled PDF of more than 10 Cheat Sheets

#DataAnalytics #DataScience #rstats #data #BigData

If you have not received a DM from me, that means your inbox is disabled.

Follow me so that I can Inbox you.

Don't forget to:

✅ Follow @dr_abdulali125 🔥
✅ Like the first tweet in the thread
✅ Retweet this post to share with the world.

Have an amazing day 🙏

Read 4 tweets

Albert Rapp

@rappa753

Ever heard of logistic regression? Or Poisson regression? Both are generalized linear models (GLMs).

They're versatile statistical models. And by now, they've probably been reframed as super hot #MachineLearning.

Brush up on their math with this thread. #rstats

Let's start with logistic regression. Assume you want to classify a penguin as male or female based on its

* weight,
* species and
* bill length

Better yet, let's make this specific. Here's a dataviz for this exact scenario. It is based on the {palmerpenguins} data set.

As you can see, the male and female penguins form clusters that do not overlap too much.

However, regular linear regression (LR) won't help us distinguish them. Think about it: its output is an unbounded number, but here we want to predict a class.
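For reference, here is the textbook form of the two GLMs mentioned above, written for the penguin example. The notation is our assumption, not taken from the thread: η is the linear predictor, and species enters as 0/1 indicator variables with Adelie as the baseline. Both models keep the same linear predictor and differ only in the link function that maps it to the scale of the response.

```latex
% Linear predictor: a weighted sum of the features (species as 0/1 indicators)
\eta_i = \beta_0 + \beta_1\,\mathrm{weight}_i + \beta_2\,\mathrm{bill\_length}_i
       + \beta_3\,\mathrm{chinstrap}_i + \beta_4\,\mathrm{gentoo}_i

% Logistic regression: the logistic function maps \eta_i \in \mathbb{R} to (0, 1)
p_i = P(Y_i = \mathrm{male} \mid x_i) = \frac{1}{1 + e^{-\eta_i}}
\quad\Longleftrightarrow\quad
\log\frac{p_i}{1 - p_i} = \eta_i

% Poisson regression: the log link keeps the expected count positive
\lambda_i = \mathbb{E}[Y_i \mid x_i] = e^{\eta_i}
\quad\Longleftrightarrow\quad
\log \lambda_i = \eta_i
```

A common (but not mandatory) decision rule is to classify a penguin as male when p_i > 0.5. The Poisson line only illustrates the log link; in practice its response Y_i would be a count, not the penguin's sex.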

Read 26 tweets

Matt Dancho (Business Science)

@mdancho84

Over the past 3 years, I've been writing Python code daily.

And this coming Thursday, I'm excited to share what I've learned. 🧵

#datascience #career #python #rstats

Most of you know me as an #R guy. And I 1000% love R. ❤️

But I've also grown over the 10+ years I've spent practicing data science as a data scientist, a consultant, and an educator.

What I've come to realize is that no one language is perfect.

Each has its strengths. And weaknesses.

And more often than not we get caught up in debating things that **don't** really matter...

Like R vs Python.

Read 11 tweets

Matt Dancho (Business Science)

@mdancho84

There are more than 10,685,000 people learning to work with data.

Yet most data science jobs now have 1,000+ applicants.

And only 5 actually make it to an interview.

Learn these 4 skills to stand out (free to steal & use to advance your career).

#datascience #rstats

1. Time series analysis

Up to 90% of companies need better forecasting.

And less than 5% of data scientists are competent in time series analysis.

Key Point:

Becoming good at time series is a massive advantage that automatically places you in the 95th percentile of applicants.

Try This:

Demonstrate an application of time series analysis in the interview, and set yourself apart from the crowd.

Read 13 tweets

Matt Dancho (Business Science)

@mdancho84

Becoming a 6-figure data scientist (with #R) in today's economy is different than it was in 2022.

It requires a transformation...

#datascience #career #rstats

And those that successfully make the transformation:

1. Get hired 75% faster (3-6 months vs 12+ months)

2. Earn roughly 24% more ($145K vs $117K)

3. Get promoted 2X faster (1-2 years vs 2-4 years)

Want to learn how?

I have a LIVE Masterclass today where I will reveal the exact steps to take.

Strict limit: 500 Seats (and we are over-capacity)

Read 4 tweets

Matt Dancho (Business Science)

@mdancho84

There are 100,000 data scientists on the market. How can you possibly get a job?

This is one way… 🧵

#datascience #rstats

Imagine this.

You walk into the Bank of Canada, Canada's central bank and the backbone of its financial system.

You’re applying for a data science position.

On paper the odds are stacked against you.

You use R.

The whole team uses Python.

And you are a complete unknown.

Read 10 tweets

Matt Dancho (Business Science)

@mdancho84

87% of data science projects fail.

One reason: they never make it into production.

Here's an easy solution. 🧵

#datascience #python #R #rstats #career

This is called an #API (Application Programming Interface).

An API allows the user to make requests using a tool they are comfortable with.

On the server, the API runs a program that the data scientist has created.

And the API returns a result.
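To make the request → program → result loop concrete, here is a minimal Python sketch using only the standard library's `http.server`. The `score_customer` function is a hypothetical stand-in for the data scientist's program (a toy formula, not a trained model), and the JSON field names are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_customer(payload):
    """Hypothetical stand-in for the data scientist's program.

    A real API would call a trained model here; this toy formula only
    shows where that call would go.
    """
    spend = float(payload.get("past_spend", 0))
    visits = int(payload.get("visits", 0))
    return {"score": round(0.1 * spend + 2 * visits, 2)}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # 1. Read the request the user sent (JSON in the request body)
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            # 2. Run the program on the server
            body, status = score_customer(payload), 200
        except (ValueError, TypeError, AttributeError) as err:
            body, status = {"error": str(err)}, 400
        # 3. Return the result as JSON
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8000):
    """Start the API; blocks until interrupted (Ctrl+C)."""
    HTTPServer((host, port), ScoreHandler).serve_forever()
```

After calling `serve()`, any HTTP client the user is comfortable with can make a request, e.g. `curl -X POST -d '{"past_spend": 100, "visits": 3}' http://127.0.0.1:8000`, which should return `{"score": 16.0}`. The user never needs Python installed; only the server does.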

Read 6 tweets

Matt Dancho (Business Science)

@mdancho84

What stresses me out isn’t the same thing that stressed my parents out.

True story. 🧵

#datascience #rstats

It was the day before R/Finance 2018.

At R/Finance, I was going to present a Shiny app I was working on.

I updated my R software…

Big mistake.

My Shiny app suddenly stopped working, and I went straight into panic mode. 😱

This was before I knew how to use GitHub & Docker like I do now.

So I was freaking out.

Read 12 tweets

Matt Dancho (Business Science)

@mdancho84

I continue to be impressed by how easy PyCaret makes machine learning.

This is especially great for #R people that want to learn #Python.

Let me explain...

#DataScience #Rstats

As many of you know, my primary data science toolkit is #R. ❤️

I've been doing data science in production at @bizScienc and developing open-source R software for over 10 years.

- Modeltime (+4 ecosystem pkgs)
- Timetk
- Tidyquant
- CorrelationFunnel

But I work closely with team members and clients...

and their language of choice is often Python.

It's refreshing that I can quickly switch to Python when needed without having to write 5,000 lines of #ScikitLearn code to do basic machine learning.

Read 5 tweets