🔥 Matt Dancho (Business Science) 🔥
On a mission to grow your business data science skills and accelerate your career | Get my free 5-day business for data scientists course 👇
Dec 12 11 tweets 3 min read
Some people really like Data Cleaning. I don't.

It takes me away from analyzing data. The fun part.

So I made an AI Agent to help.

The AI Agent contains 4 steps:

1. Create data cleaner code
2. Execute data cleaner code
3. If problem, fix code
4. Once fixed, explain code

Here's how it works:
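The rest of the thread is not reproduced here, so what follows is only a minimal Python sketch of the four steps listed above, not the author's actual agent. The `generate_code` and `explain_code` callables are hypothetical stand-ins for LLM calls, and the stubs below fake them so the loop can run offline.

```python
from __future__ import annotations

from typing import Callable, Optional


def run_cleaning_agent(
    rows: list[dict],
    generate_code: Callable[[Optional[str], Optional[str]], str],
    explain_code: Callable[[str], str],
    max_retries: int = 3,
) -> tuple[list[dict], str, str]:
    """Generate, execute, fix, and explain a data cleaner (hypothetical helper)."""
    code, error = None, None
    for _ in range(max_retries):
        # Step 1 on the first pass, step 3 (fix) on later passes:
        # the generator sees the previous code and the error it raised.
        code = generate_code(code, error)
        try:
            # Step 2: execute the generated code and call its clean() function.
            # exec runs arbitrary code, so real use would need a sandbox.
            namespace: dict = {}
            exec(code, namespace)
            cleaned = namespace["clean"](rows)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
            continue
        # Step 4: once the code runs, ask for an explanation of it.
        return cleaned, code, explain_code(code)
    raise RuntimeError(f"No working cleaner after {max_retries} tries: {error}")


def fake_generate(previous_code: Optional[str], error: Optional[str]) -> str:
    """Stub for an LLM: the first draft has a bug, the second draft fixes it."""
    column = "prices" if error is None else "price"  # 'prices' does not exist
    return (
        "def clean(rows):\n"
        f"    return [r for r in rows if r['{column}'] is not None]\n"
    )


def fake_explain(code: str) -> str:
    """Stub for an LLM explanation."""
    return "Drops rows whose price is missing."


# Made-up rows, for illustration only.
rows = [{"price": 10.0}, {"price": None}, {"price": 12.5}]
cleaned, code, explanation = run_cleaning_agent(rows, fake_generate, fake_explain)
print(cleaned)
print(explanation)
```

The retry cap matters: without it, a generator that never produces working code would loop forever.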
Nov 16 7 tweets 2 min read
Python has powerful time series libraries.

Case in point: skforecast

Let me explain:

Skforecast is a Python library for time series forecasting using machine learning models.

Skforecast works with any regressor compatible with the scikit-learn API, including popular options like LightGBM, XGBoost, CatBoost, Keras, and many others.
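To make "any regressor with a scikit-learn-style API" concrete, here is a rough sketch of the general idea: turn the series into lagged features, fit a regressor with `fit`/`predict`, then forecast recursively by feeding each prediction back in as the newest lag. This is plain Python and does not use skforecast's own classes. `TinyLinearRegressor`, `make_lagged`, and `recursive_forecast` are hypothetical names invented for this illustration.

```python
from __future__ import annotations


class TinyLinearRegressor:
    """Toy stand-in for a scikit-learn-style regressor (one feature only)."""

    def fit(self, X: list[list[float]], y: list[float]) -> "TinyLinearRegressor":
        xs = [row[0] for row in X]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(y) / len(y)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, X: list[list[float]]) -> list[float]:
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def make_lagged(series: list[float], n_lags: int):
    """Each row of X holds the n_lags values that precede the target in y."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = series[n_lags:]
    return X, y


def recursive_forecast(regressor, series: list[float], n_lags: int, steps: int):
    """Fit on lagged values, then predict one step at a time."""
    X, y = make_lagged(series, n_lags)
    regressor.fit(X, y)
    window = list(series[-n_lags:])
    forecasts = []
    for _ in range(steps):
        next_value = regressor.predict([window])[0]
        forecasts.append(next_value)
        # Slide the window: drop the oldest lag, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


# Made-up series with a steady trend, for illustration only.
history = [2.0 * t for t in range(10)]
forecast = recursive_forecast(TinyLinearRegressor(), history, n_lags=1, steps=3)
print(forecast)
```

Because the loop only calls `fit` and `predict`, any regressor exposing those two methods could be swapped in for the toy one.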
Nov 15 6 tweets 2 min read
My favorite R package for ultra-fast exploratory analysis:

The R package is called correlationfunnel.

Yes, I built it.
Nov 14 9 tweets 3 min read
The best beginner book on time series?

FPP: Forecasting: Principles and Practice

Let's dive in:

Some may consider FPP the bible of time series.

I agree.

Start with FPP Version 2 or 3, and you won't go wrong.

This is what I like:
Nov 12 8 tweets 2 min read
BEAST: A Bayesian Ensemble Algorithm for Change-Point Detection and Time Series Decomposition in R

Let's explore:

1. What is BEAST?

BEAST stands for Bayesian Estimator of Abrupt change, Seasonality, and Trend.

But what does that mean?
Nov 11 12 tweets 3 min read
I used to struggle working with Time Series.

After 10 years, I mastered it.

Then I spent 3 years making this R package so you can too:

The R package is timetk. I built it to make your life easier when working with Time Series:

- Plotting (Visualization)
- Data Wrangling
- Correlation
- Seasonality
- Imputation
- Outliers (Anomalies)
- Feature engineering
- Cross Validation
Nov 9 10 tweets 3 min read
R is crazy good at forecasting.

Just learn this R package:

The R package is modeltime (and yes, I created it).

Modeltime's goal is to make high-performance time series analysis easier, faster, and more scalable in R.
Nov 8 6 tweets 2 min read
R is wild for data visualization.

I can't believe how easy it is to make this in R.

This is how:

The R package is ggside.

Here's the code for a scatter plot + side distributions:
Nov 6 8 tweets 2 min read
Need to plot distributions?

Introducing ggdist:

1. Unlock a new family of ggplot geoms and stats

- slabinterval
- halfeye
- dots
- dotsinterval
- histinterval
- and more.
Oct 30 7 tweets 2 min read
R can make flow charts programmatically.

This is how.

The R package is ggflowchart.

It makes it super easy to create flow charts programmatically in R:
Oct 22 4 tweets 1 min read
Have you heard of UpSet Plots?

It's a new plot type designed to understand intersections between many categories.

Get it free in R (on GitHub here): github.com/const-ae/ggups…
Oct 21 5 tweets 2 min read
This is mind-blowing (I only wish I knew it sooner).

Easy linear regression coefficient reports in R.

This is how:

The function is ggstats::ggcoef_model().

It turns your Linear Regression into an easy-to-understand coefficient report.

I don't know how many times I've done this the hard way.
Oct 20 8 tweets 2 min read
This new R package is wild.

It's called ggalign. Here's what it can do:

1. Gene expression matrix:
Oct 13 9 tweets 2 min read
25 R packages you need to learn in 2024.

Let's dive in:

1. Janitor: Simplifying Data Cleaning
2. Skimr: Quick Data Summarization
3. bslib: Next-Gen UI for Shiny Apps
4. box: Modularize Your R Scripts
5. data.table & tidytable: High-Performance Data Manipulation
Oct 12 7 tweets 2 min read
How to analyze your data faster with R:

In the fast-paced world of data science, getting quick insights into your data is crucial.

It allows you to understand your data better, make informed decisions, and expedite the modeling process.
Oct 9 16 tweets 4 min read
A/B Testing in R.

This is how:

1. A/B Testing Definition:

A/B testing is a statistical method for comparing 2 groups to determine whether the difference between them is statistically significant.

It's often used in marketing analytics to gauge the effectiveness of marketing efforts.
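As a concrete illustration of that definition, here is a minimal sketch of one common way to run such a comparison: a two-sided two-proportion z-test on conversion rates. It is written in Python with only the standard library rather than in R, the helper name `two_proportion_z_test` is hypothetical, and the visitor and conversion counts are made up.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up campaign numbers: group A is the control, group B the variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The z-test relies on a normal approximation, so it suits large samples like these; small samples call for an exact test instead.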
Oct 8 7 tweets 2 min read
What is the most difficult part of becoming a data scientist?

3 common mistakes (and tips that worked for me):

1. Not solving business problems

Solving business problems is what drives revenue for your company.

But it’s tough to figure out how to convert a business problem into a data problem.

The trick is building.

When you make things, you make mistakes. And you grow.
Oct 6 13 tweets 4 min read
Stop using box plots.

Use this plot instead.

Here's why:

I used to love Box Plots. Until I found out the harsh truth.

Box Plots fail when it comes to bimodal and multimodal distributions.
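A quick way to see the problem is to compute the numbers a box plot is drawn from. The sketch below uses Python's standard library and a made-up sample with two separate clusters. The box (Q1 to Q3) and the median line land on a stretch of the axis that contains no observations at all.

```python
import statistics

# Made-up bimodal sample: 50 values near 1 and 50 values near 9.
low_cluster = [1 + 0.02 * i for i in range(50)]
high_cluster = [9 + 0.02 * i for i in range(50)]
data = sorted(low_cluster + high_cluster)

# The three quartile cut points that define the box and its median line.
q1, median, q3 = statistics.quantiles(data, n=4)

# Count how many observations actually fall in the middle of the box.
points_in_middle = sum(1 for x in data if 3 <= x <= 8)

print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
print(f"observations between 3 and 8: {points_in_middle}")
```

A box spanning the whole gap suggests the data is spread across it, which is exactly what a histogram or density plot of the same sample would contradict.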
Oct 5 11 tweets 3 min read
Myth: R can't be used for production

Fact: R can with Vetiver.

Let me explain:

Vetiver is an R package that provides tooling to:

- Version
- Deploy
- Monitor

Trained ML Models. Here's how:
Oct 1 11 tweets 2 min read
One of the many reasons I chose R.

True story. 🧵

By 2024, I’ve written 10+ R packages. Timetk, tidyquant & modeltime alone have 2,500,000+ downloads.

People see me now and think my success with R happened overnight. It didn’t.

I actually started my data science journey in Python.

It was miserable.
Sep 27 9 tweets 2 min read
It took me 5 years to feel confident in data science.

True story. 🧵

This is coming from a person who has created two R packages that combine for 1.5 million downloads.

Has trained elite data scientists at Apple, Walmart, Google.

And has built a career teaching students how to become data scientists.

Why did it take so long?