🔥 Matt Dancho (Business Science) 🔥
Future Is Generative AI + Data Science | Helping My Students Become Generative AI Data Scientists ($200,000 /year career) 👇
9 subscribers
Mar 22 • 7 tweets • 2 min read
Data Science for Business.

The book that helped me connect the dots. Let's dive in:

1. CRISP Data Mining Process

The foundation for applying data science to business is the CRISP-DM method (Cross-Industry Standard Process for Data Mining).

This is a helpful framework for integrating data science with business understanding.
Mar 21 • 10 tweets • 4 min read
90% of data scientists can improve their SQL for business intelligence.

In 3 minutes, learn the 20% of SQL that gets 80% of results:

🔍 SELECT Basics:

Start with SELECT * FROM table to retrieve all rows & columns.

Remember, SQL keywords aren't case-sensitive, but capitalizing them (SELECT, FROM) makes your queries easier to read.
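Here's a minimal sketch of that SELECT tip, run through Python's built-in sqlite3 module. The customers table and its rows are hypothetical, made up for illustration; they are not from the thread.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Grace", "New York")],
)

# Capitalized keywords, as the tip recommends.
upper_rows = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()

# The same query with lowercase keywords returns the same rows.
lower_rows = conn.execute("select * from customers order by id").fetchall()

print(upper_rows)
print(upper_rows == lower_rows)
conn.close()
```

Both queries return every row and column. Only the keywords are case-insensitive here; whether text values compare case-insensitively depends on the database.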
Mar 20 • 12 tweets • 4 min read
Understanding probability is essential in data science.

In 4 minutes, I'll demolish your confusion.

Let's go!

1. Statistical Distributions:

There are hundreds of distributions to choose from when modeling data. The choices seem endless. Use this as a guide to simplify the choice.
Mar 17 • 11 tweets • 4 min read
6 statistical methods that can be used for A/B Testing (and when to use them).

A/B Testing is a staple of data science and data analyst interviews.

And it's the Number 1 technique that companies benefit from in improving customer revenue.

So here are 6 of the most common stat methods used in A/B testing.
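The preview above doesn't show which 6 methods the thread covers, so this is only an illustration of one widely used approach: a two-proportion z-test on conversion rates. The function name and the visitor/conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 200/2000 conversions for A, 260/2000 for B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be chance alone. This sketch assumes large samples and independent visitors.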
Mar 16 • 11 tweets • 4 min read
R-squared is one of the most commonly used metrics to measure performance.

But it took me 2 years to figure out mistakes that were killing my regression models.

In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers). Let's go:

1. R-squared (R2):

R2 is a statistical measure for regression models. It tells you how well the model replicates the observed outcomes, based on the proportion of the total variation in outcomes that the model explains.
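To make "proportion of variation explained" concrete, here's a small sketch that computes R2 as 1 minus the ratio of residual to total sum of squares. The observed and predicted values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    # Variation the model fails to explain.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total variation of the outcomes around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed outcomes and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
print(f"R2 = {r_squared(y_true, y_pred):.4f}")
```

With these numbers the residual sum of squares is 0.75 and the total is 20, so R2 works out to 0.9625. A model that always predicts the mean scores 0.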
Mar 15 • 10 tweets • 4 min read
Logistic Regression is the most important foundational algorithm in Classification Modeling.

In 2 minutes, I'll teach you what took me 2 months to learn.

Let's go: 🧵

1. Logistic regression:

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine a binary outcome (one with only two possible outcomes). This is commonly called a binary classification problem.
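Here's a rough sketch of binary classification with logistic regression, fit by plain gradient descent on one feature. The study-hours data, learning rate, and epoch count are assumptions picked for illustration; in practice you'd likely reach for a library instead.

```python
from math import exp


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weight w and bias b by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied vs. passed the exam (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
for h in (1.0, 2.5, 4.0):
    print(f"{h} hours -> P(pass) = {sigmoid(w * h + b):.2f}")
```

The model outputs a probability. Predicting "pass" when that probability is above 0.5 turns it into a binary classifier.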
Mar 13 • 9 tweets • 3 min read
PowerBI and Tableau are about to die.

Vanna AI is a new open-source Python framework that enables realtime analytics and SQL generation.

Let's explore:

Vanna is an MIT-licensed open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation and related functionality.
Mar 12 • 5 tweets • 2 min read
Data scientists are out.

The Generative AI Data Scientist is in.

Let me explain:

Companies are sitting on mountains of unstructured data.

PDF
Word docs
Meeting notes
Emails
Videos
Audio Transcripts

This is useful data. But it's unusable in its existing form.
Mar 11 • 11 tweets • 4 min read
Principal Component Analysis (PCA) is the gold standard in dimensionality reduction.

But PCA is hard to understand for beginners.

Let me destroy your confusion:

1. What is PCA?

PCA is a statistical technique used in data analysis, mainly for dimensionality reduction. It's beneficial when dealing with large datasets with many variables, and it helps simplify the data's complexity while retaining as much variability as possible.
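To see "reduce dimensions while retaining variability" in action, here's a sketch of PCA on 2-D data using the closed-form eigenvalues of the 2x2 covariance matrix. The helper name and the data points are hypothetical, and real projects would normally use a library implementation.

```python
from math import sqrt


def pca_2d(points):
    """Return the first principal direction and its share of total variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mid = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mid + spread, mid - spread
    # Eigenvector belonging to the larger eigenvalue.
    if sxy != 0:
        vx, vy = lam1 - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = sqrt(vx * vx + vy * vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)


# Hypothetical 2-D measurements that mostly vary along one diagonal.
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
direction, explained = pca_2d(points)
print(f"first component: {direction}, variance retained: {explained:.1%}")
```

If the first component retains most of the variance, you can keep that one coordinate instead of two and lose very little information.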
Mar 10 • 12 tweets • 4 min read
K-means is an essential algorithm for Data Science.

But it's confusing for beginners.

Let me demolish your confusion:

1. K-Means

K-means is a popular unsupervised machine learning algorithm used for clustering. It's a core algorithm used for customer segmentation, inventory categorization, market segmentation, and even anomaly detection.
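Here's a bare-bones sketch of the K-means loop (assign each point to the nearest centroid, then move each centroid to the mean of its points) on 1-D data. The spend values and starting centroids are made up; real implementations usually pick starting centroids at random and work in many dimensions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer, with two obvious segments.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = kmeans_1d(spend, centroids=[10, 50])
print(centroids)
print(clusters)
```

The low spenders and high spenders end up in separate clusters, which is the basic idea behind customer segmentation.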
Mar 9 • 10 tweets • 3 min read
R-squared is one of the most commonly used metrics to measure performance.

But it took me 2 years to figure out mistakes that were killing my regression models.

In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers). Let's go:

1. R-squared (R2):

R2 is a statistical measure for regression models. It tells you how well the model replicates the observed outcomes, based on the proportion of the total variation in outcomes that the model explains.
Mar 8 • 11 tweets • 3 min read
Correlation is the skill that has singlehandedly benefitted me the most in my career.

In 3 minutes I'll demolish your confusion (and share strengths and weaknesses you might be missing).

Let's go:

1. Correlation:

Correlation is a statistical measure that describes the extent to which two variables change together. It can indicate whether and how strongly pairs of variables are related.
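As a quick illustration, here's a sketch of the Pearson correlation coefficient, one common way to measure how two variables change together. The data is made up, and the last example shows a well-known weakness: Pearson only picks up linear relationships.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical examples of each kind of relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))           # move up together
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))           # move in opposite directions
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # y = x squared
```

The first two pairs give correlations of about +1 and -1. The third is perfectly related (y is x squared) yet scores 0, so a low correlation does not mean there is no relationship.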
Mar 6 • 9 tweets • 2 min read
Google just dropped a new Generative AI Python library for SQL Databases.

Introducing Google GenAI Toolbox.

This is what you need to know:

1. Meet the Google GenAI Toolbox

An open-source server designed to simplify building Gen AI tools for your databases. It streamlines development, letting you integrate powerful data tools with just a few lines of code.
Mar 5 • 9 tweets • 4 min read
A Python Library for Time Series using Hidden Markov Models.

Let me introduce you to hmmlearn.

1. Hidden Markov Models

A Hidden Markov Model (HMM) is a statistical model of a sequence of observable events where the process generating those events is not directly visible. "Hidden states" influence the observed data, but you only see the results of those states, not the states themselves.
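To make the hidden-state idea concrete, here's a pure-Python sketch of the forward algorithm, which scores how likely an observed sequence is under an HMM. This is not the hmmlearn API. The weather states, activities, and every probability below are hypothetical textbook-style numbers.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of the observed sequence, summed over all hidden paths."""
    # Probability of starting in each state and emitting the first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Carry probability forward through every possible state transition.
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: hidden weather drives the activity we actually observe.
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

likelihood = forward_likelihood(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(f"P(walk, shop, clean) = {likelihood:.6f}")
```

We never observe the weather, only the activities, yet the model still assigns the sequence a probability (about 0.0336 with these numbers).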
Mar 1 • 9 tweets • 3 min read
Python has crazy forecasting libraries.

Let me introduce you to Kats, by Meta (Facebook).

Kats is a toolkit to analyze time series data and a lightweight, easy-to-use, and generalizable framework to perform time series analysis. It covers:

- Forecasting
- Detection
- Feature Extraction
- Simulation
Mar 1 • 10 tweets • 5 min read
The price of the Python AI/ML Stack I've been using for 12 months:

Langchain $0
Langgraph $0
Scikit Learn $0
H2O $0
Torch $0
Pandas $0
Numpy $0
Plotly $0
Statsmodels $0
Ollama $0
OpenAI (<$1.00 per month)

Becoming a Generative AI Data Scientist cost me $12: 🧵

1. Environment:

- VSCode
- Conda
- Jupyter VSCode Integration

Start here: code.visualstudio.com/docs/datascien…
Feb 25 • 5 tweets • 2 min read
AI is about to kill Tableau and PowerBI.

Every dashboard can now be created in seconds with these Free Agents:

Agents can now create these dashboards:

1. Content Performance
2. Email Performance
3. Google Analytics
4. Historical Sales Trends
5. Churn and Subscription Renewal
Feb 23 • 11 tweets • 3 min read
6 statistical methods that can be used for A/B Testing (and when to use them). 🧵

A/B Testing is a staple of data science and data analyst interviews.

And it's the Number 1 technique that companies benefit from in improving customer revenue.

So here are 6 of the most common stat methods used in A/B testing.

Let's dive in.
Feb 22 • 5 tweets • 2 min read
It took me 5 years to master all 24 of these machine learning concepts.

In the next 24 days, I'll teach them to you one by one (with examples of how I've used them). Here's what's coming:

1. Linear Regression
2. Clustering
3. Decision Tree
4. Neural Networks
5. Reinforcement Learning
6. Logistic Regression
7. Naive Bayes
8. Supervised Learning
9. Support Vector Machine
10. Probability
11. Random Forest
12. Variance
13. Evaluation Metrics
14. Bagging
15. Data Wrangling
16. Dimensionality Reduction
17. K-nearest Neighbors Algorithm
18. Programming
19. Regularization
20. Statistics
21. Binomial Distribution
22. Bootstrap Sampling
23. Exploratory Data Analysis
24. Data Collection
Feb 20 • 7 tweets • 2 min read
Data Science for Business.

The book that helped me connect the dots. Let's dive in:

1. CRISP Data Mining Process

The foundation for applying data science to business is the CRISP-DM method (Cross-Industry Standard Process for Data Mining).

This is a helpful framework for integrating data science with business understanding.
Feb 20 • 10 tweets • 3 min read
90% of data scientists can improve their SQL for business intelligence.

In 3 minutes, learn the 20% of SQL that gets 80% of results:

🔍 SELECT Basics:

Start with SELECT * FROM table to retrieve all rows & columns.

Remember, SQL keywords aren't case-sensitive, but capitalizing them (SELECT, FROM) makes your queries easier to read.