I’m not saying you need to be an expert in advanced calculus to do machine learning…
BUT there is a big difference between someone who has a good foundation in stats and someone who does NOT when it comes to getting and explaining business results.
My thought process back in the day was to obtain a great foundation in stats and machine learning at the same time.
So here’s what helped me. I read a ton of books.
Here are the 3 books that helped me learn data science the most...
1. R for Data Science (Wickham & Grolemund) r4ds.had.co.nz
Residuals are the key to improving model performance.
But it took me 5 years to figure this out.
In 5 minutes, I'll share what took me 5 years to figure out. Let's go. 🧵
1. What are residuals?
In statistics and machine learning, "residuals" are the differences between observed values and the values predicted by a model (observed minus predicted). These are your model errors.
2. Residual Analysis:
The key to understanding whether your model is any good is residual analysis. What I'm looking for in the residuals: linearity, homoskedasticity (constant variance), and no leftover pattern.
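To make the residual checks concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy data is made up, and the helper names (fit_line, residuals, spread_by_half, sign_changes) are hypothetical, not from any library. The two checks are rough rules of thumb, not formal statistical tests.

```python
# Toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.3]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y, intercept, slope):
    """Residual = observed value minus predicted value."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_by_half(x, res):
    """Mean squared residual for the low-x half and the high-x half.

    Very different values hint at non-constant variance.
    """
    ordered = [r for _, r in sorted(zip(x, res))]
    mid = len(ordered) // 2
    low, high = ordered[:mid], ordered[mid:]
    return (sum(r * r for r in low) / len(low),
            sum(r * r for r in high) / len(high))


def sign_changes(x, res):
    """Count sign flips of the residuals when ordered by x.

    Very few flips (long runs of same-sign residuals) suggest a
    leftover pattern, such as curvature the line did not capture.
    """
    ordered = [r for _, r in sorted(zip(x, res))]
    return sum(1 for a, b in zip(ordered, ordered[1:]) if a * b < 0)


intercept, slope = fit_line(x, y)
res = residuals(x, y, intercept, slope)
low_spread, high_spread = spread_by_half(x, res)
flips = sign_changes(x, res)

print("residuals:", [round(r, 3) for r in res])
print("spread (low x, high x):", round(low_spread, 4), round(high_spread, 4))
print("sign changes:", flips)
```

In practice I'd also plot residuals against the predictions, since a plot shows curvature or a funnel shape faster than any single number.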
Logistic Regression is how my simple lead scoring model grew revenue to $15,000,000.
In 3 minutes, here's what took me 3 months to figure out (business case included).
Let's dive in. 🧵
1. Binary Classification:
Logistic regression is a statistical method for modeling a binary outcome (one with only two possible values) from one or more independent variables. This is commonly called a binary classification problem. 0 = customer didn't buy, 1 = customer bought!
2. Linear Regression vs Logistic Regression (Why I made the switch):
In 2015 I was still in the early stage of my data science journey. And when I first modeled leads, I made a rookie mistake: using linear regression. While it actually worked (sort of) for lead scoring, I had a big problem. Linear regression didn't provide a probability: its predictions aren't limited to the 0-to-1 range.
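Here is a minimal sketch of that difference in plain Python. It is not my actual lead scoring model: the data is made up, the feature (emails opened) is a hypothetical example, and the helper names (fit_linear, fit_logistic, sigmoid) are illustrative. The logistic fit uses plain gradient descent to keep the example dependency-free; a real project would likely use a library implementation instead.

```python
import math

# Hypothetical lead data, made up for illustration:
# x = number of marketing emails a lead opened, y = 1 if the lead bought, else 0.
emails_opened = [0, 1, 2, 3, 4, 5, 6, 7]
bought = [0, 0, 0, 1, 0, 1, 1, 1]


def fit_linear(xs, ys):
    """Least-squares line y = b0 + b1 * x. Returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


lin_b0, lin_b1 = fit_linear(emails_opened, bought)
log_b0, log_b1 = fit_logistic(emails_opened, bought)

linear_scores = [lin_b0 + lin_b1 * x for x in emails_opened]
logistic_probs = [sigmoid(log_b0 + log_b1 * x) for x in emails_opened]

for x, score, prob in zip(emails_opened, linear_scores, logistic_probs):
    print(f"opened={x}  linear score={score:.2f}  logistic probability={prob:.2f}")
```

On this toy data the straight line gives a score below 0 for the least engaged lead and above 1 for the most engaged one, so it can rank leads but can't be read as a chance of buying. The logistic outputs always stay between 0 and 1.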
Data Scientist vs. AI Engineer (Generative AI Edition)
I've been studying AI for 18 months. This is what I discovered about the rise of this new role:
1) Context: The Rise of AI Engineering
- Data scientist has been called the “sexiest job of the 21st century.”
- But generative AI breakthroughs have led to a new role: AI engineers.
- Think of data scientists as data-driven decisioneers vs. AI engineers as AI system builders.
2) Use Cases
Data Scientists:
- Focus on descriptive & predictive analytics (e.g., EDA, clustering, regression, classification).
- Turn messy data into actionable insights.