R-squared is one of the most commonly used metrics for measuring regression model performance.
But it took me 2 years to figure out mistakes that were killing my regression models.
In 2 minutes, I'll share how I fixed 2 years of mistakes (and made 50% more accurate models than my peers).
Let's go: 🧵
1. R-squared (R2):
A statistical measure of how well a regression model replicates the observed outcomes, based on the proportion of the total variation in those outcomes that the model explains.
2. Range (0 to 1 typically):
R2 typically ranges from 0 to 1, though it can go negative when a model fits worse than simply predicting the mean of the data.
A higher R2 value indicates a better fit between the predictions and the actual data.
For example, an R2 value of 0.70 suggests that 70% of the variance in the dependent variable is predictable from the independent variable(s).
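To make the definition concrete, here is a minimal sketch of R2 computed by hand as 1 minus the ratio of residual variation to total variation. The function name `r_squared` and the toy numbers are made up for illustration, not taken from any particular library. The second set of predictions shows that R2 can drop below 0 when a model does worse than predicting the mean.

```python
def r_squared(y_true, y_pred):
    """Return R2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Variation the model fails to explain (squared residuals).
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total variation of the outcomes around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Toy data (illustrative only).
y_true = [3.0, 5.0, 7.0, 9.0]
good_pred = [2.5, 5.5, 6.5, 9.5]   # close to the actual values
bad_pred = [9.0, 7.0, 5.0, 3.0]    # trend is reversed

print(r_squared(y_true, good_pred))  # about 0.95
print(r_squared(y_true, bad_pred))   # about -3.0, worse than predicting the mean
```

An R2 of 0.95 here reads the same way as the 0.70 example: about 95% of the variance in the outcomes is accounted for by the predictions.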
When I was first learning data science, one of the things that tripped me up the most was cross-validation.
In 5 minutes, I'll share 5 years of experimentation with dozens of cross-validation techniques.
Let's dive in. 🧵
1. Cross Validation Goals:
Cross-validation is a statistical method used to estimate the accuracy of machine learning models.
It's also used to measure model stability when combined with hyperparameter tuning.
2. Principles & Terminology:
The main principle behind cross-validation is partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the analysis set), and validating the analysis on the other subset (called the assessment set).
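The partitioning idea can be sketched in plain Python. This is a minimal k-fold example under stated assumptions, not a reference implementation: the helpers `k_fold_indices`, `fit_line`, and `cross_validate` are hypothetical names, the model is a simple least-squares line, and the data is a noise-free toy set. Each fold takes one turn as the assessment set while the model is fit on the rest.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (analysis, assessment) index lists for k complementary folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        assessment = folds[i]
        analysis = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield analysis, assessment


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=5):
    """Return one mean squared error per fold, scored on the held-out data."""
    scores = []
    for analysis, assessment in k_fold_indices(len(xs), k):
        # Fit on the analysis set only.
        slope, intercept = fit_line([xs[i] for i in analysis],
                                    [ys[i] for i in analysis])
        # Validate on the assessment set the model never saw.
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in assessment]
        scores.append(sum(errors) / len(errors))
    return scores


# Toy data (illustrative only): a perfectly linear relationship.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, k=5)
print(scores)  # one error per fold; the spread across folds hints at stability
```

The average of the fold scores estimates accuracy, and how much the scores differ from fold to fold gives a rough read on stability.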
1. Generative AI is a 10X complement to Data Science
In the past, deep learning had limited uses in Business Intelligence, Data Analytics, and especially Data Science for Business contexts such as working with tabular data.
Generative AI is the opposite. Instead of trying to improve on Machine Learning, generative AI adds a superpower of automation.