Tools like #ChatGPT & GitHub #Copilot can help debug complex code and replace Googling + Stack Overflow searches for common scripting tasks.
Key skill: ChatGPT prompting (more on this in my free ChatGPT for Data Scientists)
2. Code Quality & Documentation
Great products have great documentation. AI can help draft documentation, comment code, and replace time-consuming manual write-ups with automated AI docs.
When I was first learning data science, one of the things that tripped me up the most was Cross Validation.
In 5 minutes, I'll share 5 years of experimentation with dozens of Cross Validation techniques.
Let's dive in. 🧵
1. Cross Validation Goals:
Cross-validation is a statistical method used to estimate the accuracy of machine learning models.
It's also used to measure the stability of models when combined with hyperparameter tuning.
2. Principles & Terminology:
The main principle behind cross-validation is partitioning a sample of data into complementary subsets, fitting the model on one subset (the analysis set) and validating it on the other (the assessment set).
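Here's a minimal 5-fold cross-validation sketch in Python with scikit-learn. The toy data and logistic regression model are just stand-ins for your own:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy data standing in for your real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 5-fold CV: each fold takes one turn as the assessment set
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy:   {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread of the fold scores is what tells you about stability, not just the mean.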
1. Generative AI is a 10X complement to Data Science
In the past, deep learning had limited use in Business Intelligence, Data Analytics, and especially in Data Science for Business contexts like working with tabular data.
Generative AI is the opposite: instead of trying to improve on Machine Learning, it adds a superpower of automation.
The concept that helped me go from bad models to good models: Bias and Variance. In 4 minutes, I'll share 4 years of experience in managing bias and variance in my machine learning models.
Let's go. 🧵
1. Generalization:
Bias and variance control your model's ability to generalize to new, unseen data, not just the data it was trained on. The goal in machine learning is to build models that generalize well. To do so, I manage bias and variance.
2. Low vs High Bias:
Models with low bias are usually complex and can capture the underlying patterns in the data very well, but they risk overfitting when their variance is high.
Models with high bias are overly simple and cannot capture the complexity in the data. They often underfit the training data.
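A quick way to see both failure modes is to vary model complexity and watch the cross-validated error. This sketch uses polynomial regression on synthetic data (the degrees are chosen for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# degree=1:  high bias (underfits the sine curve)
# degree=15: low bias but high variance (chases the noise)
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv_mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree={degree:2d}  CV MSE: {cv_mse.mean():.3f}")
```

The middle degree should win: enough complexity to cut bias, not so much that variance explodes.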
Principal Component Analysis (PCA) is the gold standard in dimensionality reduction, with many uses in business. In 5 minutes, I'll teach you what took me 5 weeks. Let's go! 🧵
1. What is PCA?:
PCA is a statistical technique used in data analysis, mainly for dimensionality reduction.
It's beneficial when dealing with large datasets with many variables, and it helps simplify the data's complexity while retaining as much variability as possible.
2. How PCA Works:
PCA has 5 steps:
1. Standardization
2. Covariance Matrix Computation
3. Eigenvector Calculation
4. Choosing Principal Components
5. Transforming the Data
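In practice, scikit-learn wraps steps 2-4 for you; you handle the standardization and the transform. A minimal sketch (the Iris data is just a stand-in for a wide business dataset):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # stand-in for your wide dataset

# Step 1: standardization (PCA is sensitive to feature scale)
X_std = StandardScaler().fit_transform(X)

# Steps 2-4: covariance matrix, eigenvector calculation, and
# component selection are handled internally by PCA
pca = PCA(n_components=2)

# Step 5: transform the data onto the chosen components
X_reduced = pca.fit_transform(X_std)

print(f"Explained variance ratio: {pca.explained_variance_ratio_.round(3)}")
```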
90% of data scientists overlook how to properly design A/B testing experiments.
4 tips for better experiments: 🧵
#DataScience #ABTesting
Tip 1: Include a pre-test
Pre-test data is data collected before the actual A/B test or time-based experiment begins, so it's unaffected by the treatment.
Pre-test data is the secret behind the CUPED method Booking(dot)com uses to reduce variance (and improve decision-making from A/B test results).
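Here's a minimal sketch of the CUPED adjustment in Python. The data is simulated and the variable names are illustrative; the core idea is subtracting the part of the in-test metric explained by the pre-test covariate:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated users: pre-test metric x correlates with in-test metric y
x = rng.normal(100, 15, size=5_000)          # pre-experiment activity
y = 0.8 * x + rng.normal(0, 10, size=5_000)  # in-experiment metric

# CUPED: theta = cov(x, y) / var(x), then remove x's contribution
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

print(f"Variance before CUPED: {y.var():.1f}")
print(f"Variance after CUPED:  {y_cuped.var():.1f}")
```

The adjusted metric keeps the same mean but has much lower variance, so treatment effects are easier to detect.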
Tip 2: Factor in time to effect
For online conversions, sales effects can take time. Your experiment should factor in this lag.
A different technique, called Causal Impact, can be more appropriate, especially if the conversion involves a longer sales cycle.
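One way to run this kind of analysis in Python is the community pycausalimpact package (a port of Google's R CausalImpact). This sketch assumes `pip install pycausalimpact` and uses simulated data with an intervention at day 70:

```python
import numpy as np
import pandas as pd
from causalimpact import CausalImpact  # pip install pycausalimpact

rng = np.random.default_rng(42)

# Simulated series: x is an unaffected control, y tracks x until an
# intervention lifts it by 5 units from day 70 onward
x = 100 + np.cumsum(rng.normal(0, 1, 100))
y = 1.2 * x + rng.normal(0, 1, 100)
y[70:] += 5

data = pd.DataFrame({"y": y, "x": x})
pre_period, post_period = [0, 69], [70, 99]

ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())
```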