Data Scientist | Host of 'The Data Scientist Show' | now @predibase, ex-Amazon | Nerdy joke connoisseur 📍SF
May 11, 2022 • 6 tweets • 2 min read
How to do outlier detection in 2022:
Outlier detection is a common use case in tech. Here are 3 techniques you should know
(ranked by capabilities):
1. Z-score ★ ★ ★
A Z-score measures how many standard deviations a point is from the mean. For one-dimensional data, points with |Z-score| > 3 are likely outliers: they are more than 3 standard deviations away from the mean.
It's easy to use, but it only works for 1-dimensional data.
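A minimal sketch of the Z-score rule in Python, assuming NumPy is available. The function name `zscore_outliers`, the sample data, and the use of the population standard deviation are illustrative choices, not something the thread specifies:

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking points whose |z-score| exceeds threshold."""
    x = np.asarray(values, dtype=float)
    std = x.std()  # population standard deviation (ddof=0), an assumption
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.zeros(x.shape, dtype=bool)
    z = (x - x.mean()) / std
    return np.abs(z) > threshold

# Hypothetical data: twenty typical readings and one extreme value.
data = [10.0] * 20 + [100.0]
mask = zscore_outliers(data)
print(np.asarray(data)[mask])
```

Note that with very small samples no point can reach a Z-score of 3, because the outlier itself inflates the mean and standard deviation, so the rule is more useful with a reasonable number of observations.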
May 11, 2022 • 6 tweets • 1 min read
Why are data scientists leaving their jobs?
Because they found that they are doing "data science engineering" instead of "data science research":
• Most companies have A/B testing tools, and data scientists design metrics and automate reports (BI).
• For ML, it's easy to load a model from sklearn, and the work is in feature engineering and putting models in production.
• When there's no data, you do data engineering.
Apr 12, 2022 • 4 tweets • 1 min read
Don't let a lack of math skills stop you from getting into data science.
The blog below lists some essential skills. I like that it points out that a lot of calculations can be done using Python/R libraries.
Here is how I would prioritize:
• I would spend more time on statistics, linear regression, and probability, because they are the most commonly used concepts and the foundation for solving most problems.