The ABSOLUTE ESSENTIALS of Bias/Variance Analysis

🧵This thread will cover the following concepts:
a. Bayes Error
b. Bias vs Variance
c. Possible Solutions

(Explanation + Examples)

#MachineLearning #DataScience
📜Introduction
- After training an ML model, it is important to assess its performance before putting it into production.
- We start by measuring the model performance on the training set to evaluate how well the model fits the training data.
- Then we measure the model performance on the test set to evaluate the generalization error.
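Both measurements come down to the same quantity: the error rate, i.e. the fraction of instances the model gets wrong. Here is a minimal Python sketch, assuming a classification task. The `error_rate` helper and the labels and predictions are hypothetical placeholders, not real results:

```python
def error_rate(y_true, y_pred):
    """Fraction of instances where the prediction differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

# Hypothetical labels and predictions, for illustration only.
train_true = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
train_pred = [0, 1, 1, 0, 1, 0, 0, 1, 0, 0]  # 1 mistake out of 10
test_true = [1, 0, 1, 1, 0]
test_pred = [1, 1, 1, 0, 0]                  # 2 mistakes out of 5

print(f"Training error: {error_rate(train_true, train_pred):.0%}")  # Training error: 10%
print(f"Test error: {error_rate(test_true, test_pred):.0%}")        # Test error: 40%
```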

To measure the model performance on the training set, we need a reference value against which we can compare the model performance.
This reference value is called "Bayes Error". 👇
1⃣ Bayes Optimal Error
- It is the lowest error rate that any function (any possible model) can achieve on a given task. It doesn't have to be 0%.
- Bayes error is a theoretical limit that no model can surpass.
- Since the true data distribution is unknown in practice, Bayes error can't be measured directly. Instead, it is estimated, often using human-level performance.
Humans tend to be quite good at a lot of tasks such as image and speech recognition. In these cases, human-level performance isn't that far from Bayes error and can be used as an approximation of it.
(Human error ≈ Bayes error).
Note that Bayes error (and human error) does not have to be 0% for all tasks.
For example, consider speech recognition in a noisy environment, where the audio is difficult to hear clearly.
In that case, it is impossible to generate a 100% accurate transcription, and Bayes error might be around 10%.
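To see why Bayes error can be above 0%, here is a small simulation under a simplifying assumption: each true label is flipped with 10% probability, standing in for audio that nobody can hear clearly. Even a predictor that knows the underlying rule exactly still gets about 10% of the observed labels wrong, so no model can do better. The threshold rule and the 10% flip rate are hypothetical choices:

```python
import random

def simulate_bayes_error(n=100_000, flip_prob=0.10, seed=42):
    """Error of the best possible predictor when labels are flipped with probability flip_prob."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n):
        x = rng.random()
        clean_label = int(x > 0.5)       # hypothetical underlying rule
        observed = clean_label
        if rng.random() < flip_prob:     # irreducible noise in the labels
            observed = 1 - clean_label
        # Since 90% of labels are clean, predicting the clean label is the
        # Bayes-optimal guess: no model can beat it on average.
        prediction = int(x > 0.5)
        mistakes += prediction != observed
    return mistakes / n

print(f"Error of the optimal predictor: {simulate_bayes_error():.1%}")  # roughly 10%
```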
2⃣ Bias (Underfitting)
- Bias is the error introduced by the model's overly simple assumptions: a systematic difference between the model's predictions and the actual values.
- A model with high bias has underfitted the training data (its training set error is high relative to Bayes error).
- Bias mainly occurs when the model is too simple to capture the underlying structure of the training data, so it misses the relevant relations between the input features and the target outputs.
- Bias is measured by comparing the training set error to Bayes error.
- Avoidable bias is defined as the training set error minus Bayes error (or human error).
For example, for some tasks such as image recognition, the human error rate (an approximation of Bayes error) is usually around 0%. So if the model has a training set error of 10%, this means you have an avoidable bias of 10% and your model is underfitting the training data.
For other tasks, such as speech recognition in a noisy environment, the human error rate might be around 15%. If your model has an 18% training set error, then you only have a 3% avoidable bias, and the model's performance on the training set is acceptable.
✍️How to reduce Bias (Underfitting)?
1. Select a more complex and powerful model architecture.
2. Use feature engineering to create more and better input features.
3. Train longer.
4. Decrease regularization (model constraints).
5. Use a better optimization algorithm.
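Item 4 can be illustrated with a one-feature ridge regression, where a penalty `lam` shrinks the slope toward zero. Below is a plain-Python sketch on made-up data that roughly follows y = 2x + 1. As the penalty decreases, the model is less constrained and the training error drops, which means bias is reduced. Ridge regression is our choice of example; the thread doesn't name a specific model:

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression; the intercept is not penalized."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # larger lam -> slope shrunk toward 0
    intercept = mean_y - slope * mean_x
    return slope, intercept

def training_mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the training data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical training data roughly following y = 2x + 1.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0, 12.9, 15.2, 16.8, 19.1]

for lam in (1000.0, 100.0, 10.0, 0.0):
    slope, intercept = fit_ridge_1d(xs, ys, lam)
    mse = training_mse(xs, ys, slope, intercept)
    print(f"lam={lam:>7}: slope={slope:.2f}, training MSE={mse:.3f}")
```

With `lam=0.0` this reduces to ordinary least squares, the least constrained fit of a straight line.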
3⃣ Variance (Overfitting)
- Variance is the difference between the training set error and the development set error.
- More generally, variance describes how much the model's predictions would change if it were trained on a different training set.
- If a model has a high variance, this means that it has overfitted the training data, and can't generalize well to data instances it hasn't seen before.
- High variance (overfitting) happens when the model is too complex relative to the amount and noisiness of the training data.
- A model with a high variance pays too much attention to the training data and does not generalize well to data it has not seen before. As a result, the model can perform really well on the training data but has a high error rate on the development set and test set.
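The idea that variance measures how much predictions change with different data can be checked empirically. Train the same kind of model on many freshly sampled training sets, then look at how much its prediction at one fixed input moves around. The sketch below uses k-nearest-neighbors regression as a stand-in model (our choice, not from the thread) on hypothetical noisy data drawn from y = 2x. With k=1 the model is very flexible and memorizes individual noisy points, so its predictions swing widely. Averaging over k=25 neighbors makes it far more stable:

```python
import random
import statistics

def sample_training_set(rng, n=100, noise=1.0):
    """Draw a fresh hypothetical training set from y = 2x + Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, noise)) for x in xs]

def knn_predict(train, x_query, k):
    """Average the targets of the k training points closest to x_query."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k

def prediction_variance(k, n_repeats=200, x_query=0.5, seed=0):
    """Variance of the prediction at x_query across independently drawn training sets."""
    rng = random.Random(seed)
    predictions = [knn_predict(sample_training_set(rng), x_query, k) for _ in range(n_repeats)]
    return statistics.pvariance(predictions)

print(f"k=1  (flexible): {prediction_variance(k=1):.3f}")
print(f"k=25 (simpler):  {prediction_variance(k=25):.3f}")
```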
✍️How to reduce variance (overfitting)?
1. Simplify the model architecture.
2. Select a model with fewer parameters and input features.
3. Get more training data (e.g., through data augmentation).
4. Clean the data to reduce the noise.
5. Apply regularization (to be discussed in another thread).
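Item 3 (more training data) can be sketched in the same spirit. Fit a straight line to many independently drawn training sets and measure how much the fitted slope varies. The data-generating process (y = 2x plus Gaussian noise) and the sample sizes are hypothetical choices for illustration:

```python
import random
import statistics

def fitted_slope(points):
    """Ordinary least-squares slope for a one-feature linear model."""
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

def slope_variance(n_samples, n_repeats=300, noise=1.0, seed=1):
    """Variance of the fitted slope across training sets of size n_samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_repeats):
        xs = [rng.uniform(0.0, 1.0) for _ in range(n_samples)]
        points = [(x, 2.0 * x + rng.gauss(0.0, noise)) for x in xs]
        slopes.append(fitted_slope(points))
    return statistics.pvariance(slopes)

for n in (10, 100, 1000):
    print(f"n={n:>4}: variance of fitted slope = {slope_variance(n):.4f}")
```

The model stays the same in every run. Only the amount of data changes, and the fitted slope becomes much more stable as the training set grows.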
📜Practical Examples
🔴Case no. 1:
Bayes error = 1%
training error = 5%
development error = 6%
This model has a high bias (4%) and low variance (1%)

🔴Case no. 2:
Bayes error = 1%
training error = 2%
development error = 7%
This model has a low bias (1%) and high variance (5%)

🔴Case no. 3:
human error = 1%
training error = 5%
development error = 10%
This model has a high bias (4%) and high variance (5%)

🔴Case no. 4:
human error = 1%
training error = 2%
development error = 3.5%
This model has a low bias (1%) and low variance (1.5%)
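The arithmetic behind the four cases can be written as a small helper. The thread doesn't give a cutoff for calling a gap "high", so the 2-percentage-point `threshold` below is a hypothetical choice that happens to reproduce the labels above:

```python
def diagnose(bayes_error, train_error, dev_error, threshold=2.0):
    """Return (avoidable bias, variance, bias label, variance label); inputs are in %.

    bayes_error can be replaced by human error as an approximation.
    threshold is a hypothetical cutoff for calling a gap "high".
    """
    avoidable_bias = round(train_error - bayes_error, 2)
    variance = round(dev_error - train_error, 2)
    bias_label = "high" if avoidable_bias > threshold else "low"
    variance_label = "high" if variance > threshold else "low"
    return avoidable_bias, variance, bias_label, variance_label

# (Bayes or human error, training error, development error) for the four cases.
cases = [(1.0, 5.0, 6.0), (1.0, 2.0, 7.0), (1.0, 5.0, 10.0), (1.0, 2.0, 3.5)]
for number, (bayes, train, dev) in enumerate(cases, start=1):
    bias, var, bias_label, var_label = diagnose(bayes, train, dev)
    print(f"Case no. {number}: {bias_label} bias ({bias:g}%) and {var_label} variance ({var:g}%)")
# Case no. 1: high bias (4%) and low variance (1%)
# Case no. 2: low bias (1%) and high variance (5%)
# Case no. 3: high bias (4%) and high variance (5%)
# Case no. 4: low bias (1%) and low variance (1.5%)
```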
Final Notes
a. Progress usually slows down once a model's performance approaches human-level performance.
b. Before building a machine learning model to solve a particular problem, find out how the problem is currently solved and how well. This gives you a good reference for your model's performance.
That's it for this thread.

For more content about #MachineLearning and #DataScience, follow me @ammaryh92.
