Bias/variance analysis is one of the most important concepts in machine learning: it lets you measure and improve the performance of your model.
🧵In this thread, I will discuss the basic idea behind it, so let's dive right in.
#MachineLearning #ArtificialIntelligence
1⃣Bias (Underfitting)
What is bias?
In simple terms, bias is the difference between the model predictions and the actual values.
So if a model has a high bias, it means it has underfitted the training data. In other words, the model has a high training set error.
What causes a high bias?
Bias, or underfitting, mainly occurs when the model is too simple to understand the underlying structure of the training data, and so it misses the relevant relations between the input features and target outputs.
How do we measure bias?
Measuring bias is usually tricky because it depends on the problem you are trying to solve. In practice, bias is measured by comparing the training set error to the Bayes error rate, which is the lowest possible error rate achievable by any function.
Since the Bayes error is a theoretical, irreducible error rate, it cannot be measured directly. That is why we use the human error rate as an estimate, or approximation, of the Bayes error.
So we define "avoidable bias" as the difference between the training set error and the human error rate.
For example, on some tasks such as image recognition, the human error rate is close to 0%. So if the model has a training set error of 10%, you have an avoidable bias of roughly 10%, and your model is underfitting the training data.
For other tasks such as speech recognition in a noisy environment, where it is relatively difficult to hear clearly, the human error rate can be around 15%, for example. If your model has 18% training set error, then you only have 3% avoidable bias, and the model's performance is acceptable.
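As a quick sketch of the noisy speech-recognition example above (hypothetical numbers, not from a real model), both gaps can be computed directly:

```python
# Hypothetical error rates (as fractions) -- illustrative numbers only.
human_error = 0.15      # proxy for the Bayes error rate
training_error = 0.18
dev_error = 0.20

avoidable_bias = training_error - human_error  # gap to human-level performance
variance = dev_error - training_error          # gap between train and dev sets

print(f"avoidable bias: {avoidable_bias:.0%}")  # 3%
print(f"variance: {variance:.0%}")              # 2%
```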
How to reduce bias (underfitting)?
1. Select a more complex and powerful model architecture.
2. Feature engineering to create more and better input features.
3. Longer training.
4. Decrease regularization (model constraints).
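A toy sketch of point 1, on synthetic data I made up for illustration: a model that is too simple leaves a large training error, and a higher-capacity one shrinks it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 100)
y = x**3 + 0.05 * rng.standard_normal(100)  # nonlinear ground truth + noise

def train_mse(degree):
    # Fit a polynomial of the given degree and return its training-set error.
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

mse_line = train_mse(1)   # straight line: too simple, misses the curve (high bias)
mse_cubic = train_mse(3)  # matches the underlying structure: training error drops
```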
2⃣Variance (Overfitting)
What is variance?
In a nutshell, variance is the difference between the training set error and the development set error.
So variance describes the amount that the model predictions will change when different data is used.
If a model has a high variance, this means that it has overfitted the training data.

What causes a high variance?
High variance, or overfitting, happens when the model is too complex relative to the amount and noisiness of the training data.
A model with a high variance pays too much attention to the training data and does not generalize well to data it has not seen before. As a result, the model can perform really well on the training data, but has high error rate on the development set and test set.
How to reduce variance (overfitting)?
1. Simplify the model architecture.
2. Select a model with fewer parameters and input features.
3. Get more training data (Data augmentation).
4. Clean the data to reduce the noise.
5. Increase regularization (will be discussed in another thread).
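A minimal sketch of point 5, using closed-form ridge regression on synthetic data (the data, the polynomial degree, and the alpha values are all my own made-up choices): an overly flexible model chases the training noise, and a regularization penalty pulls its dev-set error back down.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = rng.uniform(-1, 1, 20)
y_train = np.sin(3 * x_train) + 0.3 * rng.standard_normal(20)  # noisy samples
x_dev = rng.uniform(-1, 1, 200)
y_dev = np.sin(3 * x_dev)  # clean dev targets

def design(x, degree=14):
    # Polynomial feature matrix: columns x^degree, ..., x, 1.
    return np.vander(x, degree + 1)

def ridge_fit(X, y, alpha):
    # Closed-form ridge: w = (X^T X + alpha * I)^-1 X^T y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def dev_mse(alpha):
    w = ridge_fit(design(x_train), y_train, alpha)
    pred = design(x_dev) @ w
    return float(np.mean((pred - y_dev) ** 2))

mse_unregularized = dev_mse(1e-12)  # near-zero penalty: overfits the noise
mse_regularized = dev_mse(1e-1)     # penalty shrinks weights: lower dev error
```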
Practical Examples
Case no. 1:
human error = 1%
training error = 5%
development error = 6%
This model has a high bias (4%) and low variance (1%)

Case no. 2:
human error = 1%
training error = 2%
development error = 7%
This model has a low bias (1%) and high variance (5%)
Case no. 3:
human error = 1%
training error = 5%
development error = 10%
This model has a high bias (4%) and high variance (5%)

Case no. 4:
human error = 1%
training error = 2%
development error = 3.5%
This model has a low bias (1%) and low variance (1.5%)
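The four cases above can be summarized in a tiny helper function (the 2% cut-off is my own arbitrary choice for illustration, not a standard threshold):

```python
def diagnose(human, train, dev, threshold=0.02):
    """Label bias and variance given error rates expressed as fractions."""
    bias = train - human      # avoidable bias
    variance = dev - train
    return (
        "high" if bias > threshold else "low",
        "high" if variance > threshold else "low",
    )

# The four cases above, with errors as fractions:
print(diagnose(0.01, 0.05, 0.06))    # case 1
print(diagnose(0.01, 0.02, 0.07))    # case 2
print(diagnose(0.01, 0.05, 0.10))    # case 3
print(diagnose(0.01, 0.02, 0.035))   # case 4
```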
Final Notes
a. A model's rate of improvement usually slows down once it approaches human-level performance.
b. Before building a machine learning model to solve a particular problem, you should ask about the performance of the current solution; it gives you a good reference point for your model.
If you found this thread to be helpful, please consider retweeting it.
For content about machine learning and computer science, kindly follow me @ammar_yasser92.
