A topic that comes up in every interview.

Bias, variance, and their relationship with machine learning algorithms. One of the most basic concepts that you have to know by heart.

Here is a simple summary that you will easily remember.

Every machine learning algorithm deals with 3 types of errors:

1. Bias error
2. Variance error
3. Irreducible error

There's nothing we can do about #3.
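
For squared-error loss, these three errors are usually written as one decomposition of the expected prediction error. A sketch, assuming the data follows y = f(x) + ε with zero-mean noise of variance σ², and with the expectation taken over training sets and noise (the symbols f, f̂, ε, and σ² are introduced here for illustration and are not part of the original thread):

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The last term depends only on the noise in the data, which is why no choice of model can remove it.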

Let's focus on the other two.

"Bias" refers to the assumptions the model makes to simplify the process of finding answers.

The more assumptions it makes, the more biased the model is.

"Variance" refers to how much the answers given by the model will change if we use different training data.

If the answers stay the same regardless of the data, the model has low variance.

Often, linear models are high-bias, and nonlinear models are low-bias.

Example low-bias algorithms:
• Decision Trees
• SVM
• kNN

Example high-bias algorithms:
• Linear Regression
• Logistic Regression

Often, linear models are low-variance, and nonlinear models are high-variance.

Example low-variance algorithms:
• Linear Regression
• Logistic Regression

Example high-variance algorithms:
• Decision Trees
• SVM
• kNN
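
One way to see both lists at once is to measure bias and variance directly on synthetic data. The sketch below is only an illustration under stated assumptions: the ground truth `true_f` (a sine curve), the noise level, the query point `x0`, and all function names are made up for this example. It trains a linear regression and a 1-nearest-neighbor model on many freshly sampled training sets and looks at how their predictions at `x0` behave.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n=30, noise_sd=0.3):
    # Draw n noisy observations of true_f on [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def linear_predict(xs, ys, x0):
    # Ordinary least squares with a single feature (closed form).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def nearest_neighbor_predict(xs, ys, x0):
    # 1-NN: return the label of the closest training point.
    _, y = min(zip(xs, ys), key=lambda p: abs(p[0] - x0))
    return y


def bias_variance(predict, x0=0.25, trials=500, seed=0):
    # Retrain on a fresh training set each trial and collect predictions at x0.
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(trials)]
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


linear_bias_sq, linear_var = bias_variance(linear_predict)
knn_bias_sq, knn_var = bias_variance(nearest_neighbor_predict)
print(f"linear: bias^2={linear_bias_sq:.3f} variance={linear_var:.3f}")
print(f"1-NN:   bias^2={knn_bias_sq:.3f} variance={knn_var:.3f}")
```

In this setup the straight line cannot follow the curve, so its average prediction misses the target (high bias) but barely moves between training sets (low variance). The 1-NN model shows the opposite pattern.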

Sometimes, you can change how these algorithms work to get a different tradeoff between their bias and variance.

Example:

• By increasing the value of "k" in kNN, we can increase the algorithm's bias.

• By pruning a Decision Tree, we can reduce its variance.
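
The kNN case can be checked with a small simulation. This is a rough sketch, not a benchmark: the sine-shaped `true_f`, the noise level, the sample size, and the helper names are all assumptions made for the example. It compares kNN regression at a single query point for k = 1, k = 5, and k equal to the size of the training set.

```python
import math
import random
import statistics


def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return math.sin(2 * math.pi * x)


def knn_predict(xs, ys, x0, k):
    # Average the labels of the k training points closest to x0.
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return statistics.fmean([y for _, y in nearest])


def bias_variance_for_k(k, x0=0.25, n=30, noise_sd=0.5, trials=500, seed=0):
    # Retrain on a fresh training set each trial and collect predictions at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    return bias_sq, statistics.pvariance(preds)


# k=30 uses every training point, so the model predicts the global mean.
results = {k: bias_variance_for_k(k) for k in (1, 5, 30)}
for k, (bias_sq, variance) in results.items():
    print(f"k={k:>2}: bias^2={bias_sq:.3f} variance={variance:.3f}")
```

Going from k = 1 to k = 30, the squared bias should rise while the variance falls. Intermediate values of k may not line up perfectly in a finite simulation like this one.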

It doesn't matter what you do; the tradeoff is always there:

• Increasing bias decreases variance.
• Increasing variance decreases bias.

To work around this:

• Choose the appropriate algorithm
• Configure it correctly
• Work with the underlying dataset

If you want low-bias and low-variance machine learning content, follow me @svpino.

I come here to write about machine learning, and I promise you'll enjoy it.

