Overfitting sucks.

Here are 7 ways you can deal with overfitting in Deep Learning neural networks.

πŸ§΅πŸ‘‡
A quick reminder:

When your model makes good predictions on the same data that was used to train it but shows poor results with data it hasn't seen before, we say that the model is overfitting.

The model in the picture is overfitting.

πŸ‘‡
1⃣ Train your model on more data

The more data you feed the model, the more likely it is to generalize (instead of memorizing the training set.)

Look at the relationship between dataset size and error.

(Unfortunately, sometimes there's no more data.)

πŸ‘‡
2⃣ Augment your dataset

You can automatically augment your dataset by transforming existing images in different ways to make the data more diverse.

Some examples:

▫️Zoom in/out
▫️Contrast changes
▫️Horizontal/vertical flips
▫️Noise addition
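
A rough sketch of how this could look with tf.keras preprocessing layers (assuming TensorFlow 2.x; the specific layers and factors are only illustrative):

```python
import tensorflow as tf

# Augmentation pipeline mirroring the examples above.
# These layers are only active during training.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomZoom(0.2),                        # zoom in/out
    tf.keras.layers.RandomContrast(0.2),                    # contrast changes
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),  # flips
    tf.keras.layers.GaussianNoise(0.1),                     # noise addition
])

# Put it at the front of the model so every batch sees
# a slightly different version of each image.
model = tf.keras.Sequential([
    augmentation,
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```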

πŸ‘‡
3⃣ Make your model simpler

You can:

▫️Reduce the number of layers
▫️Reduce the number of weights

The more complex your model is, the more capacity it has to memorize the dataset (hence, the more easily it will overfit.)

Simplifying the model will force it to generalize.
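
A rough tf.keras sketch of the same idea (the layer sizes are arbitrary; the point is the drop in capacity):

```python
import tensorflow as tf

# A model with a lot of capacity: 3 hidden layers, 512 units each.
complex_model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A simpler alternative: fewer layers, fewer weights per layer.
simple_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```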

πŸ‘‡
4⃣ Stop the learning process before overfitting

This is known as "Early Stopping."

Identify the point where overfitting starts and stop the learning process before you get there.

Plotting the training and validation errors will give you what you need for this.
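
In Keras this is a built-in callback. A minimal sketch (assumes a compiled tf.keras model; X_train, y_train, X_val, y_val are placeholders for your data):

```python
import tensorflow as tf

# Stop training when the validation loss stops improving,
# and roll back to the best weights seen so far.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,                  # tolerate 5 epochs without improvement
    restore_best_weights=True,
)

# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=200,
#           callbacks=[early_stopping])
```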

πŸ‘‡
5⃣ Standardize input data

Smaller weights tend to make a model less prone to overfitting.

Rescaling the input data is a way to constrain these weights and keep them from growing disproportionately.
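
A minimal NumPy sketch of standardization (the toy numbers are made up; the point is rescaling each feature to zero mean and unit variance):

```python
import numpy as np

# Toy feature matrix: 5 samples, 2 features on very different scales.
X_train = np.array([[25.0, 40_000.0],
                    [38.0, 52_000.0],
                    [52.0, 90_000.0],
                    [29.0, 61_000.0],
                    [61.0, 120_000.0]])

# Standardize: zero mean, unit variance per feature.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std

# Reuse the *training* statistics on new data to avoid leakage:
# X_test_scaled = (X_test - mean) / std
```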

πŸ‘‡
6⃣ Use Dropouts

Dropout is a regularization method that randomly ignores some of a layer's outputs during training.

This simulates the process of training different neural networks with different architectures in parallel, which is a way to avoid overfitting.

machinelearningmastery.com/dropout-for-re…
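
A minimal tf.keras sketch (the 0.5 rate and layer sizes are just illustrative):

```python
import tensorflow as tf

# Dropout(0.5) randomly zeroes 50% of the previous layer's outputs
# on each training step; it does nothing at inference time.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```
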
7⃣ L1 and L2 regularization

These are techniques that add a penalty term to the loss function to keep the weights of the network constrained.

This means that the network is forced to generalize better because it can't grow the weights without limit.
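
A minimal tf.keras sketch (the penalty strengths are arbitrary; tune them on a validation set):

```python
import tensorflow as tf

# The regularizer adds a penalty on the layer's weights to the loss,
# so large weights become expensive during training.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64,
        activation="relu",
        input_shape=(20,),
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),               # L2 penalty
    ),
    tf.keras.layers.Dense(
        1,
        activation="sigmoid",
        kernel_regularizer=tf.keras.regularizers.l1_l2(l1=1e-5, l2=1e-4),  # L1 + L2
    ),
])
```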

πŸ‘‡
Is there anything else you use to prevent your models from overfitting?


