I always get Normalization and Standardization mixed up.
But they are different.
Some notes about them and why we care.
🧵👇
Feature scaling is key for a lot of Machine Learning algorithms to work well.
We usually want all of our features on the same scale.
👇
Imagine we are working with a dataset of workers.
"Age" will range between 16 and 90.
"Salary" will range between 15,000 and 150,000.
Huge disparity!
Salary will dominate any comparisons because of its magnitude.
We can fix that by scaling both features.
👇
We can normalize or standardize these features.
The former will squeeze values between 0 and 1.
The latter will center values around the mean with a unit standard deviation.
👇
Here are some handwritten notes about Normalization.
👇
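In code, the idea looks something like this. A minimal sketch, assuming scikit-learn's MinMaxScaler and a tiny, made-up slice of the workers dataset:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up workers data: [age, salary].
X = np.array([
    [16, 15_000],
    [35, 60_000],
    [90, 150_000],
])

# Min-max normalization: x' = (x - min) / (max - min).
# Every feature gets squeezed into the [0, 1] range.
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)

print(X_normalized)
```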
Here are some handwritten notes about Standardization.
👇
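And the equivalent sketch for standardization, again assuming scikit-learn:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same made-up workers data: [age, salary].
X = np.array([
    [16, 15_000],
    [35, 60_000],
    [90, 150_000],
])

# Standardization: z = (x - mean) / std.
# Each feature ends up centered at 0 with unit standard deviation.
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

print(X_standardized.mean(axis=0))  # ~0 for both features
print(X_standardized.std(axis=0))   # ~1 for both features
```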
Which one is better?
Well, it depends.
There's really no way to determine which of these two methods works better without actually trying them. (Thanks to @javaloyML for pointing this out to me.)
It's a good idea to treat these as tunable settings that we can experiment with.
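Here's one way to run that experiment, sketched with scikit-learn: make the scaler a pipeline step and let a grid search compare the two. (The wine dataset and logistic regression are just stand-ins.)

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_wine(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),  # placeholder; the grid swaps it out
    ("model", LogisticRegression(max_iter=5_000)),
])

# Treat the scaling method as just another hyperparameter.
grid = GridSearchCV(
    pipeline,
    param_grid={"scaler": [MinMaxScaler(), StandardScaler()]},
    cv=5,
)
grid.fit(X, y)

print(grid.best_params_["scaler"])
```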
Here are 7 ways you can deal with overfitting in Deep Learning neural networks.
🧵👇
A quick reminder:
When your model makes good predictions on the same data that was used to train it but shows poor results with data it hasn't seen before, we say that the model is overfitting.
The model in the picture is overfitting.
👇
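You can see overfitting in a few lines of code. A minimal sketch, assuming scikit-learn and an unconstrained decision tree (a model that happily memorizes its training set):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# No depth limit: the tree grows until it fits the training data perfectly.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

print("Train accuracy:", model.score(X_train, y_train))  # usually 1.0
print("Test accuracy: ", model.score(X_test, y_test))    # noticeably lower
```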
1β£ Train your model on more data
The more data you feed the model, the more likely it will start generalizing (instead of memorizing the training set.)
Look at the relationship between dataset size and error.
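You can measure that relationship yourself. A sketch using scikit-learn's learning_curve, with a decision tree as a stand-in model:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Cross-validated accuracy at increasing training-set sizes.
train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=42),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

for size, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{size} training samples -> validation accuracy {score:.3f}")
```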
This is a thread about one of the most powerful tools that make it possible for knuckleheads like me to achieve state-of-the-art Deep Learning results on our laptops.
🧵👇
Deep Learning is all about "Deep" Neural Networks.
"Deep" means a lot of complexity. You can translate this to "We Need Very Complex Neural Networks." See the attached example.
The more complex a network is, the slower it is to train, and the more data we need to train it.
👇
To get state-of-the-art results when classifying images, we can use a network like ResNet50, for example.
It takes around 14 days to train this network on the ImageNet dataset (1,300,000+ images.)
14 days!
That's assuming that you have a decent (very expensive) GPU.
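The good news: you don't have to train it yourself. A sketch assuming TensorFlow/Keras, which distributes ResNet50 with the ImageNet weights already trained:

```python
from tensorflow.keras.applications import ResNet50

# Downloads the pretrained ImageNet weights (~100 MB)
# instead of spending ~14 days training from scratch.
model = ResNet50(weights="imagenet")
model.summary()
```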