If you just focus on the left side, it seems to make sense.
The training loss is going down while the validation loss is going up.
Clearly an overfitting problem, right?
But the graphs on the right don't seem to make sense in terms of overfitting.
The training accuracy is high, which is fine. But why is the validation accuracy going up if the validation loss is getting worse? Shouldn't it go down too?
Is it still overfitting?
YES!
Now, why is the validation graph going upwards? Let's understand.
Loss tracks the inverse confidence of a prediction: a high loss means that even when the model predicts the correct class, it is less sure of that prediction, and vice versa.
So high validation accuracy + high validation loss, alongside high training accuracy + low training loss, suggests the model may be overfitting to the training data.
Suppose there are two classes: dog & cat.
For our case, the correct class is dog: [1, 0].
If the output of the softmax is [0.9, 0.1],
the cross-entropy loss would be ~0.105.
Classification: dog.
If the softmax output is [0.6, 0.4],
the loss would be ~0.511.
The classification is the same, but the loss has increased.
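The numbers above can be checked in a few lines. This is a minimal sketch of cross-entropy loss for a one-hot target (the [1, 0] "dog" label from the example); the function name is mine, not from any particular library:

```python
import math

def cross_entropy(target, predicted):
    """Cross-entropy loss: -sum(t * log(p)) over classes (one-hot target)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

target = [1, 0]  # correct class: dog

confident = cross_entropy(target, [0.9, 0.1])  # model is sure -> low loss
unsure = cross_entropy(target, [0.6, 0.4])     # same argmax, less sure -> higher loss

print(round(confident, 3))  # ~0.105
print(round(unsure, 3))     # ~0.511
```

Both outputs classify the image as "dog", yet the loss roughly quintuples as confidence drops, which is exactly why accuracy and loss can move in the same direction.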
When both validation accuracy and validation loss are increasing, the network is starting to learn patterns that are only relevant to the training set and not great for generalization. Some images from the validation set get predicted really wrongly, with the effect amplified by this "loss asymmetry".
However, at the same time, it is still learning some patterns that are useful for generalization, as more and more images are being correctly classified! Interesting, right?
When I first encountered this situation in my project, it left me puzzled. These were the only logical explanations I could find.
• • •
I had never seriously read a research paper before, and I certainly didn't plan to write one, until I had to.
But I ended up finishing one that got accepted at a conference. It wasn't revolutionary, but I was glad that I decided to do it and was able to finish.
Here's how: 👇
I was lucky to get past the first barrier quickly, choosing a subject or topic of research.
I was exposed to an image processing problem during my internship, which I really liked so I ended up pursuing the same for my research.
But if you're lost about what topic to choose, I suggest you check out the most recent papers, see what interests you, and move forward with that.
Looking to get into Machine Learning? You most certainly can.
Because I believe that if an above-average student like me was able to do it, you certainly can as well.
Here's how I went from knowing nothing about programming to working in Data Science 👇
The path I took wasn't the most optimal way to get a good grip on Machine Learning because,
when I started out, I knew nobody who worked in or had knowledge of Data Science, which made me try all sorts of different things that were not actually necessary.
I studied C as my first programming language during my freshman year in college. And before the start of my second year, I started learning Python, just because I knew C was not the way to go.
I learned it out of curiosity; I had no idea about Machine Learning at that point.
Learning rate is one of the most important hyperparameters in Machine Learning algorithms. 👇
You must have seen learning rates like 0.01, 0.001, 0.0001...
In other words, always on a logarithmic scale. Why?
What happens if we just take random values between 0 and 1?
If we take random values uniformly between 0 and 1, there is only a 10% probability of getting a value between 0 and 0.1; the remaining 90% of values would fall between 0.1 and 1.
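Here is a quick sketch of the difference. Sampling the exponent uniformly (the range [1e-4, 1] below is my choice for illustration) spreads draws evenly across decades, whereas plain uniform sampling almost never lands in the small-learning-rate region:

```python
import random

random.seed(0)

# Uniform sampling: only ~10% of draws land below 0.1
uniform_draws = [random.uniform(0, 1) for _ in range(100_000)]
frac_small = sum(d < 0.1 for d in uniform_draws) / len(uniform_draws)
print(f"uniform draws below 0.1: {frac_small:.0%}")  # ~10%

# Log-scale sampling: exponent uniform in [-4, 0] -> lr in [1e-4, 1]
log_draws = [10 ** random.uniform(-4, 0) for _ in range(100_000)]
frac_decade = sum(1e-3 <= d < 1e-2 for d in log_draws) / len(log_draws)
print(f"log-scale draws in [1e-3, 1e-2): {frac_decade:.0%}")  # ~25%
```

With log-scale sampling, each decade (1e-4 to 1e-3, 1e-3 to 1e-2, ...) gets an equal ~25% share, so small learning rates are explored just as thoroughly as large ones.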
Here are the links to all the notes I made from the Andrew Ng Machine Learning course back in 2016.
This was my first exposure to #MachineLearning. They helped me a lot, and I hope anyone who's just starting out and prefers handwritten notes can reference these 👇
Gradient Descent is great but there are a whole bunch of problems associated with it.
Getting stuck in a local minimum while exploring the solution space is one of the major issues.
A possible Solution?
SIMULATED ANNEALING
Here's a little something about it 🧵👇
The method of Simulated Annealing in optimization is analogous to the process of annealing in metallurgy, hence the name.
We get stuck in local minima because we always accept the solution that looks best in the short term: we only move downwards ⬇️ (along the negative gradient), never upwards ⬆️.
So once we reach a point that is low, but not the lowest, we can end up stuck there.
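Simulated annealing escapes this trap by sometimes accepting *worse* moves, with a probability that shrinks as a "temperature" cools. Here's a minimal sketch on a made-up bumpy 1-D function (the objective, step size, and cooling schedule are all illustrative choices of mine, not canonical values):

```python
import math
import random

def objective(x):
    # A bumpy function with a local minimum near x ~ 3.8
    # and its global minimum near x ~ -1.3
    return x * x + 10 * math.sin(x)

def simulated_annealing(start, temp=10.0, cooling=0.99, steps=5000, seed=1):
    rng = random.Random(seed)
    current, best = start, start
    for _ in range(steps):
        candidate = current + rng.gauss(0, 0.5)  # random neighbour
        delta = objective(candidate) - objective(current)
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), letting the search climb
        # uphill out of local minima while the temperature is high.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
        if objective(current) < objective(best):
            best = current
        temp *= cooling  # cool down: fewer uphill moves over time
    return best

best = simulated_annealing(start=5.0)
print(best, objective(best))
```

Early on, high temperature makes uphill ⬆️ moves likely, so the search can hop over barriers; as the temperature decays, it behaves more and more like greedy descent and settles into whatever basin it is in.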