Learning rate is one of the most important hyperparameters in Machine Learning algorithms.📈
You must have seen learning rates like 0.01, 0.001, 0.0001...
In other words, always on a logarithmic scale. Why?
What happens if we just take random values between 0 and 1?
If we take random values uniformly between 0 and 1, there is only a 10% chance of getting a value between 0 and 0.1; the remaining 90% of values fall between 0.1 and 1.
But why do we want values between 0 and 0.1?
Because the goal is to find the correct scale, not necessarily the exact number; we can optionally perform a fine-grained search within the suitable range later.
We want to test all orders of magnitude, which can be done using a log scale. So we define our search space somewhat like this:
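For instance, here is a minimal sketch in Python (NumPy and the 1e-4 to 1 range are my assumptions, not something fixed by the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform sampling between 0 and 1: only ~10% of draws land in [0, 0.1],
# so small learning rates are barely explored.
uniform_lrs = rng.uniform(0.0, 1.0, size=1000)
print((uniform_lrs < 0.1).mean())  # roughly 0.1

# Log-scale sampling: draw the exponent uniformly, so every order of
# magnitude between 1e-4 and 1e0 is equally likely to be tried.
log_lrs = 10 ** rng.uniform(-4, 0, size=1000)
```

Sampling the exponent uniformly gives every order of magnitude the same chance of being tested.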
Using a log scale is a common practice for selecting hyperparameters, but it usually depends on the type of parameter.
One common exception that uses a linear scale is dropout!
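A quick sketch of that difference, assuming dropout is searched between 0.1 and 0.5 (the bounds are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Learning rate: sample the exponent, i.e. a log scale.
learning_rate = 10 ** rng.uniform(-4, 0)

# Dropout: a probability confined to a narrow range,
# so a plain linear scale works fine.
dropout_rate = rng.uniform(0.1, 0.5)
```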
Are you looking to get into Machine Learning? You most certainly can.
Because I believe that if an above-average student like me was able to do it, you all certainly can as well
Here's how I went from knowing nothing about programming to someone working in Data Science👇
The path I took wasn't the optimal way to get a good grip on Machine Learning because...
when I started out, I knew nobody who worked in or had any knowledge of Data Science, which made me try all sorts of different things that weren't actually necessary.
I studied C as my first programming language during my freshman year in college. And before the start of my second year, I started learning Python just because I knew C wasn't the way to go.
I learned it out of curiosity and I had no idea about Machine Learning at this point.
Here are the links to all the notes I made from the Andrew Ng Machine Learning course back in 2016.
This was my first exposure to #MachineLearning. They helped me a lot, and I hope anyone who's just starting out and prefers handwritten notes can reference them 👇
Gradient Descent is great, but there are a whole bunch of problems associated with it.
Getting stuck in a local minimum while exploring the solution space is one of the major issues.
A possible Solution?
SIMULATED ANNEALING
Here's a little something about it 🧵👇
The method of Simulated Annealing in optimization is analogous to the process of annealing in metallurgy ⚗️🔥, hence the name.
We get stuck in local minima because we always accept the solution that looks best in the short term: we only move in the downward direction ⬇️ (negative gradient), never upward ⬆️.
So once we reach a point that is low but not the lowest, we can end up stuck there.
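Here's a minimal sketch of the idea in Python, with an illustrative 1-D objective, step size, and exponential cooling schedule (all assumptions on my part, not the thread's exact recipe):

```python
import math
import random

def simulated_annealing(f, x0, step=0.1, t0=1.0, cooling=0.995, iters=5000):
    """Minimize a 1-D function f, occasionally accepting uphill moves."""
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    t = t0
    for _ in range(iters):
        # Propose a random neighbour of the current point.
        x_new = x + random.uniform(-step, step)
        fx_new = f(x_new)
        delta = fx_new - fx
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta < 0 or random.random() < math.exp(-delta / t):
            x, fx = x_new, fx_new
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling  # cool the temperature
    return best_x, best_fx

# A bumpy objective with several local minima.
bumpy = lambda x: x ** 2 + 3 * math.sin(5 * x)
print(simulated_annealing(bumpy, x0=4.0))
```

Because uphill moves are accepted with probability exp(-delta / T), the search can climb out of a shallow local minimum early on, and it settles into greedy downhill behaviour as the temperature cools.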