Neural networks are hard to train. The deeper they get, the more likely they are to suffer from unstable gradients.
A thread 🧵🧵
Gradients can either explode or vanish, and neither of those is a good thing for the training of our network.
The vanishing gradients problem makes the network take too long to train (learning becomes very slow), while exploding gradients make the weight updates so large that training becomes unstable.
Although those problems are nearly inevitable, the choice of activation function can reduce their effects.
Using ReLU activation in the first layers can help avoid vanishing gradients.
That is also why we do not like to see sigmoid activations in the first layers of the network: sigmoid saturates easily, which makes the gradients vanish quickly.
Careful weight initialization can also help, but ReLU is by far the simpler fix.
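Here is a minimal Keras sketch of what that looks like in practice (the layer sizes and input shape are made up for illustration): ReLU with He initialization in the hidden layers, sigmoid only at the output.

```python
# A minimal sketch (assuming TensorFlow/Keras) of a small dense network that
# uses ReLU + He initialization in the hidden layers instead of sigmoid,
# which reduces the risk of vanishing gradients early in the network.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    # ReLU with He initialization: gradients don't shrink as fast as with
    # saturating activations like sigmoid.
    tf.keras.layers.Dense(256, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal"),
    # Sigmoid is fine at the output layer for a binary prediction.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```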
This short thread was only a high-level overview of the issue. If you would like to learn more, you can read this stats discussion.
Precision: What is the percentage of positive predictions that are actually positive?
Recall: What is the percentage of actual positives that were predicted correctly?
The fewer false positives, the higher the precision, and vice versa.
The fewer false negatives, the higher the recall, and vice versa.
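A small sketch (using scikit-learn, with toy labels made up for illustration) of how precision and recall follow from false positives and false negatives:

```python
# Precision = TP / (TP + FP), Recall = TP / (TP + FN).
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual labels
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]  # model predictions

# One false positive (index 1) -> precision = 3 / (3 + 1) = 0.75
print("Precision:", precision_score(y_true, y_pred))
# One false negative (index 3) -> recall = 3 / (3 + 1) = 0.75
print("Recall:", recall_score(y_true, y_pred))
```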
How do you increase precision? Reduce false positives.
It depends on the problem, but generally that might mean fixing the labels of the negative samples that are being predicted as positive, or adding more such samples to the training data.
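A hypothetical sketch of how you might locate those false positives to inspect (the array names here are assumptions, not from the thread):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])

# False positives: predicted positive but actually negative.
fp_indices = np.where((y_pred == 1) & (y_true == 0))[0]
print("Samples to inspect for label errors:", fp_indices)  # e.g. [1]
```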
◆ Data visualization with Matplotlib & Seaborn
◆ Data preprocessing with Pandas
◆ Classical machine learning with Scikit-Learn: from linear models, trees, and ensemble models to PCA
◆ Neural networks with TensorFlow & Keras: ConvNets, RNNs, BERT, etc.