Source of errors in building traditional programs:
◆Syntax errors
◆Inefficient code
Source of errors in machine learning:
◆Solving the wrong problem
◆Using the wrong evaluation metric
◆Not being aware of skewed data
◆Inconsistent data preprocessing functions
More sources of errors in ML:
◆Putting more emphasis on the model than the data
◆Data leakage
◆Training on the test data
◆Model and data drifts
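One common form of data leakage from the list above is computing preprocessing statistics on the full dataset before splitting it. A minimal sketch (the toy numbers are purely illustrative) of the leaky version versus the correct version:

```python
import numpy as np

# Hypothetical toy feature column; the last value will land in the test set.
X = np.array([1.0, 2.0, 3.0, 100.0])
train, test = X[:3], X[3:]

# LEAKY: the mean is computed on ALL data, so the test outlier
# influences the statistic used to preprocess the training set.
leaky_mean = X.mean()      # 26.5, pulled toward the test point

# CORRECT: statistics come from the training split only.
train_mean = train.mean()  # 2.0

print(leaky_mean, train_mean)
```

The gap between the two means shows how the test set silently shapes the training pipeline when statistics are fit too early.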
The difference between errors in traditional software and machine learning-powered applications:
Errors in traditional systems are obvious, but that is not always the case in machine learning. A machine learning system may raise no errors at all and still produce misleading predictions.
How to minimize errors in machine learning:
◆Use the right metric relevant to the problem
◆Use consistent data preprocessing functions between the training and test sets
◆Get the data right before the model
◆Do not leak the test data to the model
◆Avoid data leakage in general
◆Monitor the predictions and set up automatic actions (e.g., retraining) in case of data and model drift.
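The second point above, consistent preprocessing between splits, can be sketched as fitting the transform's statistics on the training set only and then applying that same transform to both splits (the values here are hypothetical):

```python
import numpy as np

# Toy data; the numbers are illustrative only.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

# Fit preprocessing statistics on the training set only...
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# ...then apply the SAME transform to both splits.
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma
```

Refitting the scaler on the test set would produce a different transform for each split, which is exactly the inconsistency the thread warns about.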
Thank you for reading.
If you found this post helpful, retweet and follow @Jeande_d for more machine learning ideas.
The machine learning research community is very vibrant.
Here is what I mean...🧵🧵
In 1958, Frank Rosenblatt invented the perceptron, a very simple algorithm that would later turn out to be the core and origin of today's intelligent machines.
In essence, the perceptron is a simple binary classifier that can determine whether or not a given input belongs to a specific class.
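A minimal sketch of that idea: the classic perceptron learning rule nudges the weights whenever a prediction is wrong. On a linearly separable problem like the AND gate, it converges to a correct separator (the learning rate and epoch count below are arbitrary choices):

```python
import numpy as np

def perceptron_train(X, y, epochs=10, lr=1.0):
    """Rosenblatt's perceptron rule: update weights only on mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # (yi - pred) is 0 when correct, +/-1 when wrong.
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return w, b

# AND gate: linearly separable, so the perceptron converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]
```

On data that is not linearly separable (the famous XOR case), this rule never converges, which is precisely the limitation that later motivated multi-layer networks.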
Precision: What is the percentage of positive predictions that are actually positive?
Recall: What is the percentage of actual positives that were predicted correctly?
The fewer false positives, the higher the precision, and vice versa.
The fewer false negatives, the higher the recall, and vice versa.
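The two definitions above reduce to simple ratios over the confusion-matrix counts. A small sketch (the counts below are made up for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision: share of positive predictions that are actually positive.
    Recall: share of actual positives that were predicted correctly."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)
# precision = 8 / 10 = 0.8, recall = 8 / 12 ≈ 0.667
```

Note how only false positives appear in the precision denominator and only false negatives in the recall denominator, which is exactly why reducing one count raises the corresponding metric.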
How do you increase precision? Reduce false positives.
It can depend on the problem, but generally, that might mean fixing the labels of negative samples that are being predicted as positives, or adding more of them to the training data.