Feature selection is a crucial part of building a good machine learning model.
But many data scientists apply it by default, without stopping to ask whether it's needed.
The fact is: feature selection in machine learning is not always necessary.
Here are 5 situations when you don't need it 👇🧵
1. You have a small dataset that doesn't have many features.
With few samples and few features, there is little to prune away: the model can usually handle the full feature set, and dropping columns just throws away signal.
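One way to make point 1 concrete is a samples-per-feature rule of thumb. A minimal sketch (the 10:1 ratio here is a common heuristic, not something the thread prescribes):

```python
def feature_selection_worth_considering(n_samples: int, n_features: int, ratio: int = 10) -> bool:
    """Heuristic sketch: if you have far more samples than features,
    feature selection is unlikely to be necessary.

    Returns True when the dataset is feature-heavy relative to its size,
    i.e. when selection might actually help.
    """
    return n_samples < ratio * n_features


# Small, narrow dataset: plenty of samples per feature -> skip selection
print(feature_selection_worth_considering(1000, 8))

# Wide dataset with few rows: selection may be worth a look
print(feature_selection_worth_considering(50, 20))
```

This is only a first-pass sanity check; the real decision also depends on the model class and how correlated the features are.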
2. The features are already carefully selected
If the features have already been carefully chosen and are known to matter for the task at hand, another selection pass adds little beyond extra complexity.
One of the most common causes of machine learning model failure is dataset drift.
Yet many data scientists and machine learning practitioners don't know why their datasets are drifting.
Here are 6 of the most common reasons for dataset drift in machine learning 👇🧵
What is dataset drift? It's when the statistical properties of a dataset change over time, which can negatively impact the performance of a machine learning model.
1. Changes in the data distribution:
The distribution of the data used to train the model may change over time, leading to dataset drift. This could be due to changes in the underlying process that generates the data, or due to changes in the data collection process itself.
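A distribution shift like this can be caught with a simple statistical comparison between training data and live data. A minimal sketch for a single numeric feature (the synthetic data and the `drift_score` helper are illustrative, not a standard API):

```python
import random
import statistics


def drift_score(train: list[float], live: list[float]) -> float:
    """Compare the mean of a feature in live data against the training
    distribution, scaled by the training standard deviation.

    A score near 0 means the feature looks stable; a score well above 1
    means the live mean has shifted by more than one training std dev,
    which is a strong hint of distribution drift.
    """
    mu_train = statistics.mean(train)
    mu_live = statistics.mean(live)
    sd_train = statistics.stdev(train) or 1.0  # guard against zero variance
    return abs(mu_live - mu_train) / sd_train


random.seed(0)
train = [random.gauss(0, 1) for _ in range(1000)]       # training distribution
live_same = [random.gauss(0, 1) for _ in range(1000)]   # no drift
live_shifted = [random.gauss(2, 1) for _ in range(1000)]  # drifted by +2 std devs

print(round(drift_score(train, live_same), 2))
print(round(drift_score(train, live_shifted), 2))
```

In practice you would run a proper two-sample test (e.g. Kolmogorov-Smirnov) per feature and monitor the scores over time, but a mean-shift check like this is often enough to raise the first alarm.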