Feature Scaling is one of the most important transformations you need to apply to your data before feeding it to your machine learning model.
🧵thread 👇
- Feature scaling means normalizing the range of input features so that they all end up on similar scales.
- Most machine learning algorithms do not work well when the input numerical attributes have very different scales.
1⃣Min-Max Scaling (Normalization)
- In this method, values are rescaled so they end up ranging between 0 and 1 (the range can be changed in some cases).
- Normalization is very sensitive to the existence of outliers.
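A minimal sketch of min-max scaling with scikit-learn (assuming `MinMaxScaler` from `sklearn.preprocessing`; the toy data is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy feature column with very different magnitudes
X = np.array([[1.0], [5.0], [10.0]])

scaler = MinMaxScaler()  # default feature_range=(0, 1); other ranges can be passed
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())  # each value mapped to (x - min) / (max - min)
```

Note how a single extreme value would become the new max and squash every other value toward 0 — that is the outlier sensitivity mentioned above.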
2⃣Standardization (Z-Score Normalization)
- Centers and scales the data so that each feature has a mean of zero and a variance of one.
- Standardization is much less affected by outliers, but it does not eliminate their impact on the model.
3⃣Quantile Transformation
- Transforms features using quantile information so that they follow a normal distribution.
- It uses non-linear transformations to spread out the most frequent values and reduce the impact of outliers.
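A minimal sketch of a quantile transformation using scikit-learn's `QuantileTransformer` (the skewed synthetic data and parameter choices are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
X = rng.exponential(size=(1000, 1))  # heavily right-skewed data

# map each value through the empirical CDF, then into a normal distribution
qt = QuantileTransformer(output_distribution="normal", n_quantiles=100)
X_q = qt.fit_transform(X)

print(X_q.mean(), X_q.std())  # roughly 0 and 1 after the transform
```

Because the mapping depends only on ranks, extreme values land at the tails of the target distribution instead of dominating the scale.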
For more content about machine learning and computer science, please follow me @ammar_yasser92.
• • •
Data Science learning roadmap
If you are looking to get into machine learning, I would like to make your life easier by listing some of my favorite resources in this thread 🧵👇 #MachineLearning
1⃣Python is the most common language for data science (even though there are other great languages like R, C++)
This playlist by @JustinMitchel is a great way to start. youtube.com/playlist?list=…
Bias/Variance analysis is one of the most important concepts in machine learning as it allows you to measure and improve the performance of your model.
🧵In this thread, I will discuss the basic idea behind it, so let's dive right in. #MachineLearning #ArtificialIntelligence
1⃣Bias (Underfitting)
What is bias?
In simple terms, bias is the systematic difference between the model's predictions and the actual values.
So if a model has a high bias, it means it has underfitted the training data. In other words, the model has a high training set error.
What causes a high bias?
Bias, or underfitting, mainly occurs when the model is too simple to understand the underlying structure of the training data, and so it misses the relevant relations between the input features and target outputs.
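A small sketch of high bias in action: fitting a straight line to clearly non-linear data (the quadratic toy dataset and the use of `LinearRegression` are my own illustrative choices, not from the thread):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2  # quadratic target: a straight line cannot capture this shape

model = LinearRegression().fit(X, y)
train_error = np.mean((model.predict(X) - y) ** 2)

# the error is large even on the training data itself:
# the model is too simple for the underlying structure → high bias / underfitting
print(train_error)
```

The fix for high bias is usually a more expressive model (e.g. adding polynomial features here), not more data.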