Naina Chaturvedi
Oct 22, 2023 · 39 tweets · 11 min read
✅Feature selection and Feature scaling are crucial Feature Engineering steps - Explained in Simple terms.
A quick thread 👇🏻🧵
#MachineLearning #Coding #100DaysofCode #deeplearning #DataScience
PC: ResearchGate Image
1/ Feature selection is the process of choosing a subset of the most relevant features (variables or columns) from your dataset. It involves excluding less informative or redundant features to improve model performance and reduce computational complexity. Image
2/ When to Use It:
High-Dimensional Data: Feature selection is crucial when you have a high-dimensional dataset, meaning there are many features compared to the number of data points. High dimensionality can lead to overfitting and increased computational costs. Image
3/ Multicollinearity: When features in your dataset are highly correlated (multicollinearity), it's challenging for models to differentiate their individual effects. Feature selection helps address this issue by choosing the most informative features. Image
4/ Improve Model Efficiency: In cases where you have a limited amount of computational resources or time, reducing the number of features through selection can lead to faster model training and evaluation. Image
5/ Why to Use It:
Improved Model Performance: Feature selection can lead to models with better generalization and reduced overfitting, as they focus on the most relevant information.
6/ Feature scaling (or feature normalization) is the process of standardizing or scaling the values of different features to bring them to a common scale or distribution. It ensures that the magnitudes of features do not disproportionately influence certain machine learning algorithms. Image
7/ When to Use It:
Model Sensitivity to Feature Magnitudes: Some machine learning models, like k-Nearest Neighbors (KNN) and Support Vector Machines (SVM), are sensitive to the magnitudes of features. In such cases, feature scaling is critical to make the algorithm work effectively. Image
8/ Comparing Features on Different Scales: When different features have different scales or units (e.g., age in years vs. income in thousands of dollars), feature scaling helps make features comparable. Image
9/ Why to Use It:
Improved Model Performance: Feature scaling ensures that all features contribute equally to the model's predictions. Without scaling, features with larger magnitudes can dominate the learning process.
10/ Faster Convergence: In algorithms that rely on gradient-based optimization, feature scaling can lead to faster convergence, because the optimizer takes more uniform steps across features instead of zig-zagging along poorly scaled dimensions.
11/ Types of Feature Selection:

Filter methods involve evaluating the relevance of features based on their intrinsic properties, such as statistical measures or correlation with the target variable. These methods do not consider the machine learning model's performance. Image
12/ Wrapper methods evaluate feature subsets by training and testing a machine learning model with each subset. These methods are computationally expensive but can provide the best feature subset for a specific model. Image
13/ Embedded methods incorporate feature selection into the model training process. Feature importance is determined as the model is being built, and less important features are pruned. Image
14/ Filter methods -
Correlation measures the linear relationship between two numerical variables. In feature selection, you can compute the correlation between each feature and the target variable. Features with a high correlation with the target are considered relevant. Image
15/ The chi-squared test assesses the statistical dependence between two categorical variables. In feature selection, you can use the chi-squared test to measure the independence of each categorical feature from the target variable. Image
16/ Mutual information measures the amount of information shared between two variables. In feature selection, you can compute the mutual information between each feature and the target variable. Features with high mutual information are considered relevant. Image
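All three filter methods are available off the shelf in scikit-learn. A minimal sketch below (the breast-cancer toy dataset and k=10 are arbitrary choices for illustration, not part of the original thread):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

# Toy dataset just for illustration; any numeric feature matrix works the same way.
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# 14/ Correlation: absolute Pearson correlation of each feature with the target
corr = X.corrwith(y).abs().sort_values(ascending=False)
print(corr.head())

# 15/ Chi-squared test (requires non-negative feature values)
chi2_top10 = SelectKBest(chi2, k=10).fit(X, y)
print(X.columns[chi2_top10.get_support()].tolist())

# 16/ Mutual information between each feature and the target
mi_top10 = SelectKBest(mutual_info_classif, k=10).fit(X, y)
print(X.columns[mi_top10.get_support()].tolist())
```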
17/ Wrapper Methods:

RFE (Recursive Feature Elimination) is an iterative method that starts with all features and removes the least important feature in each iteration based on model performance. The process continues until the desired number of features is reached or performance starts to deteriorate. Image
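A minimal sketch of RFE with scikit-learn (logistic regression as the estimator and 10 features to keep are arbitrary illustration choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)   # scale so coefficient sizes are comparable
y = data.target

# Drop the least important feature (smallest |coefficient|) one at a time until 10 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=10, step=1)
rfe.fit(X, y)

print([name for name, keep in zip(data.feature_names, rfe.support_) if keep])
```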
18/ Forward and backward selection methods are stepwise feature selection techniques. Forward selection starts with an empty set of features and adds one feature at a time based on model performance. Backward selection begins with all features and removes one feature at a time. Image
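One way to run stepwise selection is scikit-learn's SequentialFeatureSelector; a small sketch, with the model, dataset, and number of features chosen purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
y = data.target
model = LogisticRegression(max_iter=1000)

# Forward: start empty, greedily add the feature that improves the CV score the most.
forward = SequentialFeatureSelector(model, n_features_to_select=5, direction="forward").fit(X, y)

# Backward: start with all features, greedily drop the least useful one.
backward = SequentialFeatureSelector(model, n_features_to_select=5, direction="backward").fit(X, y)

print(data.feature_names[forward.get_support()])
print(data.feature_names[backward.get_support()])
```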
19/ Embedded Methods:

L1 regularization adds a penalty term, proportional to the sum of the absolute values of the feature coefficients, to the cost function of a linear or logistic regression model. This drives some coefficients to exactly zero, effectively removing those features. Image
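A minimal sketch of L1-based selection using Lasso plus SelectFromModel (the diabetes toy dataset and alpha=1.0 are illustrative assumptions, not values from the thread):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # L1 penalties assume comparable feature scales

# alpha controls the strength of the penalty; larger alpha zeroes out more coefficients.
lasso = Lasso(alpha=1.0).fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])

# Keep only the features whose coefficients survived the L1 penalty.
X_selected = SelectFromModel(lasso, prefit=True).transform(X)
print("selected feature matrix shape:", X_selected.shape)
```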
20/ Tree-based methods like Random Forest can assess feature importance while building the model. Features that contribute the most to reducing impurity (e.g., Gini impurity) or increasing information gain are considered important. Image
21/ Feature Importance Scores:

Feature importance scores are a valuable tool for understanding the impact of individual features on a machine learning model's predictions. These scores can be obtained from tree-based models like Random Forest, XGBoost, and LightGBM. Image
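For example, a Random Forest exposes these scores directly via feature_importances_; a small sketch (dataset and hyperparameters chosen only for illustration):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

# Impurity-based importance: how much each feature reduces Gini impurity across all trees.
importances = pd.Series(forest.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False).head(10))
```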
22/ Common Feature Scaling Techniques:

Standardization (Z-score scaling): This method scales features to have a mean of 0 and a standard deviation of 1. It preserves the shape of the original distribution. Image
23/ Min-Max Scaling: This method scales features to a specific range, often [0, 1] or [-1, 1]. It shifts and stretches the feature values to fit within the desired range. Image
24/ Robust scaling is a technique that scales features based on the median and the interquartile range (IQR). It's more robust to outliers because it uses the median and quartiles rather than the mean and standard deviation. Image
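All three scaling techniques above have drop-in implementations in scikit-learn's preprocessing module. A tiny sketch on a single made-up feature column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # one toy feature column

print(StandardScaler().fit_transform(x).ravel())  # mean 0, standard deviation 1
print(MinMaxScaler().fit_transform(x).ravel())    # rescaled into [0, 1]
print(RobustScaler().fit_transform(x).ravel())    # centered on the median, scaled by the IQR
```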
25/ When to Scale Features:

Distance-Based Algorithms: Algorithms that use distances between data points are sensitive to feature scaling. For example, k-Nearest Neighbors (KNN) calculates distances between data points to make predictions. Image
26/ Gradient Descent-Based Algorithms: Machine learning algorithms that use gradient descent for optimization, such as linear regression and support vector machines, are generally more effective when features are scaled. Scaling speeds up convergence.
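A quick way to see the effect is to cross-validate KNN with and without a StandardScaler inside a Pipeline; a sketch on a toy dataset (exact scores will vary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling inside a Pipeline so the scaler is re-fit on each training fold (no leakage).
raw = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=5).mean()
print(f"KNN accuracy unscaled: {raw:.3f}  |  scaled: {scaled:.3f}")
```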
27/ Algorithms That Typically Benefit from Feature Scaling:

k-Nearest Neighbors (k-NN): This algorithm relies on the Euclidean distance between data points. Feature scaling ensures that all features have a similar impact on the distance calculations.
28/ Linear Regression: Feature scaling helps in linear regression by improving the convergence of gradient descent, leading to faster and more accurate model training.
29/ Support Vector Machines (SVM): SVM aims to find a hyperplane that maximizes the margin between classes. Feature scaling is crucial for SVM to prevent any single feature from dominating the margin calculation.
30/ Algorithms That Don't Require Feature Scaling:

Decision Trees and Random Forests: Tree-based algorithms make splits in the feature space based on feature values. These algorithms are not sensitive to feature scaling and can handle unscaled data effectively.
31/ How Scaling Can Be Affected by Outliers:

Outliers are data points that deviate significantly from the majority of the data in a dataset. When scaling features, outliers can have a large impact on the scaling transformation, especially in techniques like Min-Max Scaling and Standardization. Image
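A small illustration of this effect, using a made-up column where one value is an extreme outlier (Min-Max vs. robust scaling):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

with_outlier = np.array([[10.0], [20.0], [30.0], [40.0], [50.0], [1000.0]])

# The single outlier stretches the min-max range, crowding the normal points near 0.
print(MinMaxScaler().fit_transform(with_outlier).ravel())

# Robust scaling uses the median and IQR, so the normal points keep their spread.
print(RobustScaler().fit_transform(with_outlier).ravel())
```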
32/ Standardization (Z-score Scaling):

Mean-Centering: Standardization centers the data by subtracting the mean (average) of each feature from the data points. This results in a mean of 0 for each feature. Image
33/ Unit Variance: It scales the features to have a standard deviation of 1, which means the variance of each feature equals 1.

Outlier Tolerance: Standardization is less sensitive to outliers than Min-Max Scaling, but because it still relies on the mean and standard deviation, extreme values can shift both.
34/ Use Cases: Standardization is suitable when the data follows a normal distribution (or approximately normal) and when the scale and spread of features are important. It is commonly used for linear models, support vector machines, and principal component analysis (PCA).
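For reference, the z-score computation by hand on a made-up column (the numbers are illustrative only):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
z = (x - x.mean()) / x.std()   # subtract the mean, divide by the standard deviation
print(z)                       # roughly [-1.34, -0.45, 0.45, 1.34]
print(z.mean(), z.std())       # ~0.0 and 1.0
```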
35/ Min-Max Scaling (Normalization):

Rescaling to a Range: Min-Max Scaling scales features to a specific range, typically [0, 1]. It linearly transforms the data so that the minimum value becomes 0, and the maximum value becomes 1. Image
36/ Outlier Sensitivity: Min-Max Scaling can be sensitive to outliers, as it is influenced by the range of data values.

Use Cases: Min-Max Scaling is suitable when the data doesn't follow a normal distribution and when the absolute values and relationships between features are important.
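And the Min-Max formula by hand, x' = (x - min) / (max - min), on a made-up column:

```python
import numpy as np

x = np.array([10.0, 20.0, 25.0, 50.0])
x_scaled = (x - x.min()) / (x.max() - x.min())   # map min -> 0 and max -> 1
print(x_scaled)                                  # [0.    0.25  0.375 1.   ]
```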
37/ Log-Scaling for Skewed Data:

A scaling technique used for data with highly skewed distributions. When data is heavily skewed, with a long tail towards higher values, applying log-scaling can help make the data more symmetrical and reduce the impact of extreme values. Image
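A minimal sketch of log-scaling with NumPy's log1p (the values are made up to show a long right tail):

```python
import numpy as np

skewed = np.array([1.0, 2.0, 3.0, 5.0, 10.0, 1000.0])   # long right tail
log_scaled = np.log1p(skewed)   # log(1 + x); also safe for zero values
print(log_scaled)
```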
38/ Subscribe and Read more - naina0405.substack.com
Github - github.com/Coder-World04/…
