Topic - Encode Numerical Features (Binning & Binarization)
A Thread 🧵
Discretization: It is the process of transforming a continuous variable into a categorical variable by creating a set of contiguous intervals that span the range of the variable’s values. It is also known as “Binning”, where “bin” is another name for an interval.
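Benefits of Discretization or Binning:
1⃣ Handles outliers better, since extreme values are grouped into the outermost bins instead of being kept as extreme numbers.
2⃣ Improves the value spread, e.g. equal-frequency bins spread a skewed variable evenly across categories.
3⃣ Minimizes the effects of small observation errors, since a small error rarely moves a value into a different bin.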
Types of Binning:
(a) Unsupervised Binning:
1⃣ Equal width binning: It is also known as “Uniform Binning” since all the intervals have the same width. The algorithm divides the range of the data into N intervals of equal size, each of width (max - min) / N.
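A minimal pure-Python sketch of equal-width binning (the function name `equal_width_bins` and the sample ages are made up for illustration). It computes the width (max - min) / N and maps each value to a bin index; for real pipelines, scikit-learn's `KBinsDiscretizer(strategy="uniform")` implements the same strategy.

```python
def equal_width_bins(values, n_bins):
    """Equal-width (uniform) binning: return the bin edges and a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # every interval has this same width
    edges = [lo + i * width for i in range(n_bins + 1)]
    labels = []
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything goes into one bin
        else:
            idx = int((v - lo) / width)
            idx = min(idx, n_bins - 1)  # the maximum value belongs to the last bin
        labels.append(idx)
    return edges, labels


ages = [2, 15, 23, 31, 38, 47, 55, 62, 71, 80]  # hypothetical sample
edges, labels = equal_width_bins(ages, 4)
print(edges)   # [2.0, 21.5, 41.0, 60.5, 80.0]
print(labels)  # [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```

Because the width depends only on the minimum and maximum, a single outlier stretches every interval, which is one reason to prefer quantile or k-means binning on skewed data.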
2⃣ Equal frequency binning: It is also known as “Quantile Binning”. The algorithm divides the data into N groups, each containing approximately the same number of values, so the interval boundaries are the quantiles of the data.
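A small sketch of equal-frequency (quantile) binning using only the standard library: `statistics.quantiles` supplies the N - 1 cut points and `bisect` assigns each value to its interval. The function name and the income figures are hypothetical; scikit-learn's `KBinsDiscretizer(strategy="quantile")` and pandas' `qcut` are the usual library routes.

```python
import bisect
import statistics


def equal_frequency_bins(values, n_bins):
    """Quantile binning: cut points are the data's quantiles, so each bin
    receives roughly the same number of values."""
    # n_bins - 1 interior cut points, linearly interpolated between data points
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # bisect_left sends a value equal to a cut point to the lower bin,
    # i.e. the intervals are closed on the right: (a, b]
    labels = [bisect.bisect_left(cuts, v) for v in values]
    return cuts, labels


incomes = [20, 22, 25, 27, 30, 35, 40, 300]  # hypothetical, skewed by one outlier
cuts, labels = equal_frequency_bins(incomes, 4)
print(cuts)    # [24.25, 28.5, 36.25]
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

Here each bin gets two values, while equal-width bins of width (300 - 20) / 4 = 70 would put seven of the eight values into the first bin.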
3⃣ K-means binning: This technique uses the K-Means clustering algorithm to group the values, and each cluster becomes a bin.
It is mostly used when the data naturally forms clusters.
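A rough sketch of k-means binning on a single feature, using a hand-written 1-D Lloyd's algorithm so it runs without extra libraries (`kmeans_bins`, its initialisation scheme and the sample data are assumptions). scikit-learn's `KBinsDiscretizer(strategy="kmeans")` is the library route; like it, this sketch places the bin edges halfway between adjacent cluster centres.

```python
def kmeans_bins(values, n_bins, max_iter=100):
    """K-means binning: run 1-D k-means and use each value's cluster as its bin."""
    data = sorted(values)
    n = len(data)
    # Initialise centroids at evenly spaced ranks of the sorted data.
    centroids = [float(data[(2 * k + 1) * n // (2 * n_bins)]) for k in range(n_bins)]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in range(n_bins)]
        for v in data:
            nearest = min(range(n_bins), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its previous centroid).
        new = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    centroids.sort()
    # Bin edges sit halfway between adjacent centroids.
    edges = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
    labels = [min(range(n_bins), key=lambda k: abs(v - centroids[k])) for v in values]
    return centroids, edges, labels


values = [1, 2, 3, 10, 11, 12, 30, 31, 32]  # hypothetical data with three clusters
centroids, edges, labels = kmeans_bins(values, 3)
print(centroids)  # [2.0, 11.0, 31.0]
print(edges)      # [6.5, 21.0]
print(labels)     # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

On the same data, three equal-width bins would have edges near 11.33 and 21.67 and split the middle cluster (10 and 11 in one bin, 12 in the next), whereas k-means keeps each cluster together.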
(b) Custom binning: It is also known as “Domain” based binning. In this technique, you use your domain knowledge about the business problem to choose the bin boundaries yourself.
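A minimal sketch of custom (domain-based) binning. The age boundaries and group names below are hypothetical, standing in for whatever cut points your domain knowledge suggests; `bisect_right` maps each value to its interval.

```python
import bisect

# Hypothetical domain-chosen cut points and labels (not taken from the thread)
AGE_EDGES = [13, 20, 60]
AGE_LABELS = ["child", "teen", "adult", "senior"]


def custom_bin(value, edges, labels):
    """Return the label of the domain-defined interval the value falls in.
    Intervals are [edge_i, edge_(i+1)): a value equal to an edge goes up."""
    return labels[bisect.bisect_right(edges, value)]


ages = [5, 13, 19, 20, 45, 60, 82]
groups = [custom_bin(a, AGE_EDGES, AGE_LABELS) for a in ages]
print(groups)  # ['child', 'teen', 'teen', 'adult', 'adult', 'senior', 'senior']
```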
▶️Binarization: It is a special case of binning. In this technique, we convert a continuous value into a binary value, i.e. either 0 or 1, by comparing it with a threshold.
It is a very useful technique in image processing, e.g. for converting an image into a black-and-white image by thresholding its pixel intensities.
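A short sketch of binarization with a threshold. It follows the same rule as scikit-learn's `Binarizer` (values greater than the threshold become 1, the rest 0); the helper names, the threshold of 127 and the tiny "image" are illustrative assumptions.

```python
def binarize(values, threshold=0.0):
    """Binarization: 1 if value > threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


def binarize_image(gray, threshold=127):
    """Turn a grayscale image (rows of 0-255 intensities) into black (0) and white (1)."""
    return [binarize(row, threshold) for row in gray]


print(binarize([0.2, 1.5, -0.3, 0.0]))  # [1, 1, 0, 0]

gray = [[12, 200, 90],
        [255, 127, 128]]  # hypothetical 2x3 grayscale image
print(binarize_image(gray))  # [[0, 1, 0], [1, 0, 1]]
```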
Topic - Handling Mixed Variables in Feature Engineering 👨‍💻
A Thread 🧵
Handling missing values is very important, as many machine learning algorithms do not support data with missing values. Leaving missing values in the dataset can cause errors or poor performance with some machine learning algorithms.
Variable deletion involves dropping variables (columns) with missing values on a case-by-case basis. This method makes sense when a variable has a lot of missing values and is of relatively low importance.
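A minimal sketch of variable deletion: drop every column whose share of missing values exceeds a cut-off. The 50% cut-off, the column names and the dict-of-lists table are hypothetical; in pandas, `df.dropna(axis=1, thresh=k)` does something similar by keeping only columns with at least `k` non-missing values. Judging a variable's importance is still up to you and is not encoded here.

```python
def drop_sparse_columns(table, max_missing_frac=0.5):
    """Variable deletion: keep only columns whose fraction of missing values
    (None) is at most max_missing_frac. `table` maps column name -> list of values."""
    kept = {}
    for name, column in table.items():
        missing = sum(v is None for v in column) / len(column)
        if missing <= max_missing_frac:
            kept[name] = column
    return kept


data = {
    "age":    [25, 32, None, 41, 29],
    "income": [50, None, 62, 58, 45],
    "fax_no": [None, None, None, "555-0101", None],  # 80% missing
}
cleaned = drop_sparse_columns(data, max_missing_frac=0.5)
print(list(cleaned))  # ['age', 'income']
```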
🔸Power transformation techniques are feature transformation techniques that raise each data observation to a power (exponent), typically to reduce skewness and make the data more Gaussian-like.
🔸Two common power transformation techniques are:
1⃣ Box-Cox Transform
2⃣ Yeo-Johnson Transform
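The list only names Yeo-Johnson, so here is a minimal sketch of its standard piecewise formula for a fixed λ. Unlike Box-Cox, it also accepts zero and negative values. The function name is hypothetical and λ is passed in by hand; scikit-learn's `PowerTransformer(method="yeo-johnson")` instead estimates λ from the data.

```python
import math


def yeo_johnson(x, lam):
    """Yeo-Johnson transform of a single value x for power parameter lam."""
    if x >= 0:
        if lam != 0:
            return ((x + 1) ** lam - 1) / lam
        return math.log(x + 1)  # x >= 0, lam == 0
    if lam != 2:
        return -((-x + 1) ** (2 - lam) - 1) / (2 - lam)
    return -math.log(-x + 1)  # x < 0, lam == 2


print([yeo_johnson(v, 1.0) for v in [-2.0, 0.0, 3.0]])  # [-2.0, 0.0, 3.0]  (lam = 1 is the identity)
print(yeo_johnson(3.0, 0.5))   # 2.0
print(yeo_johnson(-3.0, 1.5))  # -2.0
```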
▶️Box-Cox Transform:
It transforms each data observation by raising it to a power, denoted by lambda (λ), and it only works on strictly positive data. There are 2⃣ cases, depending on whether λ is equal to 0⃣: for λ ≠ 0 the result is (x^λ - 1) / λ, and for λ = 0 it is ln(x).
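A minimal sketch of the two lambda cases of the Box-Cox transform for a fixed λ (the function name and sample values are illustrative). In practice λ is usually estimated from the data, e.g. `scipy.stats.boxcox(x)` returns the transformed data together with the λ that maximizes the log-likelihood, and scikit-learn offers `PowerTransformer(method="box-cox")`.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x for power parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)  # case lambda == 0
    return (x ** lam - 1) / lam  # case lambda != 0


skewed = [1.0, 4.0, 9.0]  # gaps of 3 and 5
print([box_cox(v, 0.5) for v in skewed])  # [0.0, 2.0, 4.0]  (gaps become equal)
```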
The world is changing, and AI is changing the way we work. Here are some websites that can help you save time, build an amazing resume with a good ATS score, and write blog posts 10X faster.
📌 thisresumedoesnotexist – 1000 examples (ChatGPT famous resumes)
🔗lnkd.in/dkp95Ye9