Topic -- Outlier Detection using Percentile Method
A Thread 🧵
Outliers are a crucial aspect of Data Analysis.
They can be treated in different ways, such as trimming, capping, discretization, or treating them as missing values.
Percentile Method -
This technique works by setting percentile thresholds (for example, the 1st and 99th percentiles), chosen based on the problem statement. Values beyond those thresholds are treated as outliers.
When we handle the outliers by capping them at the thresholds instead of dropping them, the method is known as Winsorization.
Here, we usually maintain symmetry on both sides, meaning if we cut off 1% from the right tail, we also cut off 1% from the left tail.
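A minimal sketch of the percentile method in Python with pandas, showing both trimming and capping. The 1% cut-off, the function names, and the sample data are assumptions made for illustration, not part of the original thread.

```python
import numpy as np
import pandas as pd


def percentile_limits(values: pd.Series, pct: float = 0.01):
    """Return symmetric (lower, upper) limits at the pct and 1 - pct quantiles."""
    return values.quantile(pct), values.quantile(1 - pct)


def trim(values: pd.Series, pct: float = 0.01) -> pd.Series:
    """Trimming: drop every value that falls outside the percentile limits."""
    lower, upper = percentile_limits(values, pct)
    return values[(values >= lower) & (values <= upper)]


def winsorize(values: pd.Series, pct: float = 0.01) -> pd.Series:
    """Capping (Winsorization): pull values outside the limits back to the limits."""
    lower, upper = percentile_limits(values, pct)
    return values.clip(lower=lower, upper=upper)


# Made-up data: the values 1..99 plus one extreme value in each tail
data = pd.Series(np.arange(101, dtype=float))
data.iloc[0] = -1000.0
data.iloc[-1] = 1000.0

trimmed = trim(data)      # fewer rows, extremes dropped
capped = winsorize(data)  # same number of rows, extremes pulled in

print(len(data), len(trimmed), len(capped))
print(capped.min(), capped.max())
```

Trimming shrinks the dataset, while capping keeps every row and only changes the extreme values. Which one fits depends on the problem statement.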
Outliers can significantly impact the accuracy of a model if they are not handled properly.
The benefit of removing outliers is better accuracy and stability in statistical models and ML algorithms. Outliers are extreme values that differ from the rest of the data, so they can distort statistical analyses and skew results.
Removing outliers can make the results more robust by eliminating that influence. It can also reduce overfitting in ML algorithms, because the model fits the underlying data pattern instead of the extreme values. @AnalyticsVidhya
🎯Are NULL values the same as zero or a blank space❓
🔺A NULL value is not at all the same as zero or a blank space.
🔺A NULL value represents a value that is unavailable, unknown, unassigned, or not applicable, whereas zero is a number and a blank space is a character.
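A quick way to see the difference is to ask a database directly. This sketch assumes SQLite, accessed through Python's built-in sqlite3 module, so details may differ in other database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Only NULL itself counts as NULL; zero and a blank space do not
is_null = conn.execute(
    "SELECT NULL IS NULL, 0 IS NULL, ' ' IS NULL"
).fetchone()

# Each of the three has a different storage type in SQLite
types = conn.execute(
    "SELECT typeof(NULL), typeof(0), typeof(' ')"
).fetchone()

# Comparing NULL with = gives NULL (None in Python), not true or false
null_equals_zero = conn.execute("SELECT NULL = 0").fetchone()[0]

conn.close()

print(is_null)
print(types)
print(null_equals_zero)
```

Because a comparison with NULL never comes back true, checks for missing values use IS NULL rather than = NULL.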
🎯What is the use of the NVL() function❓
🔹Answer
🔺You can use the NVL function to replace null values with a default value.
🔺The function returns the value of the second parameter if the first parameter is null.
🔺If the first parameter is anything other than null, it is returned unchanged.
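The behaviour described above can be mimicked in a few lines of Python. Here `nvl` is a hypothetical helper written for illustration, with `None` standing in for NULL. It is not the database's own implementation, and the commission values are made up.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def nvl(value: Optional[T], default: T) -> T:
    """Hypothetical stand-in for SQL NVL(value, default); None plays the role of NULL."""
    return default if value is None else value


# Made-up commission values; None marks a missing (NULL) entry
commissions = [250, None, 0, None]

# Replace missing entries with a default of 100; real values, including 0, are left alone
filled = [nvl(c, 100) for c in commissions]
print(filled)
```

Note that the zero is left alone, since zero is a real value and not a NULL.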
Topic -- Outlier Detection & Removal using Z-score Method
A Thread 🧵
The Z-score method is a statistical approach used for detecting & removing outliers in a dataset. An outlier is an observation that lies far away from the other observations in the dataset. Such observations can significantly affect the statistical properties of the dataset & lead to erroneous conclusions.
Approach for Outliers
- The very first step is setting the upper and lower limits
- The first technique for dealing with outliers is trimming. Regardless of what kind of data distribution you are working with, trimming is an applicable and proven technique in most cases
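The two steps above can be sketched with pandas: set the limits from the mean and standard deviation, then trim. The cut-off of 3 standard deviations is a common convention assumed here, not something stated in the thread, and the function names and sample data are made up for illustration.

```python
import numpy as np
import pandas as pd


def zscore_limits(values: pd.Series, threshold: float = 3.0):
    """Step 1: limits at mean -/+ threshold * std, i.e. where |z| equals threshold."""
    mean = values.mean()
    std = values.std(ddof=0)  # population standard deviation
    return mean - threshold * std, mean + threshold * std


def trim_by_zscore(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Step 2: trimming, keeping only the values inside the limits."""
    lower, upper = zscore_limits(values, threshold)
    return values[values.between(lower, upper)]


# Made-up data: 200 roughly normal scores plus two extreme values
rng = np.random.default_rng(0)
scores = pd.Series(np.append(rng.normal(50, 5, size=200), [120.0, -40.0]))

lower, upper = zscore_limits(scores)
trimmed = trim_by_zscore(scores)

print(lower, upper)
print(len(scores), len(trimmed))
```

Keep in mind that extreme values inflate the standard deviation themselves, so on small or heavily contaminated samples the limits can end up too wide to catch them.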
Topic - Handling Mixed Variable in Feature Engineering 👨💻
A Thread 🧵
Handling missing values is very important, as many machine learning algorithms do not support data with missing values. If your dataset has missing values, some machine learning algorithms will raise errors or perform poorly.
Variable deletion involves dropping variables (columns) with missing values on a case-by-case basis. This method makes sense when a variable has a lot of missing values and is of relatively low importance.
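A minimal sketch of variable deletion with pandas. The 50% missing-value cut-off, the function name, and the sample table are assumptions for illustration. In practice the importance of each column should be judged case by case before dropping it.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns whose share of missing values is above max_missing."""
    missing_share = df.isna().mean()  # fraction of missing values per column
    keep = missing_share[missing_share <= max_missing].index
    return df[keep]


# Made-up table: "fax" is missing in 3 of 4 rows
df = pd.DataFrame({
    "age": [25, 31, np.nan, 40],
    "city": ["Pune", "Delhi", "Mumbai", None],
    "fax": [np.nan, np.nan, np.nan, "011-555"],
})

reduced = drop_sparse_columns(df, max_missing=0.5)
print(df.isna().mean())
print(list(reduced.columns))
```

The remaining columns can still contain a few missing values, so this step is usually followed by another strategy for those.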