The most basic technique is ๐๐ฎ๐ฏ๐ฒ๐น ๐๐ป๐ฐ๐ผ๐ฑ๐ถ๐ป๐ด.
Find out more about it ๐งต ๐
In this technique, each unique category is mapped to an integer starting from 0.
It does not assume any relationship of order or magnitude between the categories โ categories are numbered arbitrarily.
๐ง When should you use it?
It is best suited for ordinal data where the order matters but can be used for nominal data when the algorithm can handle it correctly (e.g., decision trees).
You want to forecast the price of the EUR/USD pair.
You could definitely use the price during the previous days, but what if you could improve that? ๐ค
Discover what else you can use here ๐งต ๐
0๏ธโฃ As mentioned, we could and should use the previous prices of this currency pair. Of course, we should convert it to Log Returns first (see my article in the last tweet) to make them stationary.
But that's not sufficient, many other variables influence EUR/USD price.
The EUR/USD price is your endogenous variable.
These additional variables, also called exogenous, can help you make more accurate forecasts ๐
Find out more about another feature scaling technique:
โจStandard Scaling or Z-score Normalizationโจ
๐งต ๐
In this case, features are scaled so that they have the properties of a standard normal distribution with mean ฮผ=0 and standard deviation ฯ=1.
๐งUse it when the algorithm assumes that the distribution of your features is Gaussian.
This method is also useful as a general technique when you don't know the distribution of your feature and you're not particularly concerned about robustness to outliers.
Discover one of the most used feature scaling techniques:
โจMin-Max Scalingโจ
๐งต ๐
This is the simplest form of normalization.
๐ The idea is to scale the range of each feature (like age, salary, etc.) so that they all fit within a specific range, usually between 0 and 1. This can make it easier for machine learning algorithms to learn from the data.
๐ง Use it when the distribution of the feature is not Gaussian and you need values in a bounded interval. However, this method is sensitive to outliers.