Do you want to forecast seasonal time series data?
Remove the seasonality and add it back at the end! That's basically what the STL method does.
STL stands for “Seasonal and Trend decomposition using LOESS”. It is a versatile and robust method for decomposing time series.
It uses LOESS (Locally Estimated Scatterplot Smoothing) rather than moving averages to estimate the trend and seasonal components.
1️⃣ Decompose the Time Series:
Utilize STL to split the time series into three parts:
• trend
• seasonal component
• residual component
2️⃣ Deseasonalize:
Subtract the seasonal component from the main series, creating a deseasonalized version.
This yields the trend plus the residuals. The residuals are not just noise: they represent random or irregular fluctuations that neither the trend nor the seasonality captures.
3️⃣ Forecast Deseasonalized data:
Employ non-seasonal methods like ARIMA or Simple Exponential Smoothing to predict the trend and residuals in the deseasonalized data.
4️⃣ Forecast Seasonality:
Forecast the seasonal component by repeating the last observed season (a seasonal naive forecast); for monthly data, that means reusing last year's seasonal pattern.
5️⃣ Reapply Seasonal Component:
Finally, add the forecast seasonal component back to the deseasonalized forecast, restoring it to the original scale.
That's how you can forecast your time series data using STL.
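Here is a minimal, dependency-free sketch of steps 2️⃣ to 5️⃣. To stay short it makes two loud simplifications: a centered moving average stands in for STL's LOESS smoothing, and a simple drift forecast stands in for ARIMA. The function names, the toy series and the period of 4 are all made up for illustration.

```python
def centered_moving_average(y, period):
    """Centered 2 x period moving average (assumes an even period); None at the edges."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        # The two end points get half weight so the window covers exactly one period.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def seasonal_indices(y, trend, period):
    """Average detrended value per position in the cycle, centered to sum to zero."""
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    return [m - offset for m in means]


def stl_style_forecast(y, period, horizon):
    # 1) Decompose (moving-average stand-in for STL).
    trend = centered_moving_average(y, period)
    season = seasonal_indices(y, trend, period)
    # 2) Deseasonalize: subtract the seasonal component.
    deseasonalized = [obs - season[t % period] for t, obs in enumerate(y)]
    # 3) Forecast the deseasonalized series (drift method as a stand-in for ARIMA).
    slope = (deseasonalized[-1] - deseasonalized[0]) / (len(y) - 1)
    n = len(y)
    forecast = []
    for h in range(1, horizon + 1):
        base = deseasonalized[-1] + h * slope
        # 4) Seasonal forecast: repeat the estimated seasonal pattern.
        # 5) Add it back to return to the original scale.
        forecast.append(base + season[(n - 1 + h) % period])
    return forecast


# Toy series: linear trend plus a repeating pattern of length 4.
pattern = [3, -1, -4, 2]
series = [10 + 2 * t + pattern[t % 4] for t in range(12)]
forecast = stl_style_forecast(series, period=4, horizon=4)
print(forecast)
```

On real data you would swap the stand-ins for a proper STL decomposition and a fitted non-seasonal model; the subtract, forecast, add-back structure stays the same.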
You can learn more about this in the latest issue of MLPills💊
ARIMA models have three parameters: 'p', 'd' and 'q'.
They need to be optimized... but, before that, do you know how to interpret each of them?
Learn what each of them means here 🧵 👇
ARIMA stands for Auto-Regressive Integrated Moving Average.
It is a statistical method used for time series forecasting, particularly in analyzing and predicting future values based on past observations, by capturing underlying trends and patterns in the data.
Let's see more 👇
🟢 d → order of differencing
Differencing makes a non-stationary time series stationary by replacing each value with its change from the previous one, which removes trends (differencing at the seasonal lag removes seasonality).
The ‘d’ parameter represents the number of times the data needs to be differenced to make it stationary.
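A quick sketch of what 'd' does in practice, using made-up toy series: one round of differencing flattens a linear trend, while a quadratic trend needs two rounds. The helper name `difference` is just for illustration.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


linear = [3 * t + 5 for t in range(6)]   # linear trend
quadratic = [t * t for t in range(6)]    # quadratic trend

once = difference(linear, d=1)       # constant series, so d = 1 is enough here
twice = difference(quadratic, d=2)   # constant only after differencing twice
print(once, twice)
```

Each round of differencing shortens the series by one observation, which is one reason not to use a larger 'd' than needed.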
Are you familiar with the most common Machine Learning algorithms?
Today, I will complete the Top 10 of the most commonly used ones!
Check them out 🧵 👇
7️⃣ Neural networks are composed of interconnected layers of artificial neurons that learn complex and nonlinear patterns from data by adjusting weights and biases through backpropagation.
Useful for solving a wide range of problems, such as image recognition and natural language processing (NLP).
8️⃣ Random forest combines multiple decision trees, each trained on a random subset of the data and features, and aggregates their predictions for classification or regression tasks.
Useful for achieving high accuracy and robustness, as well as reducing overfitting and variance.
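To make the random forest idea concrete, here is a deliberately tiny sketch. It assumes one-split trees ("stumps") instead of full decision trees, picks one random feature per tree, and uses a made-up toy dataset, so it shows the bootstrap-and-vote mechanism rather than a production implementation.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature (a depth-1 decision tree)."""
    overall = majority(labels)
    # (errors, threshold, left_label, right_label); threshold None means "no split".
    best = (len(rows) + 1, None, overall, overall)
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= thr]
        right = [y for row, y in zip(rows, labels) if row[feature] > thr]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, thr, left_label, right_label)
    _, thr, left_label, right_label = best
    return feature, thr, left_label, right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Random subset of the data: bootstrap sample (drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subset of the features: here, a single feature per tree.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Aggregate the trees' predictions by majority vote."""
    votes = [
        left_label if thr is None or row[feature] <= thr else right_label
        for feature, thr, left_label, right_label in forest
    ]
    return majority(votes)


# Toy data: two well-separated clusters.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

Averaging many trees trained on different samples is what reduces variance: an individual tree can be thrown off by its sample, but the vote rarely is.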
You normally forecast the trend of your data, but there are cases in which the variance is also important.
The most common example is finance, but it is relevant in other fields too.
ARCH models are used for that!
Learn when they are useful 🧵👇
1️⃣ Economics:
In macroeconomics, ARCH/GARCH models can be employed to model and forecast the volatility of economic indicators or time series data.
They can be useful in studying the variability of inflation rates, interest rates, and other economic variables.
2️⃣ Environmental Science:
ARCH models can be applied to study and forecast volatility in environmental data, such as climate variables or pollution levels.
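For intuition, here is a sketch of the simplest case, ARCH(1), where today's variance depends on yesterday's squared shock: sigma2[t] = omega + alpha * residual[t-1]^2. The values of `omega`, `alpha` and the residuals below are made up; in practice the parameters are estimated from data.

```python
def arch1_conditional_variance(residuals, omega, alpha):
    """Conditional variances of an ARCH(1) model.

    sigma2[t] = omega + alpha * residuals[t - 1] ** 2
    The first value is the unconditional variance omega / (1 - alpha),
    which requires 0 <= alpha < 1. The last value is the one-step-ahead forecast.
    """
    sigma2 = [omega / (1 - alpha)]
    for eps in residuals:
        sigma2.append(omega + alpha * eps ** 2)
    return sigma2


# Hypothetical residuals (e.g. demeaned changes in an indicator), with one large shock.
residuals = [0.1, -0.2, 1.5, -0.3]
variances = arch1_conditional_variance(residuals, omega=0.2, alpha=0.5)
print(variances)
```

Notice how the large shock (1.5) pushes up the variance for the next period: that is the volatility clustering ARCH models are built to capture.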
📄 Involves identifying and eliminating the data points that deviate significantly from the rest of the data.
🕑 When the outliers are suspected to be due to errors or anomalies and their removal doesn’t significantly reduce the sample size.
2️⃣ Imputation:
📄 Substituting the outliers in the target variable with more representative values.
🕑 When the outliers are in the variable to predict, and replacing them with the mean, median, or a model-predicted value would not distort the underlying data distribution.
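Both approaches can be sketched in a few lines. This example assumes the common 1.5 × IQR rule to flag outliers (one choice among many) and imputes with the median of the remaining points; the data and function names are made up for illustration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences of the IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values):
    """Removal: drop the points outside the fences."""
    low, high = iqr_bounds(values)
    return [v for v in values if low <= v <= high]


def impute_outliers(values):
    """Imputation: replace the points outside the fences with the median of the inliers."""
    low, high = iqr_bounds(values)
    inlier_median = statistics.median(v for v in values if low <= v <= high)
    return [v if low <= v <= high else inlier_median for v in values]


data = [10, 12, 11, 13, 12, 95, 11, 12]   # 95 looks like an error
removed = remove_outliers(data)
imputed = impute_outliers(data)
print(removed)
print(imputed)
```

Removal shortens the series, while imputation keeps its length, which matters when the position of each observation carries meaning, as in time series.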
Are you familiar with the most common Machine Learning algorithms?
Today, I introduce 6 of the most commonly used ones!
Check them out 🧵 👇
1️⃣ Linear regression predicts continuous values (e.g. sales, prices) by finding the best-fitting line between input and output variables.
Useful for understanding how input changes affect output.
2️⃣ Decision tree splits data into branches based on rules or criteria for classification or regression tasks. Each branch is a possible outcome or decision, and each leaf node is a final prediction.
Useful for visualizing and explaining the logic behind predictions.
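A small sketch of both ideas, with made-up data. The first part fits a line by ordinary least squares, so the slope tells you how much the output changes per unit of input. The second part walks a hand-written decision tree; its features, thresholds and labels are purely hypothetical, since a real tree would be learned from data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)   # each extra unit of input adds `slope` to the prediction

# A hand-written (hypothetical) decision tree: inner nodes hold a rule, leaves a prediction.
tree = {
    "feature": "income", "threshold": 50,
    "left": {"leaf": "reject"},
    "right": {
        "feature": "debt", "threshold": 20,
        "left": {"leaf": "approve"},
        "right": {"leaf": "reject"},
    },
}


def predict_tree(node, row):
    """Follow the branches until a leaf is reached."""
    while "leaf" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


print(predict_tree(tree, {"income": 80, "debt": 10}))
```

Because every prediction is just a path of rules from the root to a leaf, you can read off exactly why the tree decided what it did.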