Yesterday we discussed the first way of forecasting with your Time Series model:
1️⃣ The traditional way or multi-step forecast
Today it's time for the second (and better) way:
2️⃣ Rolling forecast
As mentioned yesterday, a rolling forecast retrains the model every day on all the data available up to that day.
Then we forecast one day ahead: tomorrow.
Let's continue with yesterday's example.
The data we used was the Apple stock price.
We assumed that today was 30/11/2021.
We split the data in two:
- Training: prices until "today", used to train the model.
- Testing: prices from "today" onward, used to evaluate the results, since these are the actual prices the forecasts should match.
The rolling forecast considers all the data available at each step, which will significantly improve the predictions! 🤯
NOTE: neither this data nor this model is the best one, so the forecast mostly replicates the previous day's price. Improving the model was not the purpose of this thread, so we will not focus on that.
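Here is a minimal sketch of the rolling loop in plain Python, next to the traditional multi-step forecast for comparison. The AR(1) model (tomorrow's price as a linear function of today's), the function names, and the prices are all illustrative assumptions, not the model or the Apple data used in this thread.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x if var_x else 0.0
    a = mean_y - b * mean_x
    return a, b


def multi_step_forecast(train, steps):
    """Traditional way: fit once, then feed each forecast back into the model."""
    a, b = fit_ar1(train)
    predictions, last = [], train[-1]
    for _ in range(steps):
        last = a + b * last
        predictions.append(last)
    return predictions


def rolling_forecast(train, test):
    """Rolling way: refit on everything seen so far, forecast one day ahead."""
    history = list(train)
    predictions = []
    for actual in test:
        a, b = fit_ar1(history)
        predictions.append(a + b * history[-1])
        history.append(actual)  # "tomorrow" arrives and joins the training data
    return predictions


# Made-up prices, for illustration only.
train = [150.0, 151.2, 150.7, 152.3, 153.1, 152.8]
test = [154.0, 155.2, 154.6]

print("multi-step:", multi_step_forecast(train, len(test)))
print("rolling:   ", rolling_forecast(train, test))
```

Both methods give the same forecast for the first test day. They only diverge afterwards, when the rolling version gets to see the actual prices.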
▶️ TL;DR
The rolling forecasting method is a much better way of evaluating your Time Series model.
The traditional method performs poorly as it does not consider all the available data.
Check yesterday's thread about the Traditional forecasting method.
1️⃣ Dimensionality Reduction
For datasets with many variables, techniques like Principal Component Analysis (PCA) or t-SNE can help you visualize high-dimensional data in two or three dimensions.
2️⃣ Clustering
Unsupervised learning techniques like K-means clustering can help identify natural groupings in your data that might not be apparent from simple visualizations.
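To make the clustering idea concrete, here is a small K-means sketch in plain Python. The function names, the fixed seed, and the toy points are assumptions for illustration. In practice you would likely reach for a library implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after alternating assign/update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return labels, centroids


# Two made-up, well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The cluster ids themselves are arbitrary. What matters is which points share an id.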
Exploratory Data Analysis (EDA) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods.
It is essential for understanding the data's underlying structure and characteristics before applying more formal statistical or Machine Learning methods.
Some key points that we should normally check are:
Multi Query, an Advanced Retrieval Strategy for RAG, clearly explained
Multi Query is a powerful Query Translation technique to enhance information retrieval in AI systems.
It involves generating multiple variations of an original query to improve the chances of finding relevant information.
How it works:
Instead of relying on a single query, Multi Query uses language models to create several rephrased versions of the original question. Each version captures different aspects or interpretations of the user's intent.
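A rough sketch of that flow, under loud assumptions: `fake_rephrase` is a hard-coded stand-in for the language model call, `keyword_retrieve` is a toy word-overlap retriever instead of a real vector search, and the documents are invented. Merging the results of all the variations into one deduplicated list is a common way to combine them, not something this thread prescribes.

```python
def keyword_retrieve(query, documents, top_k=2):
    """Toy retriever: rank documents by how many query words they share."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.lower().split())), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps document order on ties
    return [doc for _, doc in scored[:top_k]]


def multi_query_retrieve(question, rephrase, documents, top_k=2):
    """Retrieve with the original question plus its rephrased variations."""
    queries = [question] + rephrase(question)
    seen, merged = set(), []
    for query in queries:
        for doc in keyword_retrieve(query, documents, top_k):
            if doc not in seen:  # deduplicate across variations
                seen.add(doc)
                merged.append(doc)
    return merged


def fake_rephrase(question):
    """Hypothetical stand-in for an LLM that rewrites the question."""
    return ["how to change login credentials", "where are account settings"]


documents = [
    "reset your password from the account settings page",
    "billing invoices are emailed monthly",
    "to change login credentials open account settings",
]
question = "how do I reset my password"

print(keyword_retrieve(question, documents))
print(multi_query_retrieve(question, fake_rephrase, documents))
```

In this toy data the single query only matches the first document, while the rephrased versions also surface the one about login credentials.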
DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a powerful clustering algorithm.
It finds clusters of varying shapes and sizes while handling noise and outliers.
What is it?
DBSCAN is an unsupervised learning algorithm that groups together closely packed points and marks points in low-density regions as outliers.
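A compact sketch of the algorithm in plain Python may help. The parameter names `eps` (neighborhood radius) and `min_samples` (points needed for a dense region, counting the point itself) follow common usage but are not defined in this thread, and the toy points are made up.

```python
import math
from collections import deque


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # may still become a border point later
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is also core: keep expanding
        cluster += 1
    return labels


# Two dense groups and one isolated point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means, there is no number of clusters to choose up front. The isolated point is labeled as noise rather than being forced into a cluster.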
Linear Regression is a statistical method for predicting the value of a continuous dependent variable based on one or more independent variables. It estimates the relationship using a linear equation.
How it works:
• Take input features
• Calculate a weighted sum plus a bias term
• Use the equation y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
• Minimize the error (usually Mean Squared Error)
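The steps above can be sketched in plain Python. Gradient descent is one assumed way of minimizing the Mean Squared Error (the thread does not name an optimizer), and the function names, learning rate, and toy data are illustrative. Here `bias` plays the role of β₀ and `weights` the role of β₁ ... βₙ.

```python
def predict(weights, bias, features):
    """Weighted sum of the features plus the bias term."""
    return bias + sum(w * x for w, x in zip(weights, features))


def fit_linear_regression(X, y, learning_rate=0.05, epochs=5000):
    """Fit weights and bias by gradient descent on the Mean Squared Error."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, row) - target
                  for row, target in zip(X, y)]
        # Gradients of MSE = (1/n) * sum(error^2) w.r.t. bias and each weight.
        grad_bias = 2 * sum(errors) / n
        grad_weights = [2 * sum(e * row[j] for e, row in zip(errors, X)) / n
                        for j in range(d)]
        bias -= learning_rate * grad_bias
        weights = [w - learning_rate * g for w, g in zip(weights, grad_weights)]
    return weights, bias


# Toy data generated from y = 1 + 2*x1 + 3*x2, so the fit should land near those values.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]

weights, bias = fit_linear_regression(X, y)
print(weights, bias)
print(predict(weights, bias, (3, 2)))
```

For small problems like this one, a closed-form least-squares solution would also work. Gradient descent is shown because it mirrors the "minimize the error" step directly.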
Retrieval Augmented Generation (RAG) for LLM systems clearly explained
RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.
It involves a five-step process:
1️⃣ Data Collection
The first step is gathering all the data needed for the application - user manuals, databases, FAQs, etc. For a customer support chatbot, this could include product documentation, troubleshooting guides, and common inquiries.