Yesterday we discussed the first way of forecasting with your Time Series model:
1️⃣ The traditional way or multi-step forecast
Today it's time for the second (and better) way:
2️⃣ Rolling forecast
As mentioned yesterday, this consists of re-training the model every day on all the data available up to that day.
Then we forecast for tomorrow.
Let's continue with yesterday's example.
The data we used was the Apple stock price.
We assumed that today was 30/11/2021.
We split the data in two:
- Training: prices up to "today", used to train the model.
- Testing: prices from "today" onwards, used to evaluate the results, as these are the actual prices to match.
With a rolling forecast, the model always considers all the available data, which significantly improves the predictions! 🤯
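Here's a minimal sketch of the rolling (walk-forward) loop in Python. It assumes a pandas Series of daily closing prices called `prices` with a DatetimeIndex, and uses statsmodels' ARIMA as a stand-in model, not necessarily the exact model behind this thread:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def rolling_forecast(prices: pd.Series, split_date: str) -> pd.Series:
    history = prices[:split_date].tolist()  # training: prices up to "today"
    test = prices[split_date:]              # testing: the actual prices to match
    predictions = []
    for actual in test:
        model = ARIMA(history, order=(1, 1, 0)).fit()   # re-train on ALL data available so far
        predictions.append(model.forecast(steps=1)[0])  # forecast only tomorrow
        history.append(actual)                          # then roll forward one day
    return pd.Series(predictions, index=test.index)

# e.g. preds = rolling_forecast(prices, "2021-11-30")
```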
NOTE: neither the data nor the model here are the best ones, so the forecasts mostly replicate the previous day's price. Improving the model was not the purpose of this thread, so we will not focus on that.
▶️ TL;DR
The rolling forecasting method is a much better way of evaluating your Time Series model.
The traditional method performs poorly as it does not consider all the available data.
Check yesterday's thread about the Traditional forecasting method 👇
Podcasts are a goldmine of interconnected knowledge. But how do you model it?
I built a pipeline to turn transcripts into queryable Knowledge Graphs, transforming hours of audio into a structured, explorable network.
Here’s the technical breakdown 🧵👇
The core is knowledge extraction using LangChain's LLMGraphTransformer with gpt-4o.
The LLM reads the transcript and returns a structured list of nodes (e.g., "Insulin Resistance") and edges (e.g., "REDUCES"), automating semantic relationship discovery.
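In LangChain, this extraction step is only a few lines. A minimal sketch, assuming the transcript is already loaded into a string called `transcript` and an OpenAI API key is set in the environment:

```python
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm = ChatOpenAI(model="gpt-4o", temperature=0)
transformer = LLMGraphTransformer(llm=llm)

# The LLM returns GraphDocuments containing extracted nodes and relationships
graph_docs = transformer.convert_to_graph_documents([Document(page_content=transcript)])

print(graph_docs[0].nodes)          # e.g. Node(id="Insulin Resistance", type="Concept")
print(graph_docs[0].relationships)  # e.g. Relationship(type="REDUCES", ...)
```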
A relational DB would struggle. Knowledge is a graph, so I use a native graph database: @neo4j Aura.
Nodes & edges are loaded directly, preserving structure. Multi-hop queries like (A)-[:CAUSES]->(B) are trivial—no expensive JOINs. Seamless via the langchain-neo4j library.
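Loading and querying the graph is just as short. A sketch with placeholder Aura credentials, reusing the `graph_docs` from the previous step:

```python
from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph(
    url="neo4j+s://<your-aura-instance>.databases.neo4j.io",  # placeholder
    username="neo4j",
    password="<password>",  # placeholder
)

# Load nodes & edges directly, preserving the graph structure
graph.add_graph_documents(graph_docs)

# A multi-hop query: what does A cause, and what does that in turn reduce?
result = graph.query(
    "MATCH (a)-[:CAUSES]->(b)-[:REDUCES]->(c) RETURN a.id, b.id, c.id LIMIT 5"
)
```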
CNNs learn through hierarchical feature extraction: each layer builds on the one before. This structure is what makes them so powerful for vision tasks.
Let's break it down 👇🧵
🟢 Early layers focus on low-level features extracted directly from pixel intensities.
These include:
• Edges
• Lines
• Curves
• Textures
They form the foundation for all further recognition.
🟠 Middle layers combine low-level patterns into more complex structures.
This is where the network begins to recognize:
• Shapes
• Motifs
• Patterns
• Parts of objects
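To make the hierarchy concrete, here's a tiny PyTorch sketch (my own illustration, not the exact architecture behind this thread) where each conv stage corresponds to one of the levels above:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # early layers: edges, lines, textures
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # middle layers: shapes, motifs, object parts
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # deeper layers: higher-level object features
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),                                        # classifier head (10 classes assumed)
)
```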
1️⃣ Dimensionality Reduction
For datasets with many variables, techniques like Principal Component Analysis (PCA) or t-SNE can help you visualize high-dimensional data in two or three dimensions.
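A minimal scikit-learn sketch, assuming your variables are already in a numeric matrix `X`:

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X_pca = PCA(n_components=2).fit_transform(X)    # linear projection to 2D
X_tsne = TSNE(n_components=2).fit_transform(X)  # non-linear embedding to 2D
```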
2️⃣ Clustering
Unsupervised learning techniques like K-means clustering can help identify natural groupings in your data that might not be apparent from simple visualizations.
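And a matching K-means sketch on the same (assumed) matrix `X`, with the number of clusters picked purely for illustration:

```python
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)  # 3 clusters assumed for illustration
labels = kmeans.fit_predict(X)                  # one cluster label per row of X
```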
Exploratory Data Analysis (EDA) is the process of investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods.
It is essential for understanding the data's underlying structure and characteristics before applying more formal statistical or Machine Learning methods.
Some key points that we should normally check are👇
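As a rough illustration (not necessarily the exact checks listed in this thread), a few pandas one-liners cover most of them, assuming the data sits in a DataFrame `df`:

```python
import pandas as pd

df.info()                           # column types and non-null counts
print(df.describe())                # statistical summaries of numeric columns
print(df.isna().sum())              # missing values per column (anomalies)
print(df.corr(numeric_only=True))   # pairwise correlations (relationships)
df.hist(figsize=(10, 8))            # distributions, skew, and outliers
```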
Multi Query, an Advanced Retrieval Strategy for RAG, clearly explained 👇
Multi Query is a powerful Query Translation technique to enhance information retrieval in AI systems.
It involves generating multiple variations of an original query to improve the chances of finding relevant information.
How it works:
Instead of relying on a single query, Multi Query uses language models to create several rephrased versions of the original question. Each version captures different aspects or interpretations of the user's intent.
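LangChain ships this pattern as MultiQueryRetriever. A minimal sketch, assuming you already have a vector store called `vectorstore` and an OpenAI key configured; the question is just an example:

```python
from langchain_openai import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# The LLM generates several rephrasings of the question; documents retrieved
# for all of them are merged and deduplicated.
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
docs = retriever.invoke("What are the benefits of vector databases?")
```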
DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a powerful clustering algorithm.
It finds clusters of varying shapes and sizes while handling noise and outliers.
What is it?
DBSCAN is an unsupervised learning algorithm that groups together closely packed points and marks points in low-density regions as outliers.
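A minimal scikit-learn sketch on synthetic two-moons data, a shape K-means struggles with but DBSCAN handles; noise points get the label -1:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print(set(labels))  # e.g. {0, 1} for the two moons; -1 would mark outliers
```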