David Andrés 🤖📈🐍
📈 I summarise Machine Learning and Time Series concepts in an easy and visual way • 💊Follow me at https://t.co/hy44Q0jFQs 👉 Inquiries: david@mlpills.dev
Apr 16 8 tweets 2 min read
Vector databases are pivotal for Large Language Models (LLMs) due to their ability to handle high-dimensional vector data efficiently.

▶️ They optimize storage, retrieval, and management of vector embeddings crucial for LLM performance.

Learn more 👇 🧵

✦ They enable the similarity searches that LLMs rely on in tasks like semantic search and recommendation systems.
By finding the most similar vector embeddings within large datasets, they aid in delivering more accurate results.
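Here is a minimal sketch of that similarity search. It assumes cosine similarity as the metric and a brute-force scan over a tiny in-memory store. The store, the document ids and the 3-D vectors are hypothetical. Real vector databases hold much higher-dimensional embeddings and typically use approximate nearest-neighbour indexes instead of scoring every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Brute-force search: score every stored vector and return the k most similar ids."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical 3-D "embeddings"; real ones have hundreds or thousands of dimensions.
store = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_stocks": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.2, 0.0], store))  # ['doc_cats', 'doc_dogs']
```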
Apr 11 8 tweets 2 min read
Do you want to build a Language Model Application?

Then, you need to try LangChain!

👇 🧵

LangChain offers different modules tailored for language model applications.

Whether you're crafting a simple app or a more complex system, these modules have you covered.
Apr 10 11 tweets 2 min read
What is the difference between seasonality and cyclicality in time series forecasting❓

Discover it below 👇

🧵

Seasonality and cyclicality are two essential concepts for understanding patterns in time series data.

Let's start with seasonality! 🌞🍂❄️
Apr 6 6 tweets 2 min read
Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data?

The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that makes this possible while keeping the computation efficient!

Let's learn more about it 🧵 👇

In SVM, the kernel trick is a clever way to compute inner products (similarities) between data points in a higher-dimensional feature space without explicitly transforming the original data into that space.

It's like finding a hidden pathway to handle non-linear relationships between data points.
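Here is a small numerical check of the idea. It assumes the degree-2 polynomial kernel k(x, z) = (x · z)² on 2-D inputs, which is only one common kernel; RBF is another popular choice. The kernel value equals the dot product of the explicit 3-D feature map φ(x) = (x₁², √2·x₁x₂, x₂²). However, the kernel computes that value without ever building φ(x).

```python
import math


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def phi(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))     # (1*3 + 2*4)^2 = 121.0, computed in the original 2-D space
print(dot(phi(x), phi(z)))    # approx. 121.0, computed in the 3-D feature space
```

The kernel needs one 2-D dot product. The explicit route needs the mapping plus a 3-D dot product, and that gap grows quickly with higher degrees and dimensions.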
Mar 11 10 tweets 2 min read
In time series analysis, the trend component is key.

It indicates the directional movement of data over time.

Let's learn more about the trend 👇🧵

The trend component represents the overall direction in which the data moves over an extended period.

It captures systematic patterns like gradual increases, decreases, or stable periods. The trend reflects long-term shifts in the data.
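One simple way to estimate the trend is a centred moving average, which smooths out short-term fluctuations. This minimal sketch assumes an odd window equal to the length of a repeating pattern, here a hypothetical period of 3. With that window the pattern averages out and only the trend remains. The data are made up for illustration.

```python
from itertools import cycle


def centered_moving_average(values, window):
    """Estimate the trend as the mean of a centred window; None where the window doesn't fit."""
    if window % 2 == 0:
        raise ValueError("use an odd window so it can be centred on each point")
    half = window // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        trend[t] = sum(values[t - half:t + half + 1]) / window
    return trend


# Hypothetical series: upward trend 10 + 2t plus a repeating pattern of period 3 that sums to 0.
pattern = cycle([1, -2, 1])
series = [10 + 2 * t + next(pattern) for t in range(12)]

trend = centered_moving_average(series, window=3)
print(trend)  # [None, 12.0, 14.0, ..., 30.0, None]
```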
Mar 3 6 tweets 1 min read
Do you want to get the feature importance using SHAP?

Wait no more, here is a code snippet 👇

Also check the following posts for additional info 🧵

Steps:

1. Import Libraries:
Import necessary libraries, including 𝚜𝚑𝚊𝚙 and 𝚁𝚊𝚗𝚍𝚘𝚖𝙵𝚘𝚛𝚎𝚜𝚝𝚁𝚎𝚐𝚛𝚎𝚜𝚜𝚘𝚛.

2. Model Training:
Initialize and train a model, for example a Random Forest Regressor, using 𝚇_𝚝𝚛𝚊𝚒𝚗 (features) and 𝚢_𝚝𝚛𝚊𝚒𝚗 (labels).
Feb 25 9 tweets 2 min read
𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗜𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝗰𝗲 measures the contribution of each feature to the model's predictions.

It is crucial in Machine Learning for several reasons.

Let's see them 🧵👇

1️⃣ Model Interpretability

Understanding which features are most important in making predictions helps to interpret the model's behavior.

It provides insights into the underlying patterns learned by the model and helps in building trust among stakeholders.
Feb 20 10 tweets 2 min read
SHAP is a powerful technique in machine learning for interpreting the output of complex models.

Commonly used to estimate ✨Feature Importance✨

Let's explore SHAP further 🧵 👇

It stands for “SHapley Additive exPlanations”.

It is a model-agnostic method, which means that it can be applied to any model. This is particularly useful for black-box models, where understanding how individual features contribute to predictions would otherwise be challenging.
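SHAP builds on Shapley values from game theory. Each feature's contribution is its weighted average marginal effect on the prediction over all subsets (coalitions) of the other features. The contributions add up to the prediction minus a baseline.

The brute-force sketch below is not the `shap` library, and it makes two simplifications. First, it assumes that "absent" features are set to a single baseline vector, whereas shap typically averages over background data. Second, it uses a hypothetical 3-feature model. Exact enumeration needs 2ⁿ model evaluations per feature, which is why shap relies on faster model-specific or approximate explainers.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of every feature of instance x.

    Features outside a coalition take their baseline value (simplifying assumption).
    """
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi += weight * (predict(with_i) - predict(without_i))
        values.append(phi)
    return values


def model(v):
    """Hypothetical model with an interaction between features 0 and 2."""
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)                                   # approx. [3.5, 6.0, 1.5]
print(sum(phi), model(x) - model(baseline))  # additivity: both approx. 11.0
```

The interaction term v₀·v₂ contributes 3 to the prediction, and Shapley values split it equally between features 0 and 2.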
Feb 14 8 tweets 2 min read
ARIMA is one of the most popular traditional statistical methods used for time series forecasting.

Let's understand its components 🧵 👇

ARIMA stands for Auto-Regressive Integrated Moving Average.

It is composed of 3 components:

🔹 Auto-Regressive (AR)
🔹 Integrated (I)
🔹 Moving Average (MA)
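The three components fit together in one equation, written here with the lag operator L (where L yₜ = yₜ₋₁). In the standard textbook form of an ARIMA(p, d, q) model, three pieces combine:

- the AR polynomial of order p acts on the series after it has been differenced d times (the "I" part);
- the MA polynomial of order q acts on the white-noise errors εₜ;
- sign conventions for θ vary between references and software packages.

```latex
\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)\,(1 - L)^d\, y_t
  \;=\; \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\,\varepsilon_t
```

For example, ARIMA(1, 1, 0) reads (1 − φ₁L)(yₜ − yₜ₋₁) = εₜ: the first difference of the series follows an AR(1) process.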
Feb 13 10 tweets 2 min read
What is the difference between Classification and Regression in Machine Learning? 🤔

🧵 👇

Both Regression and Classification involve predicting outcomes from input data; however, they differ in the nature of the target variable (continuous values vs. discrete classes) and in the goals they aim to achieve.

Let's find out more...
Feb 4 7 tweets 2 min read
Permutation importance is a model-agnostic technique used to assess the importance of features in a model.

This method involves systematically shuffling each feature's values one at a time and measuring the resulting change in model performance.

I said that it is a model-agnostic technique, but what does that mean?

It refers to a technique that is not tied to any specific ML model. It can be applied to various types of models without being dependent on the internal characteristics or algorithms of a particular model.
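Here is a minimal sketch of the shuffling procedure. It is written against any `predict` function, so it stays model-agnostic. It assumes mean squared error as the performance metric and averages over a few random shuffles. The "model" is a hypothetical fixed formula standing in for a trained regressor. In practice you would evaluate on held-out data, and scikit-learn provides a ready-made `sklearn.inspection.permutation_importance`.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: the target depends strongly on x0, weakly on x1, not at all on x2.
rng = random.Random(42)
X = [[rng.random() for _ in range(3)] for _ in range(200)]
y = [3 * r[0] + 0.5 * r[1] for r in X]


def model(row):
    """Stand-in for a trained model that has learned the relationship perfectly."""
    return 3 * row[0] + 0.5 * row[1]


print(permutation_importance(model, X, y))  # x0 >> x1 > 0, and x2 == 0.0
```

Shuffling a feature the model never uses leaves its predictions unchanged, so that feature's importance is exactly zero.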
Feb 3 7 tweets 2 min read
Understanding feature importance in machine learning models is essential for interpreting their predictions.

Today I'll share 2 methods to get it 🧵 👇

1️⃣ Some machine learning models, like Linear Regression, assign coefficients to features that quantify their impact on predictions.

These coefficients reveal individual feature contributions, but their interpretation is straightforward only when features are independent.
Feb 1 10 tweets 2 min read
Doing feature engineering?

Then consider learning about the domain your data comes from!

Here is an introduction and some examples! 🧵 👇

Domain-specific features are features that are specific to the problem at hand and capture aspects of the underlying phenomenon that are not directly observable from the raw data.
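As an illustration (a classic example, not necessarily one from this thread), consider a hypothetical health dataset with raw weight and height columns. Body-mass index is a domain-specific feature for such data. It combines the raw measurements the way clinicians do, giving the model a proxy for body composition that neither column expresses on its own.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


# Hypothetical raw records.
patients = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": 95.0, "height_m": 1.80},
]

for p in patients:
    p["bmi"] = round(bmi(p["weight_kg"], p["height_m"]), 1)

print(patients)  # BMI approx. 22.9 and 29.3
```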
Jan 28 12 tweets 2 min read
How can you estimate the value of the MA term - q - in your ARIMA model?

Here you have a step-by-step guide! 🧵👇

1️⃣ Ensure Stationarity:

Before choosing the order of the Moving Average (MA) term (q), ensure the time series is stationary.

This is crucial as the properties of non-stationary time series can change over time.
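Once the series is stationary, a common Box-Jenkins heuristic is to read q off the autocorrelation function (ACF), because the ACF of an MA(q) process cuts off after lag q. This minimal sketch assumes the simplified ±1.96/√n significance band; statsmodels' `plot_acf`, for instance, draws a band that widens with the lag. It uses a simulated MA(1) series with a hypothetical coefficient of 0.8.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def suggest_q(acf_values, n):
    """q = last lag before the ACF first falls inside the ±1.96/sqrt(n) band."""
    bound = 1.96 / math.sqrt(n)
    for k in range(1, len(acf_values)):
        if abs(acf_values[k]) <= bound:
            return k - 1
    return len(acf_values) - 1  # no cut-off found within max_lag


# Simulated MA(1): y_t = e_t + 0.8 * e_{t-1}
rng = random.Random(7)
e = [rng.gauss(0, 1) for _ in range(2001)]
y = [e[t] + 0.8 * e[t - 1] for t in range(1, 2001)]

r = acf(y, max_lag=10)
# Lag-1 ACF should be near 0.8 / (1 + 0.8**2) ≈ 0.49; the suggested q is typically 1.
print(round(r[1], 2), suggest_q(r, len(y)))
```

A single spurious spike at a higher lag can mislead this rule. Treat the result as a starting point and compare candidate models, for example by AIC.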
Jan 25 8 tweets 2 min read
Creating the right features for Time Series data can make a significant impact on the performance of your model.

Today I'll introduce 2 key ones, essential for capturing the sequential aspect of time series! 🧵👇

▶️ Rolling Window Features:

Statistical measures (mean, median, standard deviation, etc.) computed over a sliding time window.

Useful when you want to capture local trends and patterns in your data.
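Here is a minimal sketch of rolling-window features in plain Python. One design choice is our own assumption: the window for time t covers only the previous `window` observations, so no feature peeks at the value being predicted. With pandas, the same idea is usually written as `s.shift(1).rolling(window).mean()`.

```python
from statistics import stdev


def rolling_features(values, window):
    """Rolling mean and standard deviation of the `window` observations before each time step."""
    if window < 2:
        raise ValueError("window must be at least 2 for a standard deviation")
    rows = []
    for t in range(len(values)):
        if t < window:  # not enough history yet
            rows.append({"roll_mean": None, "roll_std": None})
            continue
        past = values[t - window:t]  # excludes values[t] to avoid look-ahead leakage
        rows.append({"roll_mean": sum(past) / window, "roll_std": stdev(past)})
    return rows


sales = [10, 12, 11, 15, 14, 18, 17]  # hypothetical daily sales
for day, feats in enumerate(rolling_features(sales, window=3)):
    print(day, sales[day], feats)
```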
Jan 24 10 tweets 2 min read
How can you estimate a suitable value for 'p' in your ARIMA model?

Here is the definitive guide! 🧵👇

1️⃣ Ensure Stationarity:

Start by ensuring the time series data is stationary. This is a crucial step as non-stationary data can lead to unreliable predictions. Differencing can be used to achieve stationarity.
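Differencing replaces each value with its change from the previous one, and applying it d times gives the d in ARIMA(p, d, q). Here is a minimal sketch in plain Python; pandas offers the same operation as `Series.diff()`. The hypothetical series has a quadratic trend, so it needs two rounds of differencing to become constant.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y'_t = y_t - y_{t-1}."""
    for _ in range(order):
        series = [curr - prev for prev, curr in zip(series[:-1], series[1:])]
    return series


quadratic = [t * t for t in range(6)]  # 0, 1, 4, 9, 16, 25
print(difference(quadratic))           # [1, 3, 5, 7, 9]  -> still trending
print(difference(quadratic, order=2))  # [2, 2, 2, 2]     -> constant
```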
Jan 18 7 tweets 1 min read
Do you have univariate time series data?
Squeeze all the information out of it!

Here's how: imagine you have daily sales data, so every sales figure comes with a date.

You can get a lot of information from the date:
▶️ Year, month, day
▶️ Day of the week
▶️ Quarter of the year
▶️ Weekend?
...
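Here is a minimal sketch of extracting these calendar features with Python's standard `datetime` module. The feature names are our own. The weekday numbering (Monday = 0) follows `date.weekday()`.

```python
from datetime import date


def date_features(d: date) -> dict:
    """Calendar features derived from a single date."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),        # Monday = 0 ... Sunday = 6
        "quarter": (d.month - 1) // 3 + 1,
        "is_weekend": d.weekday() >= 5,    # Saturday or Sunday
    }


print(date_features(date(2024, 1, 6)))  # a Saturday in Q1
```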
Jan 16 12 tweets 3 min read
What scenarios can you encounter when your data is missing?

When data is missing, it's essential to understand the reason behind its absence.

Find out more 🧵 👇

The nature of missing data is typically categorized into 3 types:

1⃣ MCAR (Missing Completely At Random)
2⃣ MAR (Missing At Random)
3⃣ MNAR (Missing Not At Random)

👇
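A small simulation shows why the type matters. This sketch uses hypothetical income data with a made-up 30% missing rate and a made-up 70,000 cut-off. When values are missing completely at random, the observed mean stays roughly unbiased. When values go missing because of their own size (MNAR), the observed mean is pulled down. MAR, where missingness depends on another observed variable, is not simulated here.

```python
import random
from statistics import mean

rng = random.Random(0)
incomes = [rng.gauss(50_000, 15_000) for _ in range(10_000)]  # hypothetical full data

# MCAR: every value has the same 30% chance of going missing, regardless of its size.
mcar = [v for v in incomes if rng.random() >= 0.3]

# MNAR: missingness depends on the unobserved value itself;
# here, hypothetically, nobody above 70,000 reports their income.
mnar = [v for v in incomes if v <= 70_000]

print(round(mean(incomes)), round(mean(mcar)), round(mean(mnar)))
```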
Jan 15 8 tweets 2 min read
Does my data have a Unit Root?

What is it, and why is it important in Time Series forecasting?

🧵👇

In the context of Time Series Analysis, a unit root refers to:

A root of the Auto-Regressive (AR) model's characteristic equation that is equal to one. For an AR(1) model y(t) = φ·y(t−1) + ε(t), this simply means φ = 1.
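A short derivation shows why φ = 1 matters. Start from an AR(1) model whose white-noise errors εₜ have variance σ². With |φ| < 1 the variance settles to a constant. Substituting φ = 1 instead turns the series into a random walk whose variance grows with time, so it cannot be stationary. A single difference removes the unit root.

```latex
y_t = \phi\, y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{i.i.d.}\,(0, \sigma^2)

|\phi| < 1 \;\Longrightarrow\; \operatorname{Var}(y_t) \to \frac{\sigma^2}{1 - \phi^2}

\phi = 1 \;\Longrightarrow\; y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i,
\qquad \operatorname{Var}(y_t \mid y_0) = t\,\sigma^2

\Delta y_t = y_t - y_{t-1} = \varepsilon_t \quad \text{(stationary)}
```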
Jan 13 9 tweets 2 min read
Having an imbalanced dataset is a problem. 😟

Discover SMOTE: it can help you deal with this!

🧵 👇

But first, why are imbalanced datasets problematic? 🤔

The rare class (minority) gets overshadowed by the dominant class (majority), leading to skewed model performance → a big problem for classification models!
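SMOTE (Synthetic Minority Over-sampling Technique) creates new minority samples instead of duplicating existing ones. It picks a minority point and one of that point's k nearest minority neighbours, then places a synthetic point at a random position on the line segment between them. Here is a minimal sketch in plain Python with hypothetical 2-D points. For real projects, the imbalanced-learn package provides `imblearn.over_sampling.SMOTE`.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.8), (1.2, 2.0)]  # hypothetical minority class
new_points = smote(minority, n_synthetic=5)
print(new_points)
```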
Jan 10 9 tweets 3 min read
Stationarity is a property of a Time Series whose statistical features, such as mean and variance, remain constant over time.

It's crucial for Time Series analysis because many statistical models assume stationarity for reliable forecasts.

Find out how to check it 🧵👇

1️⃣ Augmented Dickey–Fuller (ADF) test:

The ADF test checks for a unit root, whose presence indicates non-stationarity.

🔎 The null hypothesis assumes a unit root, while the alternative hypothesis suggests stationarity.
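To see what the test statistic measures, here is a minimal sketch of the plain Dickey-Fuller regression Δyₜ = γ·yₜ₋₁ + eₜ in pure Python, with no constant and no lagged differences. A unit root corresponds to γ = 0. The t-statistic of γ is compared with Dickey-Fuller critical values (about −1.95 at 5% for this no-constant case), not with the usual normal ones. The "augmented" version adds lagged differences to handle autocorrelated errors, and usually a constant. In practice you would call `adfuller` from `statsmodels.tsa.stattools`. The simulated series are hypothetical.

```python
import math
import random
from itertools import accumulate


def dickey_fuller_tstat(y):
    """t-statistic of gamma in  Δy_t = gamma * y_{t-1} + e_t  (no constant, no lags)."""
    y_lag = y[:-1]
    dy = [curr - prev for prev, curr in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in y_lag)
    gamma = sum(a * b for a, b in zip(y_lag, dy)) / sxx
    resid = [d - gamma * v for d, v in zip(dy, y_lag)]
    s2 = sum(r * r for r in resid) / (len(dy) - 1)  # residual variance, 1 estimated parameter
    return gamma / math.sqrt(s2 / sxx)


CRITICAL_5PCT = -1.95  # approximate DF critical value, no-constant case

rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(500)]  # stationary
walk = list(accumulate(noise))                 # random walk: has a unit root

for name, series in [("white noise", noise), ("random walk", walk)]:
    t = dickey_fuller_tstat(series)
    verdict = "reject unit root" if t < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {t:.2f} -> {verdict}")
```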