David Andrés 🤖📈🐍
📈 I summarise Machine Learning, NLP and Time Series concepts in an easy and visual way • 💊 Follow me at https://t.co/hy44Q0jFQs 👉 Inquiries at david@mlpills.dev
Jul 21 7 tweets 1 min read
Multi Query, an Advanced Retrieval Strategy for RAG, clearly explained 👇

Multi Query is a powerful Query Translation technique to enhance information retrieval in AI systems.

It involves generating multiple variations of an original query to improve the chances of finding relevant information.
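The idea can be sketched end to end with a toy keyword retriever. This is a minimal illustration, not the thread's own code: in practice an LLM generates the query variations, and the retriever works over embeddings rather than word overlap.

```python
# Minimal Multi Query sketch: several variations of one query are each
# run through a retriever, and the results are merged (union).

documents = [
    "RAG combines retrieval with generation",
    "Vector databases store embeddings",
    "Query translation rewrites user questions",
]

def retrieve(query, docs, k=2):
    """Toy retriever: score docs by word overlap with the query, return top k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def multi_query_retrieve(variations, docs, k=2):
    """Retrieve once per query variation and union the results."""
    results = []
    for q in variations:
        for doc in retrieve(q, docs, k):
            if doc not in results:
                results.append(doc)
    return results

# Variations an LLM might produce for "What is query translation?"
variations = [
    "what is query translation",
    "how are user questions rewritten",
    "rewriting queries for retrieval",
]
hits = multi_query_retrieve(variations, documents)
```

Each variation surfaces documents the original phrasing might miss, which is exactly the point of the technique.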
Jul 14 8 tweets 1 min read
DBSCAN clearly explained 👇

DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a powerful clustering algorithm.

It finds clusters of varying shapes and sizes while handling noise and outliers.
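A quick sketch with scikit-learn shows both behaviours at once: two dense blobs become clusters, and an isolated point is labelled as noise (the synthetic data below is made up for illustration).

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one far-away point that DBSCAN should flag as noise.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.05, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.05, size=(20, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob_a, blob_b, outlier])

# eps: neighbourhood radius; min_samples: points needed for a dense region.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
```

Points that cannot be reached from any dense region get the label -1, which is how DBSCAN reports outliers.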
Jul 7 8 tweets 1 min read
Linear Regression clearly explained 👇

What is it?

Linear Regression is a statistical method for predicting the value of a continuous dependent variable based on one or more independent variables. It estimates the relationship using a linear equation.
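For a single independent variable, that linear equation is y = a·x + b, and scikit-learn recovers a and b directly. A minimal sketch on noiseless toy data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Samples drawn exactly from y = 2x + 1; the model should recover the line.
X = np.arange(10).reshape(-1, 1)   # independent variable
y = 2 * X.ravel() + 1              # continuous dependent variable

model = LinearRegression().fit(X, y)
pred = model.predict([[12]])       # extrapolate to x = 12
```

`model.coef_` holds the slope and `model.intercept_` the offset of the fitted line.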
Jun 29 8 tweets 2 min read
Retrieval Augmented Generation (RAG) for LLM systems clearly explained 👇

RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.

It involves a five-step process 👇
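The pipeline can be sketched in plain Python. The step breakdown below (load/index, retrieve, augment, generate) is an assumption, since the thread's step list lives in the image, and the "LLM" here is just a placeholder function:

```python
# Toy end-to-end RAG sketch with a keyword retriever and a stub generator.

documents = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
]

def index(docs):
    # Load and index documents (here: a word set per document).
    return [(d, set(d.lower().strip(".").split())) for d in docs]

def retrieve(question, idx, k=1):
    # Pick the documents sharing the most words with the question.
    q = set(question.lower().strip("?").split())
    return [d for d, words in sorted(idx, key=lambda t: -len(q & t[1]))][:k]

def generate(question, context):
    # Augment the prompt with the retrieved context; a real system would
    # now call an LLM. The stub just returns the augmented prompt.
    return f"Context: {context}\nQuestion: {question}"

idx = index(documents)
ctx = retrieve("Who created Python?", idx)
answer = generate("Who created Python?", ctx[0])
```

The key point is that the model's input is augmented with retrieved knowledge before generation, grounding the response in external data.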
Jun 25 9 tweets 2 min read
Support Vector Machines clearly explained 👇

Support Vector Machine is a useful Machine Learning algorithm frequently used for both classification and regression problems.

⭐ this is a 𝘀𝘂𝗽𝗲𝗿𝘃𝗶𝘀𝗲𝗱 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗮𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺.

Basically, it needs labels or targets to learn!
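That supervision is visible in a minimal scikit-learn sketch (toy data made up here): the `y` labels are exactly what the SVM fits against.

```python
from sklearn.svm import SVC

# Tiny labelled dataset: SVM is supervised, so these targets are required.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# A linear kernel finds the maximum-margin line between the two groups.
clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict([[0.5, 0.5], [5.5, 5.5]])
```

Without `y`, there is nothing for the margin to separate: the labels define the problem.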
Jun 24 7 tweets 1 min read
K-Nearest Neighbors clearly explained 👇

KNN is a versatile supervised learning algorithm widely employed in both classification and regression tasks.

Unlike complex models, KNN relies on the proximity of data points to make predictions, making it intuitive and easy to implement.
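That proximity idea fits in a few lines of scikit-learn (toy 1-D data for illustration): each prediction is just a majority vote among the k closest training points.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups on a line.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# k=3: each prediction is a majority vote among the 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = knn.predict([[1.5], [10.5]])
```

There is no training beyond storing the data, which is why KNN is called a "lazy" learner.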
Jun 23 9 tweets 2 min read
Normal Distribution clearly explained 👇

A normal distribution, also known as a Gaussian distribution, is the most typical distribution you'll find in your data's features.

It's characterized by several key properties that give it a distinctive bell curve shape.
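One of those properties is the 68-95-99.7 rule: roughly 68% of values fall within one standard deviation of the mean. A quick NumPy check on simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Bell-curve properties: mean at the centre, and ~68% of values
# within one standard deviation (the 68-95-99.7 rule).
within_1sd = np.mean(np.abs(samples) < 1.0)
```

The symmetry around the mean is what produces the distinctive bell shape.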
Jun 3 7 tweets 2 min read
Where can you find the most common data distributions? (2nd part)

Check this thread for real-world examples! 🧵 👇

5️⃣ Exponential Distribution:

Often found in scenarios related to the time until an event occurs, such as the lifetime of an electronic component or the time until the next earthquake occurs.
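Simulating such waiting times is one line of NumPy. A small sketch (made-up rate for illustration): with an average of 2 events per unit of time, the mean wait is 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Time until the next event at an average rate of 2 events per unit time:
# exponential with scale = 1/rate, so the mean waiting time is 0.5.
rate = 2.0
waits = rng.exponential(scale=1.0 / rate, size=100_000)
```

Waiting times are never negative, and short waits are far more common than long ones, giving the distribution its characteristic decaying shape.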
May 29 7 tweets 2 min read
In Data Science you can find multiple data distributions...

But where are they typically found? 🤔

This is part 1 - tomorrow I'll share the second one!

Check it out 🧵👇

1️⃣ Normal Distribution:

Often found in natural and social phenomena where many factors contribute to an outcome. Examples include heights of adults in a population, test scores, measurement errors, and blood pressure readings.
Apr 27 8 tweets 2 min read
I bet you've heard a lot about RAG recently...

👉 But what is it?
👉 And what does it consist of?

Find out more about this here 🧵 👇

RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.

It involves a five-step process 👇
Apr 16 8 tweets 2 min read
Vector databases are pivotal for Large Language Models (LLMs) due to their ability to handle high-dimensional vector data efficiently.

▶️ They optimize storage, retrieval, and management of vector embeddings crucial for LLM performance.

Learn more 👇 🧵

✦ They empower similarity searches vital for LLMs in tasks like semantic search and recommendation systems.
By finding the most similar vector embeddings within large datasets, they aid in delivering more accurate results.
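The core operation behind that search is cosine similarity between the query embedding and the stored vectors. A minimal sketch with made-up 3-D "embeddings" (real ones have hundreds of dimensions):

```python
import numpy as np

# Toy "vector database": each row is a document embedding (made up here).
db = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.0, 1.0],
])

def most_similar(query, vectors):
    """Return the index of the stored vector with highest cosine similarity."""
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return int(np.argmax(v @ q))

idx = most_similar(np.array([0.8, 0.15, 0.0]), db)
```

Production vector databases add approximate-nearest-neighbour indexes so this search stays fast over millions of embeddings.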
Apr 11 8 tweets 2 min read
Do you want to build a Language Model Application?

Then, you need to try LangChain!

👇 🧵

LangChain offers different modules tailored for language model applications.

Whether you're crafting a simple app or a more complex system, these modules have you covered.
Apr 10 11 tweets 2 min read
What is the difference between seasonality and cyclicality in time series forecasting❓

Discover it below 👇

🧵

Seasonality and cyclicality are two essential concepts that play a crucial role in understanding patterns within time series data.

Let's start with seasonality! 🌞🍂❄️
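The defining feature of seasonality is a fixed, known period, which a synthetic example makes concrete (the 12-"month" signal below is made up for illustration; cyclical fluctuations, by contrast, have no fixed length):

```python
import numpy as np

# Seasonality: a pattern repeating at a FIXED, known period (here 12 months).
months = np.arange(48)
seasonal = 10 * np.sin(2 * np.pi * months / 12)

# A seasonal series matches itself exactly one period later.
shifted = np.roll(seasonal, 12)
gap = np.max(np.abs(seasonal - shifted))
```

A cyclical series (like a business cycle) would show no such exact self-match at any fixed lag, which is the practical way to tell the two apart.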
Apr 6 6 tweets 2 min read
Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data?

The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results!

Let's learn more about it 🧵 👇

In SVM, the kernel trick is a clever way to perform complex calculations in a higher-dimensional feature space without explicitly transforming the original data into that space.

It's like finding a hidden pathway to handle non-linear relationships between data points.
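That hidden pathway can be verified numerically with the polynomial kernel K(x, y) = (x·y)², which for 2-D inputs equals a dot product in an explicit 3-D feature space that the kernel never needs to build:

```python
import numpy as np

def kernel(x, y):
    # Polynomial kernel of degree 2, computed in the ORIGINAL 2-D space.
    return np.dot(x, y) ** 2

def phi(x):
    # The explicit 3-D feature map the kernel implicitly corresponds to.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

implicit = kernel(x, y)            # cheap: stays in 2-D
explicit = np.dot(phi(x), phi(y))  # expensive: builds the 3-D space
```

Both routes give the same number, but the kernel route never materialises the higher-dimensional vectors, and that saving is what makes non-linear SVMs efficient.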
Mar 11 10 tweets 2 min read
In time series analysis, the trend component is key.

It indicates the directional movement of data over time.

Let's learn more about the trend 👇🧵

The trend component represents the overall direction data moves over an extended period.

It captures systematic patterns like gradual increases, decreases, or stable periods. The trend reflects long-term shifts in the data.
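One simple way to expose the trend is a moving average, which smooths out short-term noise. A sketch on synthetic data (made up here) with a known upward drift:

```python
import numpy as np

# Upward trend (slope 0.5) buried in noise; a moving average recovers it.
rng = np.random.default_rng(1)
t = np.arange(100)
series = 0.5 * t + rng.normal(scale=2.0, size=100)

window = 10
trend = np.convolve(series, np.ones(window) / window, mode="valid")
```

Inside each window the noise averages towards zero, leaving the systematic long-term movement visible.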
Mar 3 6 tweets 1 min read
Do you want to get the feature importance using SHAP?

Wait no more, here you have a code snippet 👇

Check also the following posts for additional info 🧵

Steps:

1. Import Libraries:
Import necessary libraries, including 𝚜𝚑𝚊𝚙 and 𝚁𝚊𝚗𝚍𝚘𝚖𝙵𝚘𝚛𝚎𝚜𝚝𝚁𝚎𝚐𝚛𝚎𝚜𝚜𝚘𝚛.

2. Model Training:
Initialize and train any model, for example a Random Forest Regressor model using 𝚇_𝚝𝚛𝚊𝚒𝚗 (features) and 𝚢_𝚝𝚛𝚊𝚒𝚗 (labels).
Feb 25 9 tweets 2 min read
𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗜𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝗰𝗲 measures the contribution of each feature to the model's predictions.

It is crucial in Machine Learning for several reasons.

Let's see them 🧵👇

1️⃣ Model Interpretability

Understanding which features are most important in making predictions helps to interpret the model's behavior.

It provides insights into the underlying patterns learned by the model and helps in building trust among stakeholders.
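Tree-based models expose this directly. A minimal sketch on made-up data where one feature fully determines the label, so the model should credit it with nearly all the importance:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Feature 0 determines the label; feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = clf.feature_importances_   # sums to 1 across features
```

Seeing the noise feature near zero is exactly the kind of sanity check that builds stakeholder trust.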
Feb 20 10 tweets 2 min read
SHAP is a powerful technique in machine learning for interpreting the output of complex models.

Commonly used for ✨Feature Engineering✨

Let's explore SHAP further 🧵 👇

It stands for “SHapley Additive exPlanations”.

It is a model agnostic method, which means that it can be applied to any model. This is particularly useful for black-box models where understanding how individual features contribute to predictions might otherwise be challenging.
Feb 14 8 tweets 2 min read
ARIMA is one of the most popular traditional statistical methods used for time series forecasting.

Let's understand its components 🧵 👇

ARIMA stands for Auto-Regressive Integrated Moving Average.

It is composed of 3 components:

🔹 Auto-Regressive (AR)
🔹 Integrated (I)
🔹 Moving Average (MA)
Feb 13 10 tweets 2 min read
What is the difference between Classification and Regression in Machine Learning? 🤔

🧵 👇

Both Regression and Classification involve predicting outcomes based on input data, however, they differ in terms of the nature of the target variable and the goals they aim to achieve.

Let's find out more...
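The contrast is easiest to see side by side (toy data made up here): the same inputs, but a discrete target for classification and a continuous one for regression.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]
y_class = [0, 0, 0, 1, 1, 1]             # discrete target -> classification
y_reg = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # continuous target -> regression

clf = LogisticRegression().fit(X, y_class)
reg = LinearRegression().fit(X, y_reg)

label = clf.predict([[5.5]])[0]   # returns a class
value = reg.predict([[5.5]])[0]   # returns a number
```

The classifier can only ever answer with one of the known classes, while the regressor can output any value on the continuous scale.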
Feb 4 7 tweets 2 min read
Permutation importance is a model-agnostic technique used to assess the importance of features in a model.

This method involves systematically shuffling each feature's values one at a time and measuring the resulting change in model performance.

I said that it is a model-agnostic technique, but what does that mean?

It refers to a technique that is not tied to any specific ML model. It can be applied to various types of models without being dependent on the internal characteristics or algorithms of a particular model.
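scikit-learn ships this as `permutation_importance`, which works on any fitted estimator. A minimal sketch on made-up data where only feature 0 drives the target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Feature 0 drives the target; feature 1 is noise. Shuffling feature 0
# should hurt the model's score far more than shuffling feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 5 * X[:, 0] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
```

Because it only needs predictions and a score, the same call works unchanged for a random forest, an SVM, or a neural network, which is what "model-agnostic" means in practice.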