David Andrés 🤖📈🐍
📈 I summarise Machine Learning, NLP and Time Series concepts in an easy and visual way • 💊 Follow me at https://t.co/hy44Q0jFQs 👉 Inquiries to david@mlpills.dev
Jul 28, 2024 • 6 tweets • 1 min read
Introduction to some Advanced EDA Techniques 👇

1️⃣ Dimensionality Reduction
For datasets with many variables, techniques like Principal Component Analysis (PCA) or t-SNE can help you visualize high-dimensional data in two or three dimensions.
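
To make the PCA idea concrete, here is a minimal sketch in plain Python. It assumes a toy dataset of four features driven by two hidden factors (the data, the function name `principal_components` and the power-iteration approach are illustrative choices, not something from the thread). In practice you would likely reach for a library implementation instead.

```python
import random

def principal_components(rows, k=2, iterations=500):
    """Project rows onto their top-k principal components (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along direction v.
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the direction just found so the next pass finds a new one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return components, projected

# Hypothetical toy data: 4 observed features built from 2 hidden factors plus noise.
random.seed(0)
data = []
for _ in range(200):
    a, b = random.gauss(0, 3), random.gauss(0, 1)
    noise = [random.gauss(0, 0.05) for _ in range(4)]
    data.append([a + noise[0], a + b + noise[1], b + noise[2], a - b + noise[3]])

components, projected = principal_components(data)  # 200 points, now 2-D
```

Each row of `projected` is a 2-D point you could put straight on a scatter plot.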
Jul 27, 2024 • 13 tweets • 2 min read
EDA clearly explained 👇

Exploratory Data Analysis (EDA) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods.
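
As a small illustration of the "statistical summaries" and "anomalies" part, the sketch below summarises one numeric column and flags outliers with the 1.5 x IQR rule. The `ages` column and the `summarize` helper are made up for the example, and the IQR rule is just one common convention.

```python
import statistics

def summarize(values):
    """Basic statistical summary plus outliers flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical column with one suspicious value.
ages = [23, 25, 26, 27, 29, 30, 31, 33, 34, 95]
summary = summarize(ages)
print(summary)
```

Notice how the single extreme value pulls the mean well above the median, which is exactly the kind of thing EDA is meant to surface.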
Jul 21, 2024 • 7 tweets • 1 min read
Multi Query, an Advanced Retrieval Strategy for RAG, clearly explained 👇

Multi Query is a powerful Query Translation technique to enhance information retrieval in AI systems.

It involves generating multiple variations of an original query to improve the chances of finding relevant information.
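
A toy sketch of the idea, under heavy assumptions: `generate_variations` is a hypothetical stand-in for the LLM call that would normally rephrase the query, and `retrieve` is a naive keyword-overlap search rather than a real retriever. The point is only the flow: several query variants, one retrieval per variant, then a de-duplicated union.

```python
# Hypothetical mini corpus.
DOCUMENTS = {
    "doc1": "how to reset your account password",
    "doc2": "steps to recover a forgotten login credential",
    "doc3": "pricing plans and billing",
}

def generate_variations(query):
    """Stand-in for an LLM that rewrites the query (assumption, not a real API)."""
    synonyms = {"password": "credential", "reset": "recover"}
    rewritten = " ".join(synonyms.get(word, word) for word in query.split())
    return [query, rewritten]

def retrieve(query, top_k=1):
    """Naive retriever: rank documents by number of shared words."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(text.split())), doc_id)
              for doc_id, text in DOCUMENTS.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

def multi_query_retrieve(query, top_k=1):
    """Run retrieval once per variation and merge results without duplicates."""
    merged = []
    for variation in generate_variations(query):
        for doc_id in retrieve(variation, top_k):
            if doc_id not in merged:
                merged.append(doc_id)
    return merged

results = multi_query_retrieve("reset password")
```

With the original query alone only `doc1` is found; the rephrased variant also surfaces `doc2`, which uses different wording for the same need.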
Jul 14, 2024 • 8 tweets • 1 min read
DBSCAN clearly explained 👇

DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a powerful clustering algorithm.

It finds clusters of varying shapes and sizes while handling noise and outliers.
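
Here is a compact sketch of how the algorithm can work, written in plain Python for readability rather than speed. The parameter names `eps` (neighbourhood radius) and `min_samples` (points needed to form a dense region, counting the point itself) are assumptions borrowed from common usage, and the sample points are invented.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or NOISE if it sits in a sparse area."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # core point: keep growing
                queue.extend(j_neighbors)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

The two dense squares become clusters 0 and 1, while the isolated point in the middle is labelled as noise.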
Jul 7, 2024 • 8 tweets • 1 min read
Linear Regression clearly explained 👇

What is it?

Linear Regression is a statistical method for predicting the value of a continuous dependent variable based on one or more independent variables. It estimates the relationship using a linear equation.
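
For the single-variable case, the linear equation can be estimated with ordinary least squares in a few lines. This is a minimal sketch with made-up data that lies exactly on y = 2x + 1; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict the target for an unseen x = 6
```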
Jun 29, 2024 • 8 tweets • 2 min read
Retrieval Augmented Generation (RAG) for LLM systems clearly explained 👇

RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.

It involves a five-step process 👇
Jun 25, 2024 • 9 tweets • 2 min read
Support Vector Machines clearly explained 👇

Support Vector Machine is a useful Machine Learning algorithm frequently used for both classification and regression problems.

⭐ This is a supervised learning algorithm.

Basically, it needs labels or targets to learn!
Jun 24, 2024 • 7 tweets • 1 min read
K-Nearest Neighbors clearly explained 👇

KNN is a versatile supervised learning algorithm widely employed in both classification and regression tasks.

Unlike complex models, KNN relies on the proximity of data points to make predictions, making it intuitive and easy to implement.
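
A minimal classification sketch shows how little is needed: measure distances, take the k closest training points, and let them vote. The points, labels and the choice of Euclidean distance with k = 3 are assumptions for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

near_first_group = knn_predict(points, labels, (2, 2))
near_second_group = knn_predict(points, labels, (7, 7))
```

For regression, the same neighbours would be averaged instead of counted.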
Jun 23, 2024 • 9 tweets • 2 min read
Normal Distribution clearly explained 👇

A normal distribution, also known as a Gaussian distribution, is one of the most common distributions you'll find in your data's features.

It's characterized by several key properties that give it a distinctive bell curve shape.
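
Two of those properties are easy to check numerically with Python's standard library: the curve is symmetric around the mean, and roughly 68%, 95% and 99.7% of the mass lies within 1, 2 and 3 standard deviations. The sketch uses a standard normal (mean 0, standard deviation 1) as an assumed example.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)

def mass_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return dist.cdf(k) - dist.cdf(-k)

within_1, within_2, within_3 = (mass_within(k) for k in (1, 2, 3))
print(f"{within_1:.4f} {within_2:.4f} {within_3:.4f}")

# Symmetry: the density is the same at equal distances on both sides of the mean.
is_symmetric = abs(dist.pdf(1.5) - dist.pdf(-1.5)) < 1e-12
```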
Jun 3, 2024 • 7 tweets • 2 min read
Where can you find the most common data distributions? (2nd part)

Check this thread for real-world examples! 🧵 👇

5️⃣ Exponential Distribution:

Often found in scenarios related to the time until an event occurs, such as the lifetime of an electronic component or the time until the next earthquake occurs.
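
A short simulation can make the "time until an event" reading tangible. The rate of 0.5 events per time unit is an assumed value; the sketch checks that the average waiting time is about 1 / rate, and that the distribution is memoryless (having already waited 3 units does not change the odds of waiting 2 more).

```python
import math
import random

def survival(t, rate):
    """P(T > t): probability that the event has not happened by time t."""
    return math.exp(-rate * t)

rate = 0.5  # assumed: on average one event every 2 time units
random.seed(42)
samples = [random.expovariate(rate) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)  # should be close to 1 / rate

# Memorylessness: P(T > 5 | T > 3) equals P(T > 2).
p_conditional = survival(5, rate) / survival(3, rate)
p_fresh = survival(2, rate)
```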
May 29, 2024 • 7 tweets • 2 min read
In Data Science you can find multiple data distributions...

But where are they typically found? 🤔

This is part 1 - tomorrow I'll share the second one!

Check it out 🧵👇

1️⃣ Normal Distribution:

Often found in natural and social phenomena where many factors contribute to an outcome. Examples include heights of adults in a population, test scores, measurement errors, and blood pressure readings.
Apr 27, 2024 • 8 tweets • 2 min read
I bet you've heard a lot about RAG recently...

👉 But what is it?
👉 And what does it consist of?

Find out more about this here 🧵 👇

RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.

It involves a five-step process 👇
Apr 16, 2024 • 8 tweets • 2 min read
Vector databases are pivotal for Large Language Models (LLMs) due to their ability to handle high-dimensional vector data efficiently.

▶️ They optimize storage, retrieval, and management of vector embeddings crucial for LLM performance.

Learn more 👇 🧵

✦ They empower similarity searches vital for LLMs in tasks like semantic search and recommendation systems.
By finding the most similar vector embeddings within large datasets, they aid in delivering more accurate results.
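
At its core, a similarity search ranks stored embeddings by how close they are to a query embedding. The sketch below does this by brute force with cosine similarity over a tiny in-memory dictionary. The 3-dimensional vectors and names are invented, and real vector databases typically rely on approximate indexes rather than scanning everything.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def most_similar(query, store, top_k=2):
    """Return the names of the top_k stored vectors most similar to query."""
    ranked = sorted(store.items(),
                    key=lambda item: cosine_similarity(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}
top = most_similar([0.85, 0.15, 0.05], store)
```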
Apr 11, 2024 • 8 tweets • 2 min read
Do you want to build a Language Model Application?

Then, you need to try LangChain!

👇 🧵

LangChain offers different modules tailored for language model applications.

Whether you're crafting a simple app or a more complex system, these modules have you covered.
Apr 10, 2024 • 11 tweets • 2 min read
What is the difference between seasonality and cyclicality in time series forecasting❓

Discover it below 👇

🧵

Seasonality and cyclicality are two essential concepts that play a crucial role in understanding patterns within time series data.

Let's start with seasonality! 🌞🍂❄️
Apr 6, 2024 • 6 tweets • 2 min read
Have you ever wondered how Support Vector Machines (SVM) can handle non-linear data?

The "Kernel Trick" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results!

Let's learn more about it 🧵 👇

In SVM, the kernel trick is a clever way to perform complex calculations in a higher-dimensional feature space without explicitly transforming the original data into that space.

It's like finding a hidden pathway to handle non-linear relationships between data points.
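
One way to see the trick at work is with a degree-2 polynomial kernel on 2-D points (chosen here as an example, other kernels exist). Squaring the ordinary dot product gives the same number as first mapping both points into a 3-D feature space and taking the dot product there, so the mapping never has to be computed.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def feature_map(v):
    """Explicit degree-2 map for a 2-D point: (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(a, b):
    """Same inner product as feature_map, computed in the original 2-D space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, 4.0)
explicit = dot(feature_map(a), feature_map(b))  # go to 3-D, then dot product
implicit = poly_kernel(a, b)                    # stay in 2-D
```

Both routes give 121 for these two points; the kernel route just skips the transformation.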
Mar 11, 2024 • 10 tweets • 2 min read
In time series analysis, the trend component is key.

It indicates the directional movement of data over time.

Let's learn more about the trend 👇🧵

The trend component represents the overall direction data moves over an extended period.

It captures systematic patterns like gradual increases, decreases, or stable periods. The trend reflects long-term shifts in the data.
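
A simple way to estimate the trend is a centered moving average, which smooths out short-term fluctuations. This sketch assumes a synthetic series made of a steady increase of 2 per step plus a repeating pattern of length 3; a window of 3 then averages the pattern away and leaves only the trend.

```python
def moving_average(series, window):
    """Centered moving average; window is assumed to be odd."""
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]

# Synthetic data: upward trend (2 per step) plus a repeating pattern that sums to zero.
pattern = [1, 0, -1]
series = [2 * t + pattern[t % 3] for t in range(12)]

trend = moving_average(series, window=3)  # estimates for t = 1 .. 10
```

The result recovers the underlying line 2 * t, with the first and last points lost because the window cannot be centered there.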
Mar 3, 2024 • 6 tweets • 1 min read
Do you want to get the feature importance using SHAP?

Wait no more, here is a code snippet 👇

Also check the following posts for additional info (🧵)

Steps:

1. Import Libraries:
Import the necessary libraries, including shap and RandomForestRegressor.

2. Model Training:
Initialize and train any model, for example a Random Forest Regressor, using X_train (features) and y_train (labels).
Feb 25, 2024 • 9 tweets • 2 min read
Feature Importance measures the contribution of each feature to the model's predictions.

It is crucial in Machine Learning for several reasons.

Let's see them 🧵👇

1️⃣ Model Interpretability

Understanding which features are most important in making predictions helps to interpret the model's behavior.

It provides insights into the underlying patterns learned by the model and helps in building trust among stakeholders.
Feb 20, 2024 • 10 tweets • 2 min read
SHAP is a powerful technique in machine learning for interpreting the output of complex models.

Commonly used for ✨Feature Engineering✨

Let's explore SHAP further 🧵 👇

It stands for "SHapley Additive exPlanations".

It is a model agnostic method, which means that it can be applied to any model. This is particularly useful for black-box models where understanding how individual features contribute to predictions might otherwise be challenging.
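
To show what "model agnostic" means in practice, the sketch below computes exact Shapley values for a tiny black-box function by averaging each feature's marginal contribution over every feature ordering. This is not the SHAP library itself: the `model` function is hypothetical, "missing" features are replaced by a single baseline value (an assumption), and the brute-force loop only scales to a handful of features.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)       # start with every feature "switched off"
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]  # switch this feature on
            value = predict(current)
            totals[feature] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]

def model(x):
    """Hypothetical black box: only its predictions are used, never its internals."""
    return 3 * x[0] + 2 * x[1] * x[2]

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, instance, baseline)
```

The contributions add up to the difference between the prediction for `instance` and the prediction for `baseline`, which is the "additive" part of the name. The interaction between the second and third features is split evenly between them.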
Feb 14, 2024 • 8 tweets • 2 min read
ARIMA is one of the most popular traditional statistical methods used for time series forecasting.

Let's understand its components 🧵 👇

ARIMA stands for Auto-Regressive Integrated Moving Average.

It is composed of 3 components:

🔹 Auto-Regressive (AR)
🔹 Integrated (I)
🔹 Moving Average (MA)
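
As a rough, partial illustration of two of these components, the sketch below applies first-order differencing (the "I" step) and then fits a one-lag auto-regressive coefficient by least squares (a simple "AR" step). The series is made up and noise-free, the helper names are hypothetical, and the MA part, which models dependence on past forecast errors, is left out to keep things short.

```python
def difference(series):
    """Integrated (I) step: replace levels by changes between consecutive points."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(series):
    """AR step: least-squares phi for x_t = phi * x_(t-1), with no intercept."""
    numerator = sum(prev * cur for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator

# Made-up series whose step-to-step changes halve each time: 8, 4, 2, 1, 0.5
levels = [10.0, 18.0, 22.0, 24.0, 25.0, 25.5]

changes = difference(levels)
phi = fit_ar1(changes)

# One-step forecast: predict the next change, then add it back to the last level.
next_change = phi * changes[-1]
next_level = levels[-1] + next_change
```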