David Andrés 🤖📈🐍's Threads

Jun 10 • 6 tweets • 2 min read

CNNs learn through hierarchical feature extraction: each layer builds on the one before. This structure is what makes them so powerful for vision tasks.

Let's break it down 👇🧵

🟢 Early layers focus on low-level features extracted directly from pixel intensities.

These include:
• Edges
• Lines
• Curves
• Textures

They form the foundation for all further recognition.
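
To make "edges from pixel intensities" concrete, here is a minimal sketch. It uses a hand-written Sobel kernel rather than learned weights (an assumption for illustration: a real CNN learns its own filters), and the toy image and helper name `convolve2d` are made up for this example.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 5x6 image: dark left half (0), bright right half (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Sobel kernel that responds to left-to-right intensity changes (vertical edges)
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

feature_map = convolve2d(image, sobel_x)
for row in feature_map:
    print(row)  # non-zero only where the dark/bright boundary sits
```

The feature map is zero over flat regions and large at the boundary, which is the kind of low-level signal later layers can combine into shapes.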

Jul 28, 2024 • 6 tweets • 1 min read

Introduction to some Advanced EDA Techniques 👇

1️⃣ Dimensionality Reduction
For datasets with many variables, techniques like Principal Component Analysis (PCA) or t-SNE can help you visualize high-dimensional data in two or three dimensions.

Jul 27, 2024 • 13 tweets • 2 min read

EDA clearly explained 👇

Exploratory Data Analysis (EDA) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods.

Jul 21, 2024 • 7 tweets • 1 min read

Multi Query, an Advanced Retrieval Strategy for RAG, clearly explained 👇

Multi Query is a powerful Query Translation technique to enhance information retrieval in AI systems.

It involves generating multiple variations of an original query to improve the chances of finding relevant information.
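
A rough sketch of the idea, under loud assumptions: `generate_variations` is a hypothetical stand-in for an LLM call (here it just returns canned rephrasings), and `retrieve` is a toy word-overlap retriever, not a real vector search.

```python
def generate_variations(query):
    """Hypothetical stand-in for an LLM that rephrases the query."""
    canned = {
        "how do cnn layers learn": [
            "how do cnn layers learn",
            "what features do convolutional networks extract",
            "hierarchical feature extraction in vision models",
        ]
    }
    return canned.get(query, [query])

def retrieve(query, documents, top_k=2):
    """Toy retriever: rank documents by the number of shared words."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(query_words & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

def multi_query_retrieve(query, documents, top_k=2):
    """Retrieve for every variation and keep the unique union, in order."""
    seen = []
    for variation in generate_variations(query):
        for doc_id in retrieve(variation, documents, top_k):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen

documents = {
    "doc1": "cnn layers learn edges first",
    "doc2": "convolutional networks extract features from pixels",
    "doc3": "hierarchical feature extraction builds complex shapes",
    "doc4": "linear regression fits a line",
}

single = retrieve("how do cnn layers learn", documents)
multi = multi_query_retrieve("how do cnn layers learn", documents)
print(single)  # what the original wording alone finds
print(multi)   # what the variations find together
```

In this toy corpus the original wording only matches one document, while the rephrasings surface two more relevant ones.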

Jul 14, 2024 • 8 tweets • 1 min read

DBSCAN clearly explained 👇

DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a powerful clustering algorithm.

It finds clusters of varying shapes and sizes while handling noise and outliers.
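
Here is a compact sketch of how DBSCAN can be written from scratch. The parameter names `eps` (neighborhood radius) and `min_samples` (points needed to count as dense) follow common convention but are not from the thread, and this brute-force version is meant for reading, not for large datasets.

```python
import math

NOISE = -1

def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [j for j, p in enumerate(points)
            if math.dist(points[index], p) <= eps]

def dbscan(points, eps, min_samples):
    labels = [None] * len(points)   # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE       # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

points = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
    (10, 10), (10, 11), (11, 10), (11, 11),  # dense group B
    (5, 50),                                 # isolated point
]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

The isolated point never joins a cluster and keeps the noise label, which is how the algorithm handles outliers without being told how many clusters to expect.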

Jul 7, 2024 • 8 tweets • 1 min read

Linear Regression clearly explained 👇

What is it?

Linear Regression is a statistical method for predicting the value of a continuous dependent variable based on one or more independent variables. It estimates the relationship using a linear equation.
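
For the single-variable case, the linear equation can be fitted with the ordinary least squares closed form. This is a minimal sketch with made-up data; the function name is illustrative only.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data generated from y = 2x + 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(x, y)
prediction = slope * 6 + intercept   # predict for an unseen x = 6
print(slope, intercept, prediction)
```

With several independent variables the same idea applies, but the fit is usually solved with matrix methods instead of this two-line formula.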

Jun 29, 2024 • 8 tweets • 2 min read

Retrieval Augmented Generation (RAG) for LLM systems clearly explained 👇

RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.

It involves a five-step process 👇

Jun 25, 2024 • 9 tweets • 2 min read

Support Vector Machines clearly explained 👇

Support Vector Machine is a useful Machine Learning algorithm frequently used for both classification and regression problems.

⭐ This is a 𝘀𝘂𝗽𝗲𝗿𝘃𝗶𝘀𝗲𝗱 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗮𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺.

Basically, it needs labels or targets to learn!

Jun 24, 2024 • 7 tweets • 1 min read

K-Nearest Neighbors clearly explained 👇

KNN is a versatile supervised learning algorithm widely employed in both classification and regression tasks.

Unlike complex models, KNN relies on the proximity of data points to make predictions, making it intuitive and easy to implement.
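
The "proximity" idea fits in a few lines. Below is a minimal classification sketch using Euclidean distance and a majority vote; the data, labels, and the default `k=3` are arbitrary choices for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k closest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

pred_low = knn_predict(train_points, train_labels, (2, 2))
pred_high = knn_predict(train_points, train_labels, (7, 9))
print(pred_low, pred_high)
```

For regression, the vote would be replaced by an average of the neighbors' target values.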

Jun 23, 2024 • 9 tweets • 2 min read

Normal Distribution clearly explained 👇

A normal distribution, also known as a Gaussian distribution, is one of the most common distributions you'll find in your day-to-day features.

It's characterized by several key properties that give it a distinctive bell curve shape.
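
Two of those properties are easy to check numerically with Python's built-in `statistics.NormalDist`: symmetry around the mean, and the well-known 68-95-99.7 rule. The helper `mass_within` is a name made up for this sketch.

```python
from statistics import NormalDist

dist = NormalDist(mu=0, sigma=1)   # standard normal

def mass_within(d, k):
    """Probability of landing within k standard deviations of the mean."""
    return d.cdf(d.mean + k * d.stdev) - d.cdf(d.mean - k * d.stdev)

within_1 = mass_within(dist, 1)   # roughly 68%
within_2 = mass_within(dist, 2)   # roughly 95%
within_3 = mass_within(dist, 3)   # roughly 99.7%

# Symmetry: the density is the same at equal distances from the mean
left, right = dist.pdf(-1.0), dist.pdf(1.0)

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```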

Jun 3, 2024 • 7 tweets • 2 min read

Where can you find the most common data distributions? (2nd part)

Check this thread for real-world examples! 🧵 👇

5️⃣ Exponential Distribution:

Often found in scenarios related to the time until an event occurs, such as the lifetime of an electronic component or the time until the next earthquake occurs.
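
A small sketch of the "time until an event" idea, assuming a hypothetical component with a mean lifetime of 1000 hours (that number is invented for the example). It also checks the memoryless property that makes this distribution special.

```python
import math

def survival(t, rate):
    """P(T > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)

rate = 1 / 1000   # assumed: mean lifetime of 1000 hours, so rate = 1 / mean

# Chance a new component lasts more than 500 hours
p_new = survival(500, rate)

# Chance a component that already ran 1000 hours lasts 500 more:
# P(T > 1500 | T > 1000) = P(T > 1500) / P(T > 1000)
p_used = survival(1500, rate) / survival(1000, rate)

print(round(p_new, 4), round(p_used, 4))   # the two probabilities match
```

Under this model a used component is "as good as new", which is a strong assumption worth checking against real failure data.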

May 29, 2024 • 7 tweets • 2 min read

In Data Science you can find multiple data distributions...

But where are they typically found? 🤔

This is part 1 - tomorrow I'll share the second one!

Check it out 🧵👇

1️⃣ Normal Distribution:

Often found in natural and social phenomena where many factors contribute to an outcome. Examples include heights of adults in a population, test scores, measurement errors, and blood pressure readings.

Apr 27, 2024 • 8 tweets • 2 min read

I bet you've heard a lot about RAG recently...

👉 But what is it?
👉 And what does it consist of?

Find out more about this here 🧵 👇

Apr 16, 2024 • 8 tweets • 2 min read

Vector databases are pivotal for Large Language Models (LLMs) due to their ability to handle high-dimensional vector data efficiently.

▶️ They optimize storage, retrieval, and management of vector embeddings crucial for LLM performance.

Learn more 👇 🧵

✦ They empower similarity searches vital for LLMs in tasks like semantic search and recommendation systems.
By finding the most similar vector embeddings within large datasets, they aid in delivering more accurate results.
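
At its core, a similarity search ranks stored embeddings by how close they are to a query embedding. The sketch below does this by brute force with cosine similarity; the tiny 3-dimensional vectors are invented, and real vector databases typically rely on approximate indexes rather than scanning everything.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query, store, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy "embeddings" (made up for illustration)
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

query = [0.85, 0.15, 0.05]
results = top_k_similar(query, store)
print(results)
```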

Apr 11, 2024 • 8 tweets • 2 min read

Do you want to build a Language Model Application?

Then, you need to try LangChain!

👇 🧵

LangChain offers different modules tailored for language model applications.

Whether you're crafting a simple app or a more complex system, these modules have got you covered.

Apr 10, 2024 • 11 tweets • 2 min read

What is the difference between seasonality and cyclicality in time series forecasting❓

Discover it below 👇

🧵

Seasonality and cyclicality are two essential concepts that play a crucial role in understanding patterns within time series data.

Let's start with seasonality! 🌞🍂❄️

Apr 6, 2024 • 6 tweets • 2 min read

Have you ever wondered how 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (SVM) can handle non-linear data?

The "𝗞𝗲𝗿𝗻𝗲𝗹 𝗧𝗿𝗶𝗰𝗸" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results!

Let's learn more about it 🧵 👇

In SVM, the kernel trick is a clever way to perform complex calculations in a higher-dimensional feature space without explicitly transforming the original data into that space.

It's like finding a hidden pathway to handle non-linear relationships between data points.
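
A quick way to see the trick in action is the degree-2 polynomial kernel on 2D points. In this sketch (the kernel choice and sample points are assumptions for illustration), the explicit route maps each point to 3D first, while the kernel route never leaves 2D, and both give the same inner product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def explicit_feature_map(v):
    """Map a 2D point into the 3D space implied by the degree-2 kernel."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def polynomial_kernel(a, b):
    """Same inner product, computed directly in the original 2D space."""
    return dot(a, b) ** 2

x = (1.0, 2.0)
y = (3.0, 4.0)

explicit = dot(explicit_feature_map(x), explicit_feature_map(y))
shortcut = polynomial_kernel(x, y)
print(explicit, shortcut)   # equal up to floating-point rounding
```

For kernels like the RBF, the implied feature space is infinite-dimensional, so the shortcut is the only practical option.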

Mar 11, 2024 • 10 tweets • 2 min read

In time series analysis, the trend component is key.

It indicates the directional movement of data over time.

Let's learn more about the trend 👇🧵

The trend component represents the overall direction data moves over an extended period.

It captures systematic patterns like gradual increases, decreases, or stable periods. The trend reflects long-term shifts in the data.
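
One simple way to estimate a trend is a moving average, which smooths out short-term wiggles. This is only one of several options, and the series and window size below are made up for the sketch.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Toy series: steady growth plus a repeating up/down wiggle
series = [10, 12, 11, 13, 15, 14, 16, 18, 17]

trend = moving_average(series, window=3)
print(trend)   # the wiggle is averaged out, leaving a steady upward trend
```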

Mar 3, 2024 • 6 tweets • 1 min read

Do you want to get the feature importance using SHAP?

Wait no more, here you have a code snippet 👇

Also check the following posts for additional info (🧵)

Steps:

1. Import Libraries:
Import necessary libraries, including 𝚜𝚑𝚊𝚙 and 𝚁𝚊𝚗𝚍𝚘𝚖𝙵𝚘𝚛𝚎𝚜𝚝𝚁𝚎𝚐𝚛𝚎𝚜𝚜𝚘𝚛.

2. Model Training:
Initialize and train any model, for example a Random Forest Regressor model using 𝚇_𝚝𝚛𝚊𝚒𝚗 (features) and 𝚢_𝚝𝚛𝚊𝚒𝚗 (labels).

Feb 25, 2024 • 9 tweets • 2 min read

𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗜𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝗰𝗲 measures the contribution of each feature to the model's predictions.

It is crucial in Machine Learning for several reasons.

Let's see them 🧵👇

1️⃣ Model Interpretability

Understanding which features are most important in making predictions helps to interpret the model's behavior.

It provides insights into the underlying patterns learned by the model and helps in building trust among stakeholders.

Feb 20, 2024 • 10 tweets • 2 min read

SHAP is a powerful technique in machine learning for interpreting the output of complex models.

Commonly used for ✨Feature Engineering✨

Let's explore SHAP further 🧵 👇

It stands for “SHapley Additive exPlanations”.

It is a model-agnostic method, which means that it can be applied to any model. This is particularly useful for black-box models where understanding how individual features contribute to predictions might otherwise be challenging.
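
To show what "Shapley" and "additive" mean, here is a from-scratch sketch that computes exact Shapley values for a tiny toy model by trying every feature ordering. It does not use the `shap` library, which relies on much faster approximations; the model, instance, and all-zeros baseline are assumptions made up for this example.

```python
from itertools import permutations

def model(features):
    """Toy black box with an interaction between feature 0 and feature 2."""
    x0, x1, x2 = features
    return 2 * x0 + 3 * x1 + x0 * x2

def shapley_values(predict, instance, baseline):
    """Average each feature's marginal contribution over all orderings."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        current = list(baseline)          # start with every feature "absent"
        previous_output = predict(current)
        for feature in ordering:
            current[feature] = instance[feature]   # switch this feature on
            new_output = predict(current)
            contributions[feature] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in contributions]

instance = [1, 2, 3]
baseline = [0, 0, 0]

values = shapley_values(model, instance, baseline)
print(values)

# Additivity: contributions sum to prediction minus baseline prediction
gap = model(instance) - model(baseline)
print(sum(values), gap)
```

Note how the interaction term is split evenly between the two features involved, and how the function only ever calls the model, which is what makes the approach model-agnostic.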
