I summarise Machine Learning, NLP and Time Series concepts in an easy and visual way • Follow me at https://t.co/hy44Q0jFQs
Inquiries at david@mlpills.dev
Jul 28, 2024 • 6 tweets • 1 min read
Introduction to some Advanced EDA Techniques
1️⃣ Dimensionality Reduction
For datasets with many variables, techniques like Principal Component Analysis (PCA) or t-SNE can help you visualize high-dimensional data in two or three dimensions.
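To make the idea concrete, here is a minimal PCA sketch in plain Python. It assumes a tiny made-up 4-feature dataset and uses power iteration to find the two main directions of variance; the helper names (`covariance_matrix`, `top_eigenvector`, `pca`) are illustrative and not from any library. In practice you would probably reach for a library implementation instead.

```python
import math
import random
import statistics

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def covariance_matrix(rows):
    """Center the data and return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov

def top_eigenvector(matrix, iterations=500):
    """Power iteration: repeatedly multiply and normalise a vector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

def pca(rows, n_components=2):
    centered, cov = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v = top_eigenvector(cov)
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflation: remove the direction just found so the next pass
        # picks up the following principal component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[dot(c, comp) for comp in components] for c in centered]
    return projected, components

# Made-up data: three features driven by one hidden factor t, plus noise.
rng = random.Random(0)
rows = [[float(t), 2 * t + rng.gauss(0, 0.5), -t + rng.gauss(0, 0.5), rng.gauss(0, 0.5)]
        for t in range(20)]

projected, components = pca(rows, n_components=2)
var1 = statistics.pvariance([p[0] for p in projected])
var2 = statistics.pvariance([p[1] for p in projected])
print(f"variance along PC1: {var1:.2f}, along PC2: {var2:.2f}")
```

Each 4-dimensional row is now a 2-dimensional point that can go straight into a scatter plot, with most of the variance captured by the first axis.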
Jul 27, 2024 • 13 tweets • 2 min read
EDA clearly explained
Exploratory Data Analysis (EDA) is a process used for investigating your data to discover patterns, anomalies, relationships, or trends using statistical summaries and visual methods.
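As a small illustration of the "statistical summaries" and "anomalies" parts, the sketch below summarises one numeric column and flags outliers with the common 1.5 × IQR rule. The `daily_sales` values and the `summarize` helper are made up for the example.

```python
import statistics

def summarize(values):
    """Return basic summary statistics plus IQR-based outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Hypothetical column with one suspicious value.
daily_sales = [12, 15, 14, 13, 16, 15, 14, 95, 13, 15]
summary = summarize(daily_sales)
print(summary)
```

The gap between the mean and the median is already a hint that something unusual is in the data, and the outlier list points at the value responsible.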
Jul 21, 2024 • 7 tweets • 1 min read
Multi Query, an Advanced Retrieval Strategy for RAG, clearly explained
Multi Query is a powerful Query Translation technique to enhance information retrieval in AI systems.
It involves generating multiple variations of an original query to improve the chances of finding relevant information.
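A toy sketch of the idea, under loud assumptions: in a real system an LLM would write the query variations and a vector store would do the retrieval. Here `generate_query_variations` is a hypothetical stand-in that returns canned rewrites, and `retrieve` is a simple word-overlap ranker.

```python
# Canned rewrites standing in for what an LLM might produce (assumption).
CANNED_VARIATIONS = {
    "reduce overfitting": [
        "prevent overfitting in neural networks",
        "limit model complexity",
    ],
}

def generate_query_variations(query):
    """Hypothetical stand-in for an LLM call: original query plus rewrites."""
    return [query] + CANNED_VARIATIONS.get(query, [])

def retrieve(query, documents, top_k=1):
    """Rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_words & set(doc.lower().split()))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def multi_query_retrieve(query, documents, top_k=1):
    """Retrieve for every variation and keep the unique results in order."""
    seen, results = set(), []
    for variation in generate_query_variations(query):
        for doc in retrieve(variation, documents, top_k):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results

docs = [
    "Dropout helps prevent overfitting in neural networks",
    "Regularization techniques limit model complexity",
    "ARIMA is used for time series forecasting",
]

single = retrieve("reduce overfitting", docs)
multi = multi_query_retrieve("reduce overfitting", docs)
print(single)
print(multi)
```

The single query only matches the dropout document, while the rewritten queries also surface the regularization one, which never mentions "overfitting" at all.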
Jul 14, 2024 • 8 tweets • 1 min read
DBSCAN clearly explained
DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a powerful clustering algorithm.
It finds clusters of varying shapes and sizes while handling noise and outliers.
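Here is a compact, unoptimised sketch of the algorithm in plain Python, assuming Euclidean distance and two parameters named `eps` (neighbourhood radius) and `min_samples` (points needed to form a dense region). The sample points are made up; a library implementation would be the better choice for real data.

```python
import math

NOISE = -1

def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [j for j, other in enumerate(points)
            if math.dist(points[index], other) <= eps]

def dbscan(points, eps, min_samples):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

The two dense groups become clusters 0 and 1, and the isolated point at (50, 50) is labelled as noise (-1) without the number of clusters ever being specified.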
Jul 7, 2024 • 8 tweets • 1 min read
Linear Regression clearly explained
What is it?
Linear Regression is a statistical method for predicting the value of a continuous dependent variable based on one or more independent variables. It estimates the relationship using a linear equation.
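For the single-variable case, the linear equation is y = slope · x + intercept, and ordinary least squares has a closed-form solution. A minimal sketch with made-up, noise-free data (the function name is illustrative):

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that follows y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```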
Jun 29, 2024 • 8 tweets • 2 min read
Retrieval Augmented Generation (RAG) for LLM systems clearly explained
RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.
It involves a five-step process
Jun 25, 2024 • 9 tweets • 2 min read
Support Vector Machines clearly explained
Support Vector Machine is a useful Machine Learning algorithm frequently used for both classification and regression problems.
⭐ This is a supervised learning algorithm.
Basically, it needs labels or targets to learn!
Jun 24, 2024 • 7 tweets • 1 min read
K-Nearest Neighbors clearly explained
KNN is a versatile supervised learning algorithm widely employed in both classification and regression tasks.
Unlike complex models, KNN relies on the proximity of data points to make predictions, making it intuitive and easy to implement.
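To show how little machinery is needed, here is a minimal KNN classifier sketch. It assumes Euclidean distance and a majority vote among the `k` closest training points; the points and labels are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (7, 9)))
```

For regression, the same neighbours would be used, but their target values would be averaged instead of voted on.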
Jun 23, 2024 • 9 tweets • 2 min read
Normal Distribution clearly explained
A normal distribution, also known as a Gaussian distribution, is the most common distribution you'll encounter in your data's features.
It's characterized by several key properties that give it a distinctive bell curve shape.
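Two well-known properties can be checked with Python's standard library: the curve is symmetric around the mean, and roughly 68%, 95% and 99.7% of values fall within 1, 2 and 3 standard deviations. The mean and standard deviation below are arbitrary example values, and `share_within` is an illustrative helper.

```python
from statistics import NormalDist

# Hypothetical feature: adult heights in cm.
heights = NormalDist(mu=170, sigma=10)

def share_within(dist, k):
    """Probability of falling within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

below_mean = heights.cdf(heights.mean)  # symmetry: half the mass on each side
within = [share_within(heights, k) for k in (1, 2, 3)]

print(f"P(X < mean) = {below_mean:.2f}")
for k, share in zip((1, 2, 3), within):
    print(f"within {k} sd: {share:.4f}")
```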
Jun 3, 2024 • 7 tweets • 2 min read
Where can you find the most common data distributions? (2nd part)
Check this thread for real-world examples! 🧵
5️⃣ Exponential Distribution:
Often found in scenarios related to the time until an event occurs, such as the lifetime of an electronic component or the time until the next earthquake occurs.
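A short sketch of what "time until an event" looks like numerically. It assumes a made-up rate of 0.5 events per year and uses the exponential survival function P(T > t) = exp(-rate · t). It also checks the memoryless property: having already waited 2 years does not change the odds of waiting 3 more.

```python
import math

def survival(t, rate):
    """P(T > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)

rate = 0.5             # hypothetical: 0.5 events per year on average
mean_wait = 1 / rate   # expected waiting time between events

p_beyond_3 = survival(3, rate)
# P(T > 2 + 3 | T > 2): probability of waiting 3 more years after 2 have passed
p_conditional = survival(2 + 3, rate) / survival(2, rate)

print(f"mean wait: {mean_wait} years")
print(f"P(T > 3) = {p_beyond_3:.4f}")
print(f"P(T > 5 | T > 2) = {p_conditional:.4f}")
```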
May 29, 2024 • 7 tweets • 2 min read
In Data Science you can find multiple data distributions...
But where are they typically found?
This is part 1 - tomorrow I'll share the second one!
Check it out 🧵
1️⃣ Normal Distribution:
Often found in natural and social phenomena where many factors contribute to an outcome. Examples include heights of adults in a population, test scores, measurement errors, and blood pressure readings.
Apr 27, 2024 • 8 tweets • 2 min read
I bet you've heard a lot about RAG recently...
But what is it?
And what does it consist of?
Find out more about this here 🧵
RAG helps bridge the gap between large language models and external data sources, allowing AI systems to generate relevant and informed responses by leveraging knowledge from existing documents and databases.
It involves a five-step process
Apr 16, 2024 • 8 tweets • 2 min read
Vector databases are pivotal for Large Language Models (LLMs) due to their ability to handle high-dimensional vector data efficiently.
▶️ They optimize storage, retrieval, and management of vector embeddings crucial for LLM performance.
Learn more 🧵
✦ They empower similarity searches vital for LLMs in tasks like semantic search and recommendation systems.
By finding the most similar vector embeddings within large datasets, they aid in delivering more accurate results.
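At its core, a similarity search compares a query embedding against stored embeddings. The sketch below does this by brute force with cosine similarity on made-up 3-dimensional vectors; real vector databases typically rely on approximate nearest-neighbour indexes to avoid scanning everything, and real embeddings have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, store, top_k=2):
    """Return the names of the top_k stored vectors closest to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical embeddings, hand-written for illustration.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]

print(most_similar(query, store))
```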
Apr 11, 2024 • 8 tweets • 2 min read
Do you want to build a Language Model Application?
Then, you need to try LangChain!
🧵
LangChain offers different modules tailored for language model applications.
Whether you're crafting a simple app or a more complex system, these modules have got you covered.
Apr 10, 2024 • 11 tweets • 2 min read
What is the difference between seasonality and cyclicality in time series forecasting?
Discover it below
🧵
Seasonality and cyclicality are two essential concepts that play a crucial role in understanding patterns within time series data.
Let's start with seasonality!
Apr 6, 2024 • 6 tweets • 2 min read
Have you ever wondered how Support Vector Machines (SVM) can handle non-linear data?
The "Kernel Trick" is a fascinating mathematical technique that allows efficient calculations and delivers powerful results!
Let's learn more about it 🧵
In SVM, the kernel trick is a clever way to perform complex calculations in a higher-dimensional feature space without explicitly transforming the original data into that space.
It's like finding a hidden pathway to handle non-linear relationships between data points.
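A tiny numeric check makes this tangible. For 2-dimensional inputs, the degree-2 polynomial kernel k(x, z) = (x · z)² gives exactly the same number as mapping both points into a 3-dimensional feature space and taking the dot product there. The function names and the sample points are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def explicit_feature_map(x):
    """Map a 2-D point into the 3-D space implied by the kernel below."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def polynomial_kernel(x, z):
    """Degree-2 polynomial kernel computed directly in the original space."""
    return dot(x, z) ** 2

x = (1.0, 2.0)
z = (3.0, 4.0)

via_kernel = polynomial_kernel(x, z)
via_mapping = dot(explicit_feature_map(x), explicit_feature_map(z))
print(via_kernel, via_mapping)
```

Both routes give 121 (up to floating-point rounding), but the kernel never builds the higher-dimensional vectors, which is what keeps the computation cheap when the implied space is huge.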
Mar 11, 2024 • 10 tweets • 2 min read
In time series analysis, the trend component is key.
It indicates the directional movement of data over time.
Let's learn more about the trend 🧵
The trend component represents the overall direction data moves over an extended period.
It captures systematic patterns like gradual increases, decreases, or stable periods. The trend reflects long-term shifts in the data.
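One simple way to expose the trend is a moving average, which smooths out short-term swings. The sketch below assumes a made-up series that grows by 1 each step while alternating up and down by 2; a window of 2 cancels that alternation.

```python
def moving_average(series, window):
    """Average of each consecutive block of `window` values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Made-up series: upward trend plus an alternating +2 / -2 pattern.
series = [12, 9, 14, 11, 16, 13, 18, 15]

trend = moving_average(series, window=2)
print(trend)
```

The raw series zigzags, but the smoothed values rise steadily, which is the long-term shift the trend component is meant to capture.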
Mar 3, 2024 • 6 tweets • 1 min read
Do you want to get the feature importance using SHAP?
Wait no more, here you have a code snippet
Check also the following posts, for additional info (🧵)
Steps:
1. Import Libraries:
Import necessary libraries, including shap and RandomForestRegressor.
2. Model Training:
Initialize and train any model, for example a Random Forest Regressor model using X_train (features) and y_train (labels).
Feb 25, 2024 • 9 tweets • 2 min read
Feature Importance measures the contribution of each feature to the model's predictions.
It is crucial in Machine Learning for several reasons.
Let's see them 🧵
1️⃣ Model Interpretability
Understanding which features are most important in making predictions helps to interpret the model's behavior.
It provides insights into the underlying patterns learned by the model and helps in building trust among stakeholders.
Feb 20, 2024 • 10 tweets • 2 min read
SHAP is a powerful technique in machine learning for interpreting the output of complex models.
Commonly used for ✨Feature Engineering✨
Let's explore SHAP further 🧵
It stands for “SHapley Additive exPlanations”.
It is a model-agnostic method, which means that it can be applied to any model. This is particularly useful for black-box models where understanding how individual features contribute to predictions might otherwise be challenging.
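To show what "Shapley" and "additive" mean, here is a from-scratch sketch that computes exact Shapley values for a tiny toy model. It is not the `shap` library: it makes the simplifying assumption that a "missing" feature is replaced by a fixed baseline value, and it enumerates every feature ordering, which is only feasible for a handful of features. `toy_model`, `instance` and `baseline` are made up.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]  # "switch on" this feature
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]

def toy_model(x):
    """Black box for the example: two linear terms and one interaction."""
    return 3 * x[0] + 2 * x[1] + x[0] * x[2]

instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]

values = shapley_values(toy_model, instance, baseline)
print(values)
```

The contributions add up exactly to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features involved. The function only ever calls the model, which is why the approach works for any model.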
Feb 14, 2024 • 8 tweets • 2 min read
ARIMA is one of the most popular traditional statistical methods used for time series forecasting.
Let's understand its components 🧵
ARIMA stands for Auto-Regressive Integrated Moving Average.
It is composed of 3 components:
🔹 Auto-Regressive (AR)
🔹 Integrated (I)
🔹 Moving Average (MA)
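The three pieces can be sketched on a toy series. Below, differencing plays the "Integrated" role, and a one-step forecast combines an AR term (the previous differenced value) with an MA term (the previous forecast error). The coefficients `c`, `phi` and `theta` are hypothetical values picked for the example, not fitted ones; a real workflow would estimate them with a statistics library.

```python
def difference(series):
    """Integrated (I): model the changes between consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]

def undo_difference(first_value, diffs):
    """Rebuild the original scale by cumulatively summing the differences."""
    values = [first_value]
    for d in diffs:
        values.append(values[-1] + d)
    return values

def arma_one_step(previous_value, previous_error, c, phi, theta):
    """AR term (phi * previous value) plus MA term (theta * previous error)."""
    return c + phi * previous_value + theta * previous_error

series = [100, 103, 106, 109, 112]
diffs = difference(series)

# Hypothetical coefficients, chosen by hand for illustration only.
next_diff = arma_one_step(diffs[-1], previous_error=0.0, c=1.5, phi=0.5, theta=0.2)
forecast = series[-1] + next_diff  # integrate back to the original scale

print(diffs, next_diff, forecast)
```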