Machine Learning Explained πŸ‘¨β€πŸ«

PCA

Principal Component Analysis is a commonly used method for dimensionality reduction.

It's a good example of how fairly complex math can have an intuitive explanation and be easy to use in practice.

Let's start with an application of PCA πŸ‘‡
Dimensionality Reduction

This is one of the common uses of PCA in machine learning.

Imagine you want to predict house prices. You get a large table of many houses with different features like size, number of rooms, location, age, etc.

Some features seem correlated πŸ‘‡
Correlated features

For example, the size of the house is correlated with the number of rooms. Bigger houses tend to have more rooms.

Another example could be the age and the year the house was built - they give us pretty much the same information.
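To make this concrete, here is a tiny sketch (not from the original thread) that checks feature correlations with pandas; the column names and numbers are made up for illustration.

import pandas as pd

# Made-up table of houses; size/rooms and age/year_built are intentionally redundant.
houses = pd.DataFrame({
    "size_sqm":   [50, 80, 120, 200, 65],
    "num_rooms":  [2, 3, 4, 6, 2],
    "age":        [30, 5, 12, 40, 8],
    "year_built": [1992, 2017, 2010, 1982, 2014],
})

# Correlated pairs show up with coefficients close to +1 or -1.
print(houses.corr())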

We don't want that πŸ‘‡
Curse of Dimensionality

In general, we want to have fewer features, because of the Curse of Dimensionality.

The amount of data required to fit a model increases exponentially with the number of features. Therefore, having many features telling us the same thing is bad.

πŸ‘‡
Remove features

So, let's remove the features we don't need. But how do you decide which ones?

Take a look at the code. Using scikit-learn and PCA you can take a dataset containing many features and transform it into a dataset with fewer features (5 in the example).

πŸ‘‡
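The code from the screenshot isn't reproduced here, but a minimal sketch of the idea with scikit-learn could look like this (the data is random and just stands in for a real table of house features):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 20)    # 1000 houses, 20 partly redundant features

pca = PCA(n_components=5)       # keep only 5 new "compressed" features
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)          # (1000, 5)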
What you can use PCA for is essentially to create new features that "compress" the information of the full feature set as well as possible.

In this way, redundant features will effectively be removed. This is what we call dimensionality reduction.

Why does it work? πŸ‘‡
To understand better what happens, let's look at a specific example using the awesome visualization by @vicapow.

I'll post images and videos in this thread, but I encourage you to click the link and experiment yourself a bit.

setosa.io/ev/principal-c…

Let's start in 3D πŸ‘‡
Take a look at the 3D model in the visualization.

Guess what, you are now doing dimensionality reduction!

You see, you are looking at a 3D model on a screen that is 2D. While every point in the model is defined by 3 features (coordinates), what you see on the screen is only a 2D view of it.

πŸ‘‡
As the model rotates, you can observe that some views display the data better than others.

Look at the screenshots I took below. In the first one, you can clearly see that there are 3 clusters in the data, while in the second you only see 2.

Can PCA help? πŸ‘‡
I wrote above that PCA will find a new reduced set of features that optimally encodes the full dataset.

So, if we apply it to the 3D model, we get the following view of the data, which clearly shows the 3 clusters. The optimal view!

πŸ‘‡
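If you want to reproduce the effect yourself, here is a small sketch with made-up data (my own toy clusters, not the data from the visualization): three clusters in 3D, projected to 2D with PCA so the cluster structure stays visible.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three blobs of points at different corners of 3D space.
clusters = [rng.normal(loc=center, scale=0.3, size=(50, 3))
            for center in ([0, 0, 0], [3, 3, 0], [0, 3, 3])]
X = np.vstack(clusters)

X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)   # (150, 2) -- a 2D "view" that keeps the 3 clusters separated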
Now imagine that the 3 dimensions are features describing a house.

And the good thing is that PCA doesn't care how many dimensions you have or how many dimensions you want to reduce your dataset to.

Now a bit about the math πŸ‘‡
What happens is that PCA rotates your dataset so that the first dimension (the first principal component) captures the highest variability of the data, the second dimension the second-highest, and so on.

The reduction happens by removing the last dimensions.

πŸ‘‡
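In scikit-learn you can see this ordering directly via explained_variance_ratio_. A small sketch with synthetic, redundant features (all names and data are illustrative):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
base = rng.normal(size=(500, 2))
# Build 5 features from only 2 underlying signals, plus a little noise.
X = np.hstack([base, base @ rng.normal(size=(2, 3))]) + 0.01 * rng.normal(size=(500, 5))

pca = PCA().fit(X)
# Sorted in decreasing order: the first 2 components capture almost all the variance.
print(pca.explained_variance_ratio_)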
Now, go back to the visualization and play with the 2D example.

Move the points on the left around and observe how the PCA plot changes. Put all the points on a diagonal line (highly correlated) and you will see that after PCA the second dimension almost disappears.

πŸ‘‡
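The same experiment in code, as a rough sketch with made-up points: when the two features are almost perfectly correlated, nearly all the variance ends up in the first principal component.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = x + 0.05 * rng.normal(size=200)      # y is highly correlated with x
points = np.column_stack([x, y])

pca = PCA(n_components=2).fit(points)
print(pca.explained_variance_ratio_)     # roughly [0.999, 0.001]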
The math

You saw how to use PCA in practice and got an intuition for why it works, and you haven't seen a single formula!

To perform PCA you need to perform a Singular Value Decomposition (equivalently, an eigendecomposition) of the covariance matrix, but I won't go into detail here.

πŸ‘‡
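For the curious, here is a rough from-scratch sketch of that idea using numpy's eigendecomposition (my own illustration, not code from the thread): center the data, compute the covariance matrix, take its top eigenvectors, and project.

import numpy as np

def pca_by_hand(X, n_components):
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)      # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix -> use eigh
    order = np.argsort(eigvals)[::-1]           # sort components by variance, descending
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components              # data expressed in the new coordinates

X = np.random.rand(100, 6)
print(pca_by_hand(X, 2).shape)                  # (100, 2)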
If you are interested in the math, let me know and I can write another thread.

However, I really want to stress that you can understand PCA and use it in practice even without understanding the internals of how it works.

πŸ‘‡
I regularly write threads to explain complex concepts in machine learning and web3 in a simple manner.

Follow me @haltakov for more
