PCA (Principal Component Analysis) is an unsupervised learning algorithm used to reduce the dimensionality of large datasets.

For this reason, it's commonly known as a dimensionality reduction algorithm.

PCA is one of those useful things that is not talked about enough. But there is a good reason to care 👇
PCA's ability to reduce the dimensionality of a dataset motivates several other use cases.

Below are some:

◆ To visualize high-dimensional datasets, since plotting more than 2 or 3 dimensions directly is impractical.
◆ To select the most useful features while discarding redundant features and useless information.

But not always: sometimes useful information will be lost too, especially if the original data was already clean and didn't contain noise.
◆ PCA can be used before training a typical machine learning model, simply to speed up training, since the reduced data is smaller and no longer contains redundant features.

A speed-up is not guaranteed, but in some cases it can help.
In many ML resources, you will find PCA in the category of unsupervised learning algorithms.

Below is a simple reason 👇
PCA reduces the dimensionality of a dataset without any instructions (labels, in other words) on how that should be done, other than the number of principal components to keep, much like specifying the number of clusters in KMeans clustering.
This thread is practical. The above was only a high-level overview.

Let's apply PCA to a dataset you may have heard of: the wine dataset.

We will load it from Scikit-Learn's built-in datasets.
It contains 178 data points, 13 features, and 1 target with 3 classes (0, 1, 2).
Now, let's apply PCA to the wine dataset. As a best practice, it's always good to scale the input data first.

So, first, I will standardize the data, that is, rescale the values to have zero mean and unit standard deviation.

Scikit-Learn will take care of that.
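A minimal sketch of that step, using Scikit-Learn's built-in load_wine and StandardScaler (variable names here are my own, not the original code):

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Load the wine dataset: 178 samples, 13 features, 3 classes
wine = load_wine()
X, y = wine.data, wine.target

# Standardize: rescale each feature to zero mean and unit standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```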
In order to reduce the dimensionality of the dataset, we have to specify the number of principal components.

Think of principal components as new axes (coordinates) onto which we project the data.

Or as reduced features that hold most of the information in the dataset.
Usually, you will choose 2 or 3 components, and in most cases, 2 will be enough.

Below, I create a PCA object, choose the number of principal components, and fit it to the data.
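Something like this, reusing X_scaled from the previous snippet (a sketch, not the exact original code):

```python
from sklearn.decomposition import PCA

# Reduce the 13 features to 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_scaled.shape)  # (178, 13)
print(X_pca.shape)     # (178, 2)
```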
After we have applied PCA to the dataset, you can see that the dimensionality has been reduced from 13 to 2.
As you can see below, visualizing the 2 principal components shows the 3 wine classes well separated.

In just 2 components! As you would guess, there would be no other way to fit the entire 13-dimensional dataset into a single scatterplot.
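One possible way to draw that scatterplot with matplotlib, assuming X_pca and y from the snippets above:

```python
import matplotlib.pyplot as plt

# Plot the 2 principal components, colored by wine class
fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
ax.set_xlabel('First principal component')
ax.set_ylabel('Second principal component')
ax.legend(*scatter.legend_elements(), title='Class')
plt.show()
```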
There is one issue with PCA.

Interpreting the components is hard. You can look at the raw values of the components below, but on their own they say very little.
One way to interpret the results is to use a heatmap showing how much each original feature contributes to each principal component.

See below (though it may not be clear enough).
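One way to draw such a heatmap, assuming seaborn is available (each row of pca.components_ is one principal component, each column is one original feature):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Show how much each original feature weighs in each component
plt.figure(figsize=(12, 3))
sns.heatmap(pca.components_,
            cmap='coolwarm',
            xticklabels=wine.feature_names,
            yticklabels=['PC 1', 'PC 2'])
plt.tight_layout()
plt.show()
```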
We can also display the explained variance ratio to see the percentage of the dataset variance explained by each principal component.

pca.explained_variance_ shows the absolute amount of variance, whereas pca.explained_variance_ratio_ shows it as a fraction of the total.
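For example, with the fitted pca object from above:

```python
# Absolute variance captured by each component
print(pca.explained_variance_)

# Same information as a fraction of the total variance
print(pca.explained_variance_ratio_)
```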
The explained variance ratio in our case is [0.99809123, 0.00173592].

It means that 99.8% of the dataset variance lies on the first component, and 0.17% lies on the second component.

If you look back at the heatmap above, these ratios make sense when you read along the y-axis.
As a bonus, let's also use PCA to visualize the digits dataset. It has 64 dimensions: each digit image is 8×8 pixels.

We can use PCA to project those 64 dimensions into 2 components.
Like in the first example, we load the data, apply PCA with 2 principal components, and then visualize the information contained in those components.

Below, I load the digits data, apply PCA (to the unscaled pixels this time), and visualize the reduced digits.
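A compact sketch of all three steps (again, variable names are my own):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the digits dataset: 1797 images of 8x8 pixels = 64 features
digits = load_digits()
X_digits, y_digits = digits.data, digits.target

# Project the 64 dimensions down to 2 components (no scaling this time)
pca_digits = PCA(n_components=2)
X_digits_pca = pca_digits.fit_transform(X_digits)

# Scatter the reduced digits, colored by digit label (0-9)
fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(X_digits_pca[:, 0], X_digits_pca[:, 1],
                     c=y_digits, cmap='tab10')
fig.colorbar(scatter, ax=ax, label='Digit')
ax.set_xlabel('First principal component')
ax.set_ylabel('Second principal component')
plt.show()
```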
This is so fantastic.

Imagine: we are able to visualize all 10 digit classes in one plot, just because we have reduced their dimensionality from 64 to 2.
This is the end of the thread.

The thread was about PCA. There is a whole lot of math behind it, but much of the time, a high-level understanding of things like this is enough to make them work.
Here are the key takeaways:

PCA is a dimensionality reduction algorithm. It reduces the dimensions of a dataset while preserving as much information as possible in fewer components.
It can also be used to:

◆ Visualize large datasets
◆ Remove redundant features
◆ Speed up model training (in some cases) when applied to the input data before training
Thank you for reading!

I am actively writing about machine learning techniques, concepts, and ideas.

You can support me by following @Jeande_d and sharing the first tweet with your friends who are interested in ML content.

More content to come 🙌🏻
