Levi (@levikul09), Mar 5
Correlation and covariance are so similar.

But there is one really important difference.

Let me explain it:
Covariance

The scale of the covariance depends on the scale of the data. It can take any value. No restrictions.

Because of this, it's hard to compare covariances across different datasets.

It's mainly used in Finance and Economics.
Correlation

It is a standardized version of covariance: the covariance divided by the product of the two standard deviations. Its value always lies between -1 and 1, no matter the scale of the data.

This makes it a powerful tool since we can compare correlations for different datasets.

It's easier to interpret, so it's used widely.
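To make "standardized" concrete, here is the usual sample formula. The notation (x_i, y_i, n, s_X, s_Y) is an assumption for illustration and does not come from the thread:

```latex
\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}

% Rescaling X by a constant a > 0 (for example, a = 100 for m to cm):
\operatorname{cov}(aX, Y) = a \operatorname{cov}(X, Y),
\qquad
s_{aX} = a \, s_X
\quad\Longrightarrow\quad
r_{aX,\,Y} = \frac{a \operatorname{cov}(X, Y)}{a \, s_X \, s_Y} = r_{XY}
```

The factor a appears in both the numerator and the denominator, so it cancels. That is why correlation does not depend on the units.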
Look at this example:

The covariance between weight and height (in meters) is 0.157, and the correlation is 0.706 (strong positive correlation).

What if we convert height from m to cm?

Covariance becomes 15.7: it is multiplied by 100, the same factor as the height values.

But the correlation remains the same, 0.706.
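You can check this behaviour yourself with a few lines of NumPy. This is a minimal sketch: the height and weight values are made up for illustration (they are not the data behind the 0.157 and 0.706 figures), and the helper name `cov_and_corr` is hypothetical.

```python
import numpy as np

# Hypothetical sample, invented for this sketch (not the thread's dataset).
height_m = np.array([1.60, 1.65, 1.70, 1.75, 1.80, 1.85])
weight_kg = np.array([55.0, 72.0, 60.0, 80.0, 68.0, 90.0])


def cov_and_corr(x, y):
    """Return the sample covariance and the Pearson correlation of x and y."""
    cov = np.cov(x, y)[0, 1]        # off-diagonal entry of the 2x2 covariance matrix
    corr = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    return cov, corr


# Height in meters.
cov_m, corr_m = cov_and_corr(height_m, weight_kg)

# Same heights, converted to centimeters.
height_cm = height_m * 100
cov_cm, corr_cm = cov_and_corr(height_cm, weight_kg)

print(f"meters:      cov={cov_m:.4f}, corr={corr_m:.4f}")
print(f"centimeters: cov={cov_cm:.4f}, corr={corr_cm:.4f}")
print(f"cov ratio (cm / m): {cov_cm / cov_m:.1f}")
```

The covariance in centimeters should come out 100 times larger than in meters, while the two correlation values should match.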
To conclude

Both of them measure the linear relationship between two variables.

The key difference is their scale. Covariance depends on the units of the data, while correlation always lies between -1 and 1, so it is easily comparable across datasets.