Levi
Mar 17 • 9 tweets • 3 min read
How to compare numerical variables grouped by categorical variables?

I will show two simple methods.

1. Box plots
2. Violin plots

We will create this 🔽 in this thread.
The dataset will be @kaggle's FIFA 22 players.

Our categories are the player positions.

Here are the details of the data we are using:
1. Box plots

Pandas can create a really simple box plot using the .boxplot() method.

You just need to specify the category you are grouping by and the column with your numeric values:
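The original code was posted as an image, so here is a minimal sketch of the same idea. The tiny DataFrame and the column names `position` and `skill` are assumptions for illustration, not the real FIFA 22 columns.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the FIFA 22 data (names and values are assumptions):
# one categorical column ("position") and one numeric column ("skill").
df = pd.DataFrame({
    "position": ["GK", "GK", "GK", "GK", "DEF", "DEF", "DEF", "FWD", "FWD", "FWD"],
    "skill":    [18, 20, 22, 55, 48, 52, 60, 70, 75, 82],
})

fig, ax = plt.subplots()

# `by` is the category we group by, `column` holds the numeric values.
df.boxplot(column="skill", by="position", ax=ax)

fig.suptitle("")                   # drop the automatic "Boxplot grouped by ..." title
ax.set_title("Skill by position")
ax.set_ylabel("skill")
fig.savefig("boxplot.png")
```

With the real dataset you would load the CSV into `df` and swap in its actual column names.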
Advantage of box plots: The box plot clearly shows the outliers in the data. They are visualized separately.
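For context, matplotlib (which pandas uses for plotting) by default treats points more than 1.5 times the interquartile range beyond the box as outliers. A minimal sketch of that rule, with made-up values rather than the real data:

```python
import pandas as pd

# Hypothetical skill values for one category (assumed, not from the dataset).
skill = pd.Series([18, 19, 20, 20, 21, 22, 23, 55])

# The box spans the first to the third quartile.
q1, q3 = skill.quantile(0.25), skill.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn separately as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = skill[(skill < lower) | (skill > upper)]

print(outliers.tolist())  # [55]
```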

Disadvantage of box plots: We cannot see how densely the values are distributed along the y-axis.

To address this, we will use violin plots.
2. Violin plots

Violin plots show the density of the values along the y-axis.

The density is mirrored and flipped over, and the resulting shape is filled in, creating an image resembling a violin.
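To make the mirroring concrete, here is a rough sketch that builds one violin by hand. The random sample, the hand-written Gaussian kernel density estimate, and the fixed bandwidth of 2.0 are all assumptions for illustration; plotting libraries choose their own bandwidth.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample centred around 20 (an assumption, not the real data).
rng = np.random.default_rng(0)
values = rng.normal(loc=20, scale=5, size=200)

def gaussian_kde(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on the grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

grid = np.linspace(values.min(), values.max(), 100)
density = gaussian_kde(values, grid, bandwidth=2.0)

fig, ax = plt.subplots()
# Values go on the y-axis; the density is drawn to the right and mirrored
# to the left, and the area in between is filled in.
ax.fill_betweenx(grid, -density, density)
ax.set_xticks([])
ax.set_ylabel("value")
fig.savefig("violin_by_hand.png")
```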
For the exercise, we are using seaborn's .violinplot():

We need to specify the data, X and Y-axis, and set the title.
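The original call was shown as an image, so this is a minimal sketch of it. The small DataFrame and the column names `position` and `skill` are assumptions, not the real FIFA 22 columns.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the FIFA 22 data (names and values are assumptions).
df = pd.DataFrame({
    "position": ["GK", "GK", "GK", "GK", "DEF", "DEF", "DEF", "FWD", "FWD", "FWD"],
    "skill":    [18, 20, 22, 55, 48, 52, 60, 70, 75, 82],
})

fig, ax = plt.subplots()

# data = the DataFrame, x = the category, y = the numeric values.
sns.violinplot(data=df, x="position", y="skill", ax=ax)
ax.set_title("Skill by position")
fig.savefig("violinplot.png")
```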

This is what we got:
Advantage of violin plots: The violin plot clearly shows the density of the data.

Disadvantage of violin plots: Outliers are hard to see because they are absorbed into the shape of the violin.
Combining the two methods gives us a fuller picture of the data.

It turned out that skill is not so important for Goalkeepers: the data is densest at around 20, which the violin plot shows, but a few outliers have higher skill, which the box plot shows.
That's it for today.

Follow me @levikul09 for more.

Like/Retweet the first tweet below for support, Thanks 😉
