Latest Twitter Threads by @levikul09 on Thread Reader App

Apr 20, 2024 • 7 tweets • 3 min read

A surprising statistical result 🔽

You have tested positive for a disease.

- The test is 99% accurate.

- 1 out of 10,000 people has the disease.

What is the probability that you truly have the disease, given that you have tested positive?

Let's figure it out.

🧵

Look at a random group of 1 million people.

Fact 2 says 1 out of 10,000 people has the disease.

In our sample, 100 people have the disease, and 999,900 are healthy.
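The same count can be carried through to the final answer. This is a minimal sketch, assuming "99% accurate" means the test is right 99% of the time for both sick and healthy people (the facts above do not spell this out):

```python
# Assumption: "99% accurate" = 99% true-positive rate AND 99% true-negative rate.
population = 1_000_000
accuracy = 0.99

sick = population // 10_000          # 1 out of 10,000 has the disease -> 100
healthy = population - sick          # 999,900

true_positives = sick * accuracy             # sick people who test positive
false_positives = healthy * (1 - accuracy)   # healthy people who test positive

# Share of all positive tests that belong to people who are really sick.
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(f"{p_sick_given_positive:.4f}")  # 0.0098
```

Under that assumption, fewer than 1 in 100 positive tests belongs to someone who actually has the disease, because the healthy group is so much larger.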

Mar 31, 2024 • 8 tweets • 3 min read

Weights and Biases are the engines in Neural Networks.

I will explain how they work.

🧵

When data is flowing between different neurons or layers, it is not just going from A to B.

Different transformations are applied to it along the way.

These transformations are described with Weights and Biases.

Let's discuss each 🔽
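A rough sketch of such a transformation for one neuron: multiply each input by its weight, add everything up, then add the bias. All numbers are made up for illustration.

```python
# Hypothetical inputs, weights, and bias for a single neuron.
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]
bias = 0.5

# Weighted sum of the inputs, shifted by the bias.
output = sum(w * x for w, x in zip(weights, inputs)) + bias
print(output)  # 0.5 - 2.0 + 6.0 + 0.5 = 5.0
```

The weights decide how much each input matters; the bias shifts the result up or down.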

Mar 24, 2024 • 8 tweets • 2 min read

Language models need to know how similar texts or words are.

Here is how they do it:

Models usually cannot work with textual data, so we need to convert words into numbers.

This is mostly done with word embeddings. These are vector (numerical) representations of text.
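Once words are vectors, similarity becomes a calculation. The sketch below uses cosine similarity, one common choice (an assumption here, since the text above does not name a measure). The 3-number "embeddings" are invented; real ones have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings, for illustration only.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.82, 0.15]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1
print(cosine_similarity(cat, car))     # noticeably lower
```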

Mar 20, 2024 • 8 tweets • 3 min read

5 Regression Algorithms you should know

🧵

1️⃣ Linear

Linear regression is the most fundamental and widely used regression algorithm.

It assumes a linear relationship between the variables.

The goal is to find the best-fitting line that minimizes the errors between the predicted and actual values.
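A minimal sketch of fitting such a line, assuming "minimizes the errors" means ordinary least squares (the usual choice). The data points are made up.

```python
# Made-up data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.2, 5.9, 8.1, 9.9]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Closed-form least-squares solution for a single feature.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 1.95 0.19
```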

Mar 17, 2024 • 13 tweets • 4 min read

10 Pandas 1-liners to start Data Analysis:

1.

This code loads a CSV file into a Pandas DataFrame.

This is usually step 1, so we can start working.
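pandas.read_csv is the usual function for this. A minimal sketch, assuming that is the one-liner meant here. The CSV content and column names are invented, and io.StringIO stands in for a file path so the example runs on its own.

```python
import io

import pandas as pd

# Invented CSV content; with a real file you would pass its path instead.
csv_text = "name,age\nAnna,31\nBen,27\n"

df = pd.read_csv(io.StringIO(csv_text))  # the one-liner
print(df.shape)  # (2, 2)
```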

Mar 15, 2024 • 9 tweets • 3 min read

Perceptron, the simplest Neural Network.

I explain how it works.

The Perceptron is a binary classifier.

It can decide whether data belongs to class A or class B, or make yes-or-no decisions.

The two classes are usually represented with 0 and 1. I will use this notation in this thread.
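A small sketch of a Perceptron with 0/1 outputs, trained with the classic perceptron learning rule. The AND-gate data, the learning rate, and the function names are illustrative choices, not taken from the thread.

```python
def predict(weights, bias, x):
    """Return 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train(samples, labels, lr=1.0, epochs=20):
    """Nudge weights and bias toward the correct answer after each mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0, or 1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Toy problem: logical AND (linearly separable, so the Perceptron can learn it).
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```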

Mar 13, 2024 • 10 tweets • 3 min read

The most important part of a histogram:

The number of bins.

Here are a few techniques to optimize it:

1/8

In numpy, we have the option to choose from several techniques.

These will calculate the bin width and consequently the number of bins.

You need to choose the technique and define it in numpy.histogram_bin_edges.

Let's look at them one by one:

2/8
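A quick sketch of how the technique is passed to numpy.histogram_bin_edges through the bins argument. The random data is only there to have something to bin; the method names are the string options NumPy accepts.

```python
import numpy as np

# Random sample data, for illustration only.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

bin_counts = {}
for method in ["auto", "fd", "scott", "sturges", "sqrt"]:
    edges = np.histogram_bin_edges(data, bins=method)
    bin_counts[method] = len(edges) - 1  # n edges -> n - 1 bins

print(bin_counts)
```

The same strings can be passed as bins to numpy.histogram, so you can compare the resulting histograms directly.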

Mar 12, 2024 • 8 tweets • 3 min read

Normal Distribution vs Standard Normal Distribution:

1/8

When you hear the term normal distribution, you probably think about the perfect bell shape.

However, not all Normal distributions are perfectly shaped.

They are usually stretched or squeezed and moved horizontally right or left.

2/8
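The standard normal distribution is the version with mean 0 and standard deviation 1. Any normal distribution can be mapped onto it with z = (x - mean) / standard deviation. A small sketch with made-up height numbers:

```python
from statistics import NormalDist

# Hypothetical normal distribution: heights with mean 170 cm and SD 10 cm.
mu, sigma = 170.0, 10.0
heights = NormalDist(mu=mu, sigma=sigma)
standard = NormalDist()  # mean 0, SD 1

x = 185.0
z = (x - mu) / sigma  # undo the shift, then undo the stretch

p_general = heights.cdf(x)    # P(height <= 185) on the original scale
p_standard = standard.cdf(z)  # the same probability on the standard scale
print(z, round(p_general, 4), round(p_standard, 4))
```

Both probabilities agree, which is why one table of standard normal values is enough for every normal distribution.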

Mar 5, 2024 • 7 tweets • 3 min read

Correlation and covariance are so similar.

But there is one really important difference.

Let me explain it:

Covariance

The scale of the covariance depends on the scale of the data. It can take any value. No restrictions.

It's hard to compare different datasets with covariance for the reason above.

It's mainly used in Finance and Economics.
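To see the scale dependence, the sketch below measures the same made-up heights in meters and in centimeters. The covariance changes by a factor of 100, while the correlation (which always stays between -1 and 1) does not move.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance divided by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up data: the same people, height in two different units.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50, 58, 66, 77, 85]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)    # differ by a factor of 100
print(corr_m, corr_cm)  # practically identical
```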

Mar 2, 2024 • 8 tweets • 3 min read

Standardization vs Normalization.

What is the difference?

Normalization rescales the values into a range of [0,1].

It is also known as Min-Max scaling.

It is useful if you want different datasets to be on the same positive scale.

It also creates a boundary for values.
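A minimal sketch of Min-Max scaling on a made-up list of values. It assumes the values are not all identical, otherwise the division would fail.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([50, 75, 100, 150])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```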

Feb 26, 2024 • 9 tweets • 3 min read

Are you familiar with the different distributions used in statistics?

I will share the basics now.

From Bernoulli to Power-law, there are many to explore!

1/9

1️⃣ Bernoulli

It describes the outcomes of binary events, where there are only two possible outcomes.

p represents the probability of observing the value 1.

For example: will a customer make a purchase or not?

2/9
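A small simulation of the purchase example, assuming a made-up purchase probability of 0.3. Each customer is one Bernoulli trial: 1 for a purchase, 0 for none.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

p = 0.3  # hypothetical probability that a customer purchases

# One Bernoulli trial per customer.
outcomes = [1 if random.random() < p else 0 for _ in range(10_000)]

observed = sum(outcomes) / len(outcomes)
print(observed)  # should land close to p
```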

Feb 25, 2024 • 9 tweets • 3 min read

How does standardization affect your models?

🧵

We want to group people based on their height and weight.

The problem is that the way we measure these metrics is different:

- We measure weight in kg, and the samples range from 50 kg to 150 kg.

- We measure height in meters, and the samples range from 1.50 m to 2.00 m.
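A sketch of the problem and the fix, using four made-up people inside those ranges. Before standardization, weight spans 100 units and height only 0.5, so any distance-based grouping would be driven almost entirely by weight. After standardization both features have mean 0 and standard deviation 1.

```python
from statistics import mean, pstdev

def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up sample within the ranges above.
weights_kg = [50, 80, 110, 150]
heights_m = [1.50, 1.70, 1.85, 2.00]

print(max(weights_kg) - min(weights_kg))  # spread of the raw weights
print(max(heights_m) - min(heights_m))    # spread of the raw heights

z_weights = standardize(weights_kg)
z_heights = standardize(heights_m)
print(z_weights)
print(z_heights)
```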

Feb 23, 2024 • 9 tweets • 3 min read

When you start learning Statistics you may feel there are a million things to memorize.

You're wrong.

What you need is 5-6 basic concepts. Everything else is built around them.

Here are 6 Statistical principles to begin with:

1. Central tendency

Mean, mode, and median are measures that offer quick insights into the 'center' of the data.

They help you see what a "normal" data point looks like in a group.

We hear about these measures a lot, so it's good to know them well.
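All three are available in Python's built-in statistics module. A quick sketch on a made-up dataset:

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]  # made-up values

center_mean = mean(data)      # sum divided by count
center_median = median(data)  # middle value (average of the two middle ones here)
center_mode = mode(data)      # most frequent value

print(center_mean, center_median, center_mode)  # 5 4.0 3
```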

Feb 19, 2024 • 6 tweets • 2 min read

We use 2 symbols for the mean: μ (mu) and x̄ (x-bar).

Here is why:

μ (mu) represents the population mean.

This is the average of the values of the entire population.

For example, if you want to calculate the population's average income in America, you need answers from all Americans.
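x̄ is the sample mean: the average of a subset, used as an estimate of μ when asking everyone is not realistic. A small simulation with an invented income population shows the two side by side:

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Invented "population" of 100,000 incomes, for illustration only.
population = [random.gauss(50_000, 15_000) for _ in range(100_000)]
mu = mean(population)  # population mean: needs every single value

# In practice we only survey a sample and compute x-bar from it.
sample = random.sample(population, 1_000)
x_bar = mean(sample)

print(round(mu), round(x_bar))  # close, but not identical
```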

Feb 18, 2024 • 10 tweets • 2 min read

Data Leakage is more dangerous than Overfitting.

Why?

There is a big difference between the two.

Once you understand it, you will see why. 🔽

1/10

Let's define both concepts:

Overfitting

The model learns the training data too well. It also learns the noise and the outliers. These are only present in the training data, so with unseen data the performance will be poor.

2/10
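A rough sketch of overfitting on made-up data. The true signal is a straight line plus noise. A degree-7 polynomial has enough freedom to chase the noise in 10 training points, so its training error is lower than the straight line's, while its error on unseen data is typically higher. The data, the degrees, and the helper names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Made-up data: linear signal y = 2x plus random noise."""
    x = rng.uniform(0, 1, n)
    y = 2 * x + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(10)
x_test, y_test = make_data(200)

def mse(model, x, y):
    """Mean squared error of the model's predictions."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 7):
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(degree, results[degree])  # (training error, test error)
```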

Feb 16, 2024 • 10 tweets • 2 min read

A huge problem in Data Science:

The sample is not accurate.

Use stratified sampling to get more powerful results from your data.

I will explain how to do it:

1/10

Let's consider this example.

The population in Sweden is 53% males and 47% females.

We run a survey with 1000 people.

We have 2 options to select the sample:

2/10
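A sketch of the two ways to draw the sample, assuming the options are simple random sampling and stratified sampling with proportional allocation. The population list is invented to match the 53/47 split.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Invented population of 100,000 people matching the 53% / 47% split.
population = ["male"] * 53_000 + ["female"] * 47_000
sample_size = 1000

# Option 1 (assumed): simple random sample. The split varies by chance.
simple = random.sample(population, sample_size)

# Option 2: stratified sample. Draw from each group in proportion to its share.
shares = {"male": 0.53, "female": 0.47}
stratified = []
for group, share in shares.items():
    members = [person for person in population if person == group]
    stratified += random.sample(members, round(sample_size * share))

print(simple.count("male"), simple.count("female"))          # near 530 / 470
print(stratified.count("male"), stratified.count("female"))  # 530 470
```

The stratified sample matches the population split by construction, while the simple random sample only matches it on average.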

Feb 15, 2024 • 8 tweets • 3 min read

Feb 14, 2024 • 9 tweets • 2 min read

Logistic regression is not used for regression!

Let me explain:

1/8

I know it sounds stupid, but it's a classification model.

Why the name then?

Let's see how logistic regression works, so we can understand 👇

2/8

Feb 13, 2024 • 7 tweets • 2 min read

Vectors in Linear Algebra Clearly Explained:

A vector is an ordered list of numbers.

They have two main characteristics you must know:

1. Dimensionality

It tells how many numbers are in the vector.

2. Orientation

You can list the numbers in a column or a row. Orientation tells if the numbers are standing or lying.

1/4
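Both characteristics are easy to see in NumPy. A small sketch with a made-up vector:

```python
import numpy as np

v = np.array([1, 2, 3])  # made-up vector

# Dimensionality: how many numbers the vector holds.
print(v.size)  # 3

# Orientation: the same numbers lying down (row) or standing up (column).
row = v.reshape(1, -1)  # shape (1, 3)
col = v.reshape(-1, 1)  # shape (3, 1)
print(row.shape, col.shape)
```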

Feb 12, 2024 • 9 tweets • 2 min read

5 classification models to start with:

1. Logistic Regression

LR is mainly used for binary classifications, such as 'yes' or 'no' cases.

The output is between 0 and 1, so it can be translated into a probability.

It's effective with simple problems but may struggle with complex ones.
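The 0-to-1 output comes from the sigmoid function, which squashes any number into that range. A minimal sketch with made-up weights and a 0.5 decision threshold (both are illustrative assumptions):

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Hypothetical model: one feature, with made-up weight and bias.
weight, bias = 1.2, -3.0

def predict_probability(x):
    return sigmoid(weight * x + bias)

for x in (0, 2.5, 5):
    p = predict_probability(x)
    label = "yes" if p >= 0.5 else "no"  # assumed threshold of 0.5
    print(x, round(p, 3), label)
```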

Feb 9, 2024 • 10 tweets • 3 min read

Linear Regression clearly explained:

Linear regression is a method, also used in ML, for estimating values.

For example, we can estimate:

- Price of a house
- Value of stock
- Life expectancy
