Levi
I explain Data Science on Grandma's level. Writing https://t.co/25jLCDRZms
13 subscribers
Apr 20 7 tweets 3 min read
A surprising statistical result 🔽

You have tested positive for a disease.

- The test is 99% accurate.

- 1 out of 10,000 people has the disease.

What is the probability that you truly have the disease, given that you have tested positive?

Let's figure it out.

🧵 Look at a random group of 1 million people.

Fact 2 says 1 out of 10,000 people has the disease.

In our sample, 100 people have the disease, and 999,900 are healthy.
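Here is a minimal Python sketch of where this calculation is heading. The preview does not say what "99% accurate" means. This sketch assumes the test catches 99% of sick people (sensitivity) and correctly clears 99% of healthy people (specificity). The function name posterior_positive is hypothetical.

```python
def posterior_positive(prevalence, sensitivity, specificity, population=1_000_000):
    """P(disease | positive test), counted over an imaginary population."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity              # sick people who test positive
    false_positives = healthy * (1 - specificity)    # healthy people who test positive
    return true_positives / (true_positives + false_positives)


# Assumption: "99% accurate" = 99% sensitivity and 99% specificity
p = posterior_positive(prevalence=1 / 10_000, sensitivity=0.99, specificity=0.99)
print(f"{p:.2%}")
```

Under these assumptions, 99 true positives (100 × 0.99) are swamped by 9,999 false positives (999,900 × 0.01). The probability is therefore 99 / 10,098, just under 1%.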
Mar 31 8 tweets 3 min read
Weights and Biases are the engines in Neural Networks.

I will explain how they work.

🧵 When data flows between neurons or layers, it does not simply go from A to B.

Along the way, it is transformed.

These transformations are described with Weights and Biases.

Let's discuss each 🔽
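The following is a minimal NumPy sketch of the transformation one layer applies. It assumes the usual formulation output = W x + b. Each row of the weight matrix W belongs to one neuron, and b holds one bias per neuron. The numbers are made up, and the sketch leaves out the nonlinear activation that usually follows.

```python
import numpy as np

def dense_layer(x, W, b):
    # Each output neuron: weighted sum of the inputs (a row of W) shifted by its bias
    return W @ x + b

x = np.array([1.0, 2.0])                  # input coming from the previous layer
W = np.array([[0.5, -1.0],                # hypothetical weights: 3 neurons, 2 inputs each
              [2.0, 0.0],
              [1.0, 1.0]])
b = np.array([0.1, 0.0, -3.0])            # hypothetical biases, one per neuron

out = dense_layer(x, W, b)
print(out)
```

The weights decide how strongly each input counts. The bias shifts the result regardless of the input.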
Mar 24 8 tweets 2 min read
Language models need to know how similar texts or words are.

Here is how they do it:

Models usually cannot work with textual data directly, so we need to convert words into numbers.

This is mostly done with word embeddings. These are vector (numerical) representations of text.
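A common way to measure how similar two embeddings are is cosine similarity, which is the cosine of the angle between the two vectors. The preview does not name the measure the thread uses, so treat this choice as an assumption. The 3-dimensional embeddings below are invented for illustration. Real embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction, 0.0 = unrelated (orthogonal), -1.0 = opposite
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings, made up for this example
emb = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}

print(cosine_similarity(emb["cat"], emb["dog"]))  # high: similar words
print(cosine_similarity(emb["cat"], emb["car"]))  # lower: less related words
```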
Mar 20 8 tweets 3 min read
5 Regression Algorithms you should know

🧵 1️⃣ Linear

Linear regression is the most fundamental and widely used regression algorithm.

It assumes a linear relationship between the variables.

The goal is to find the best-fitting line: the one that minimizes the sum of squared errors between the predicted and actual values.
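Here is a minimal sketch of fitting that line with NumPy's np.polyfit, which performs a least-squares fit. The data points are hypothetical and roughly follow y = 2x + 1.

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A degree-1 polynomial is a straight line; polyfit minimizes the squared errors
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)   # errors between actual and predicted values

print(f"y = {slope:.2f}x + {intercept:.2f}")
print(f"sum of squared errors: {np.sum(residuals ** 2):.4f}")
```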
Mar 17 13 tweets 4 min read
10 Pandas 1-liners to start Data Analysis:

1. This code loads a CSV file into a Pandas DataFrame.

This is usually the first step, before any analysis can start.
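The one-liner itself is not visible in the preview. It was presumably a call to pandas.read_csv. Here is a self-contained sketch that reads a small in-memory CSV. With a real file, you would pass its path instead, for example pd.read_csv("data.csv"), where the file name is hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """name,age,city
Anna,34,Stockholm
Bela,28,Budapest
Cleo,45,Lisbon
"""

# The one-liner: load the CSV into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```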
Mar 15 9 tweets 3 min read
Perceptron, the simplest Neural Network.

I explain how it works.

The Perceptron is a binary classifier.

It decides whether a data point belongs to class A or class B, so it can make yes-or-no decisions.

The two classes are usually represented with 0 and 1. I will use this notation in this thread.
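This is a minimal NumPy sketch that assumes the classic perceptron learning rule. The preview does not show the thread's exact formulation. The output is 1 if the weighted sum plus the bias is positive, and 0 otherwise. After each mistake, the weights and bias move toward the correct answer. The toy task is the logical AND of two inputs, which is linearly separable.

```python
import numpy as np

def predict(x, w, b):
    # Step function: class 1 if the weighted sum plus bias is positive, else class 0
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(X, y, lr=1.0, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - predict(xi, w, b)   # 0 if correct, +1 or -1 if wrong
            w += lr * error * xi                 # nudge weights toward the right answer
            b += lr * error
    return w, b

# Logical AND: output 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
print([predict(xi, w, b) for xi in X])
```

Because AND is linearly separable, the rule is guaranteed to converge. For a problem like XOR, a single perceptron cannot find a separating line.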
Mar 13 10 tweets 3 min read
The most important part of a histogram:

The number of bins.

Here are a few techniques to optimize it:

1/8

In numpy, you can choose from several techniques.

Each one calculates the bin width and, from it, the number of bins.

To use one, pass its name as the bins argument of numpy.histogram_bin_edges.

Let's look at them one by one:

2/8
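Here is a sketch that compares the built-in string options of numpy.histogram_bin_edges on the same synthetic, normally distributed data. The data and random seed are arbitrary. NumPy also offers a "stone" option, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)   # synthetic example data

methods = ["auto", "fd", "doane", "scott", "rice", "sturges", "sqrt"]
n_bins = {}
for method in methods:
    edges = np.histogram_bin_edges(data, bins=method)
    n_bins[method] = len(edges) - 1   # there is one more edge than there are bins

for method, count in n_bins.items():
    print(f"{method:>8}: {count} bins")
```

The rules disagree, sometimes by a lot. That is why the choice matters. For example, "sturges" depends only on the sample size, while "fd" also looks at the spread of the data (its interquartile range).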
Mar 12 8 tweets 3 min read
Normal Distribution vs Standard Normal Distribution:

1/8
When you hear the term normal distribution, you probably think about the perfect bell shape.

However, not all Normal distributions look like that one textbook bell.

Their standard deviation stretches or squeezes the curve, and their mean moves it horizontally to the right or left.

2/8
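Here is a quick check with Python's built-in statistics.NormalDist. Any normal distribution N(μ, σ) can be mapped onto the standard normal N(0, 1) with the z-score z = (x − μ) / σ, and probabilities are preserved. The height example (μ = 170 cm, σ = 10 cm) is hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)   # hypothetical: heights in cm
standard = NormalDist()                  # standard normal: mu=0, sigma=1

x = 185
z = (x - heights.mean) / heights.stdev   # shift by the mean, rescale by the std

p_original = heights.cdf(x)      # P(height <= 185)
p_standard = standard.cdf(z)     # P(Z <= 1.5)
print(z, p_original, p_standard)
```

Both probabilities are the same. Thanks to this, one standard normal table works for every normal distribution.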
Mar 5 7 tweets 3 min read
Correlation and covariance are very similar.

But there is one really important difference.

Let me explain it:

Covariance

The scale of the covariance depends on the scale (and units) of the data, so it can take any value, with no upper or lower bound.

For this reason, it's hard to compare covariances across different datasets.

It's commonly used in Finance and Economics.
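Here is a small NumPy sketch of that difference, using hypothetical paired data. Expressing the same measurements in centimeters instead of meters multiplies the covariance by 100, while the correlation does not change at all. The correlation always stays between -1 and 1.

```python
import numpy as np

# Hypothetical paired measurements
x_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. a length in meters
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_cm = x_m * 100   # the same data, expressed in centimeters

cov_m = np.cov(x_m, y)[0, 1]
cov_cm = np.cov(x_cm, y)[0, 1]
corr_m = np.corrcoef(x_m, y)[0, 1]
corr_cm = np.corrcoef(x_cm, y)[0, 1]

print(f"covariance:  {cov_m:.2f} vs {cov_cm:.2f}")    # scales with the units
print(f"correlation: {corr_m:.3f} vs {corr_cm:.3f}")  # unit-free, stays the same
```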
Mar 2 8 tweets 3 min read
Standardization vs Normalization.

What is the difference?

Normalization rescales the values into the range [0, 1].

It is also known as Min-Max scaling.

It is useful if you want different datasets to be on the same positive scale.

It also bounds the values: the minimum becomes 0 and the maximum becomes 1.
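Min-Max scaling is a one-line formula: (x − min) / (max − min). Here is a minimal NumPy sketch with hypothetical weights in kg. It assumes the values are not all identical, because otherwise max − min is 0.

```python
import numpy as np

def min_max_normalize(x):
    # (x - min) / (max - min): the smallest value maps to 0, the largest to 1
    # Assumes max > min; a constant column would cause a division by zero
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

weights_kg = np.array([50.0, 75.0, 100.0, 150.0])   # hypothetical data
scaled = min_max_normalize(weights_kg)
print(scaled)   # values 0.0, 0.25, 0.5, 1.0
```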
Feb 26 9 tweets 3 min read
Are you familiar with the different distributions used in statistics?

I will share the basics now.

From Bernoulli to Power-law, there are many to explore!

1/9

1️⃣ Bernoulli

It describes binary events, which have only two possible outcomes, coded as 1 and 0.

The parameter p is the probability of observing the value 1.

For example: will a customer make a purchase or not?

2/9
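Here is a minimal sketch of the Bernoulli distribution in Python: its probability mass function and a simulation with NumPy. The purchase probability p = 0.3 is hypothetical. A Bernoulli trial is a binomial with n = 1, which is why rng.binomial is used to draw the samples.

```python
import numpy as np

def bernoulli_pmf(k, p):
    # P(X = 1) = p and P(X = 0) = 1 - p; any other value is impossible
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1 - p

p = 0.3   # hypothetical probability that a customer purchases
rng = np.random.default_rng(0)
purchases = rng.binomial(n=1, p=p, size=10_000)   # 10,000 simulated customers

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))
print(purchases.mean())   # the share of 1s, close to p for a large sample
```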
Feb 25 9 tweets 3 min read
How does standardization affect your models?

🧵 We want to group people based on their height and weight.

The problem is that these two features are measured in different units and on very different scales:

- Weight is measured in kg, with samples between 50 kg and 150 kg.

- Height is measured in meters, ranging from 1.50 m to 2.00 m.
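Here is a minimal NumPy sketch of the problem, using hypothetical people within those ranges. Before standardization, almost all of the Euclidean distance between two people comes from weight, because kilograms vary far more than meters. Z-score standardization subtracts each column's mean and divides by its standard deviation. This puts both features on a comparable scale. The helper weight_share is hypothetical.

```python
import numpy as np

# Hypothetical people: column 0 is height (m), column 1 is weight (kg)
X = np.array([
    [1.50, 50.0],
    [1.65, 80.0],
    [1.80, 110.0],
    [2.00, 150.0],
])

# Z-score standardization, column by column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

def weight_share(a, b):
    """Fraction of the squared Euclidean distance between a and b that comes from weight."""
    diff = a - b
    return float(diff[1] ** 2 / np.sum(diff ** 2))

print(f"raw:          {weight_share(X[0], X[1]):.4f}")
print(f"standardized: {weight_share(X_std[0], X_std[1]):.4f}")
```

Distance-based methods such as k-means clustering would, on the raw data, effectively group people by weight alone.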
Feb 23 9 tweets 3 min read
When you start learning Statistics, you may feel there are a million things to memorize.

You're wrong.

What you need is 5-6 basic concepts. Everything else is built around them.

Here are 6 Statistical principles to begin with:

1. Central tendency

Mean, mode, and median are measures that offer quick insights into the 'center' of the data.

They help you see what a "typical" data point in a group looks like.

We hear about these measures a lot, so it's good to know them well.
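All three measures are in Python's standard statistics module. Here is a minimal sketch with a small hypothetical dataset.

```python
import statistics

# Hypothetical data: cups of coffee per day for 7 people
data = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(data)       # sum / count = 28 / 7
median = statistics.median(data)   # middle value of the sorted data
mode = statistics.mode(data)       # most frequent value
print(mean, median, mode)          # 4 3 2
```

The large values 7 and 9 pull the mean above the median. This is why the median is often preferred for skewed data such as incomes.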
Feb 19 6 tweets 2 min read
We use 2 symbols for the mean: μ (mu) and x̄ (x-bar).

Here is why:

μ (mu) represents the population mean.

This is the average of the values of the entire population.

For example, to calculate the average income of the American population, you need answers from all Americans.
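Here are the two definitions side by side. N is the population size and n is the size of a sample drawn from it. This is standard notation and is not taken from the thread's images.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

The formula is the same. Only the data it is computed on differs: every member of the population for μ, and only the sampled members for x̄, which serves as an estimate of μ.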
Feb 18 10 tweets 2 min read
Data Leakage is more dangerous than Overfitting.

Why?

There is a big difference between the two.

Once you understand it, you will see why. 🔽

1/10

Let's define both concepts:

Overfitting

The model learns the training data too well, including its noise and outliers. These patterns are present only in the training data, so the model performs poorly on unseen data.

2/10
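Here is a minimal NumPy illustration of overfitting on synthetic data: a noisy straight line fitted with a degree-1 polynomial and with a degree-9 polynomial. With 10 training points, the degree-9 curve can pass through every point, so its training error is essentially zero because it has memorized the noise. Its error on the unseen test points is typically worse than that of the simple line. The data, noise level, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: a straight line plus a little noise
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 10)   # unseen points between the training points
y_test = 2 * x_test + rng.normal(scale=0.1, size=x_test.size)

def mse(coefs, x, y):
    # Mean squared error of the polynomial with these coefficients
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

results = {}
for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)   # degree 9 can hit all 10 points
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```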
Feb 16 10 tweets 2 min read
A huge problem in Data Science:

The sample is not representative.

Use stratified sampling to get more reliable results from your data.

I will explain how to do it:

1/10

Let's consider this example.

The population in Sweden is 53% male and 47% female.

We run a survey with 1000 people.

We have 2 options to select the sample:

2/10
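Here is a minimal pandas sketch of the stratified option. It assumes a hypothetical population table of 10,000 people with the 53% / 47% split. Sampling 10% from each stratum separately with DataFrame.groupby(...).sample yields exactly 530 males and 470 females, matching the population. A simple random sample of 1000 would match those proportions only on average.

```python
import pandas as pd

# Hypothetical population with the 53% / 47% split from the example
population = pd.DataFrame({"sex": ["male"] * 5300 + ["female"] * 4700})

# Stratified sampling: draw the same fraction (10%) from each group separately
sample = population.groupby("sex").sample(frac=0.1, random_state=0)

counts = sample["sex"].value_counts()
print(counts)
```

GroupBy.sample requires pandas 1.1 or newer.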
Feb 14 9 tweets 2 min read
Logistic regression is not used for regression! Let me explain:

1/8
I know it sounds stupid, but it's a classification model.

Why the name then?

Let's see how logistic regression works, so we can understand 👇

2/8
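Here is a minimal sketch of the idea with made-up parameters. Logistic regression computes a linear combination of the features, exactly like linear regression, and then squashes it through the sigmoid function into a probability between 0 and 1. A threshold turns that probability into a class. The threshold here is 0.5, which is an assumption. The linear part models the log-odds, which is one common explanation for the name.

```python
import math

def sigmoid(z):
    # Maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # the linear "regression" part
    return sigmoid(z)

def predict_class(x, w, b, threshold=0.5):
    return 1 if predict_proba(x, w, b) >= threshold else 0

w, b = [1.5, -0.5], -1.0   # hypothetical, "already trained" parameters

p = predict_proba([2.0, 1.0], w, b)
print(p, predict_class([2.0, 1.0], w, b))
```

Taking log(p / (1 − p)) of the output gives back the linear score, here 1.5. This is the "regression" hidden inside the classifier.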
Feb 13 7 tweets 2 min read
Vectors in Linear Algebra Clearly Explained:

A vector is an ordered list of numbers.

Vectors have two main characteristics you must know:

1. Dimensionality

It tells how many numbers are in the vector.

2. Orientation

You can list the numbers in a column or a row. Orientation tells whether the numbers are standing (a column vector) or lying (a row vector).

1/4
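In NumPy, orientation shows up in an array's shape. Here is a small sketch with arbitrary values. A plain 1-D array such as np.array(values) has no orientation at all, which is why the row vector is written as a 2-D array.

```python
import numpy as np

values = [3.0, 1.0, 4.0]

row_vector = np.array([values])        # shape (1, 3): the numbers are "lying"
column_vector = np.array([values]).T   # shape (3, 1): the numbers are "standing"

print(len(values))            # dimensionality: 3
print(row_vector.shape)       # (1, 3)
print(column_vector.shape)    # (3, 1)
```

Both vectors have the same dimensionality (3). Transposing with .T changes only the orientation.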
Feb 12 9 tweets 2 min read
5 classification models to start with:

1. Logistic Regression

LR is mainly used for binary classification, such as 'yes' or 'no' cases.

The output is between 0 and 1, so it can be interpreted as a probability.

It's effective with simple problems but may struggle with complex ones, because its decision boundary is linear.
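Here is a minimal training sketch that shows where those probabilities come from. It uses plain gradient descent on the mean log-loss, with NumPy and a hypothetical one-feature dataset. Libraries such as scikit-learn use more sophisticated solvers, so this is only illustrative. The learning rate and number of iterations are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-feature dataset with 0/1 labels (the classes overlap slightly)
X = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    p = sigmoid(w * X + b)                        # predicted probabilities in (0, 1)
    w -= learning_rate * np.mean((p - y) * X)     # gradient of mean log-loss w.r.t. w
    b -= learning_rate * np.mean(p - y)           # gradient of mean log-loss w.r.t. b

probs = sigmoid(w * X + b)
preds = (probs >= 0.5).astype(int)               # threshold the probabilities
accuracy = float(np.mean(preds == y))
print(np.round(probs, 2), accuracy)
```

The learned boundary is a single cut point on the feature (-b / w). No single cut can separate the overlapping points, so training accuracy stays below 100%.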
Feb 9 10 tweets 3 min read
Linear Regression clearly explained:

Linear regression is a method, also used in ML, to estimate continuous values.

For example, we can estimate:

- Price of a house
- Value of a stock
- Life expectancy
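Here is a minimal sketch of the house-price case with NumPy's least-squares solver np.linalg.lstsq. It uses two hypothetical features: size in m² and number of rooms. The toy prices were generated from price = 20 + 3 × size + 10 × rooms, in thousands, so the fit recovers those coefficients exactly.

```python
import numpy as np

# Hypothetical houses: size (m^2), number of rooms, and price (thousands)
size = np.array([50.0, 80.0, 100.0, 120.0, 60.0])
rooms = np.array([1.0, 2.0, 3.0, 3.0, 2.0])
price = np.array([180.0, 280.0, 350.0, 410.0, 220.0])

# Design matrix: a column of ones for the intercept, then one column per feature
A = np.column_stack([np.ones_like(size), size, rooms])

# Least squares: the coefficients that minimize the sum of squared errors
coefs, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, b_size, b_rooms = coefs

new_house = np.array([1.0, 90.0, 2.0])   # 90 m^2, 2 rooms (leading 1 = intercept)
print(coefs, new_house @ coefs)          # estimated price of the new house
```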