
Rohan Paul

May 27 • 16 tweets • 25 min read

1/ When is it important to standardize variables in #DataScience #MachineLearning? - A Thread

#DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics #programming #ArtificialIntelligence #Data #Stats #Database #BigData #100DaysOfCode

2/ It is important to standardize variables before running cluster analysis, because clustering techniques depend on measuring the distance between the observations we're trying to cluster.

#DataScience #MachineLearning #DeepLearning

3/ If one variable is measured on a much larger scale than the others, then whatever distance measure we use will be overly influenced by that variable.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics
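To make the scale problem concrete, here is a minimal sketch using only the Python standard library. The four (age, income) observations and the helper names are made up for illustration; the point is only that the nearest observation can change once each column is rescaled to mean 0 and standard deviation 1.

```python
import math
import statistics

# Hypothetical observations: (age in years, income in dollars).
people = [(25, 50_000), (60, 51_000), (27, 54_000), (45, 90_000)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize_columns(rows):
    """Rescale every column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def closest_to_first(rows):
    """Index of the row nearest to rows[0], excluding rows[0] itself."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))

# Raw units: income differences swamp age differences, so the 60-year-old
# with a similar income looks like the best match for the 25-year-old.
raw_closest = closest_to_first(people)

# Standardized units: age and income contribute on comparable scales,
# and the 27-year-old becomes the nearest observation.
scaled_closest = closest_to_first(standardize_columns(people))

print(raw_closest, scaled_closest)  # prints "1 2"
```

Any distance-based clustering run on the raw columns would inherit the same bias toward income.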

4/ Before Principal Component Analysis (PCA), it is critical to standardize variables, because PCA gives more weight to variables with high variances than to variables with low variances.
#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode

5/ In effect, the results of the analysis will depend on the units used to measure each variable. Standardizing the raw values gives every variable the same variance, so that no variable gets a high weight just because of its units.

#DataScience #MachineLearning #DeepLearning
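A small sketch of the PCA effect, assuming just two variables so the first principal component can be computed in closed form from the 2x2 covariance matrix. The height and mass values and the function names are hypothetical, and the eigenvector formula assumes the two variables have non-zero covariance.

```python
import math
import statistics

def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]

def first_pc_loadings(x, y):
    """Unit-length loadings of the first principal component of two variables.

    Assumes the covariance between x and y is non-zero.
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    var_x = sum((v - mx) ** 2 for v in x) / n
    var_y = sum((v - my) ** 2 for v in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Largest eigenvalue of the covariance matrix [[var_x, cov], [cov, var_y]].
    top = (var_x + var_y) / 2 + math.sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    # A matching eigenvector is (cov, top - var_x); normalise it to length 1.
    vec = (cov, top - var_x)
    norm = math.hypot(*vec)
    return vec[0] / norm, vec[1] / norm

# Hypothetical measurements: height in metres, mass in grams.
height_m = [1.60, 1.70, 1.80, 1.75, 1.65]
mass_g = [55_000, 72_000, 80_000, 68_000, 60_000]

raw_loadings = first_pc_loadings(height_m, mass_g)
std_loadings = first_pc_loadings(zscores(height_m), zscores(mass_g))

print(raw_loadings)  # almost all weight on mass, purely because of its units
print(std_loadings)  # equal weight on both variables
```

Measuring mass in kilograms instead of grams would change the raw loadings again, which is exactly the unit dependence described above.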

6/ Variables should be standardized before using k-nearest neighbors with a Euclidean distance measure. Here, standardization makes all variables contribute equally to the distance.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist

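7/ SVM kernels are computed from distances or inner products between observations (the RBF kernel, for example, depends on the squared Euclidean distance), so variables should be scaled before fitting an SVM model.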

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics #programming #ArtificialIntelligence
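As a rough illustration for the RBF kernel, exp(-gamma * squared distance), the sketch below compares two similar people in raw and in standardized units. The feature values, the assumed means and standard deviations, and gamma = 1.0 are all hypothetical choices made for the example.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

# Hypothetical features: (age in years, income in dollars).
raw_a, raw_b = (30, 50_000), (31, 50_400)

# The same two points after an assumed standardization
# (age: mean 40, sd 10; income: mean 50,000, sd 10,000).
std_a = ((30 - 40) / 10, (50_000 - 50_000) / 10_000)
std_b = ((31 - 40) / 10, (50_400 - 50_000) / 10_000)

raw_similarity = rbf_kernel(raw_a, raw_b)
std_similarity = rbf_kernel(std_a, std_b)

print(raw_similarity)  # underflows to 0.0: the $400 gap dominates
print(std_similarity)  # close to 1: the points are recognised as similar
```

With raw units, nearly every pair of observations gets a kernel value of zero, so the model has very little similarity information to work with.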

8/ It is necessary to standardize variables before using Lasso and Ridge regression. Lasso regression puts a constraint on the size of the coefficients associated with each variable.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist

9/ However, the size of a coefficient depends on the magnitude of its variable, so unscaled variables are penalized unequally. Centering the variables (and the response) means that an intercept is no longer needed. All of this applies equally to ridge regression.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python
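A minimal sketch of why the penalty depends on units, using the closed-form ridge slope for a single centered predictor, slope = Sxy / (Sxx + lambda). The fare data, the penalty value and the function names are hypothetical; the same distance is simply expressed once in kilometres and once in metres.

```python
import statistics

def ridge_slope(x, y, lam):
    """Closed-form ridge slope for one centered predictor and a centered response."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    return sxy / (sxx + lam)

def ols_slope(x, y):
    """Ordinary least squares slope, i.e. ridge with no penalty."""
    return ridge_slope(x, y, 0.0)

# Hypothetical data: trip distance and fare.
distance_km = [1.0, 2.0, 3.0, 4.0, 5.0]
distance_m = [1000 * d for d in distance_km]
fare = [3.1, 4.9, 7.2, 8.8, 11.0]

lam = 10.0  # the same penalty strength in both fits

# Fraction of the unpenalized slope that survives the penalty.
shrink_km = ridge_slope(distance_km, fare, lam) / ols_slope(distance_km, fare)
shrink_m = ridge_slope(distance_m, fare, lam) / ols_slope(distance_m, fare)

print(shrink_km)  # 0.5: the slope is cut in half
print(shrink_m)   # about 0.999999: practically no shrinkage
```

The predictor carries identical information in both fits, yet the penalty treats it very differently, which is why the variables are put on a common scale first.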

10/ In regression analysis, we can rank the importance of the independent variables by the absolute value of their standardized coefficients, in descending order.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist
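One way to sketch this ranking, assuming the raw coefficients have already been fitted: a standardized coefficient equals the raw coefficient times sd(x) / sd(y). All coefficients and standard deviations below are invented for the example.

```python
# Hypothetical raw coefficients from a house-price regression.
raw_coefs = {"sqft": 120.0, "bedrooms": 8_000.0, "age_years": -900.0}

# Hypothetical standard deviations of each predictor and of the response.
predictor_sd = {"sqft": 450.0, "bedrooms": 0.9, "age_years": 12.0}
response_sd = 85_000.0

# Standardized coefficient: raw coefficient * sd(x) / sd(y).
standardized = {
    name: coef * predictor_sd[name] / response_sd
    for name, coef in raw_coefs.items()
}

# Rank by absolute value, largest first.
ranking = sorted(standardized, key=lambda name: abs(standardized[name]), reverse=True)
raw_ranking = sorted(raw_coefs, key=lambda name: abs(raw_coefs[name]), reverse=True)

print(ranking)      # ['sqft', 'age_years', 'bedrooms']
print(raw_ranking)  # ['bedrooms', 'age_years', 'sqft']
```

The raw ranking is reversed here only because the predictors are measured in very different units.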

11/ In regression analysis, when an interaction term is created from two variables that are not centered on 0, some amount of collinearity is induced. Centering the variables first addresses this potential problem.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode

12/ In simple terms, when non-centered variables interact, a big X1 makes X1*X2 bigger on an absolute scale irrespective of X2, so X1 and X1*X2 end up correlated.
#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode

13/ In regression analysis, it is also helpful to standardize a variable when you include power terms such as X². The centering step reduces the collinearity between X and X².

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI #DataScientist #DataAnalytics #Statistics
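The collinearity from interaction and power terms can be checked with a plain Pearson correlation. This is a sketch with made-up values for X1 and X2; because these values happen to be symmetric around their means, centering removes the correlation completely, while for skewed data it would only be reduced.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def center(values):
    """Subtract the mean so the variable is centered on 0."""
    m = statistics.fmean(values)
    return [v - m for v in values]

# Hypothetical predictors, both far from 0.
x1 = [10, 11, 12, 13, 14, 15, 16, 17]
x2 = [5, 7, 6, 8, 5, 7, 6, 8]
c1, c2 = center(x1), center(x2)

# Interaction term: correlation between X1 and X1*X2.
raw_interaction = pearson(x1, [a * b for a, b in zip(x1, x2)])
centered_interaction = pearson(c1, [a * b for a, b in zip(c1, c2)])

# Power term: correlation between X1 and X1 squared.
raw_square = pearson(x1, [a * a for a in x1])
centered_square = pearson(c1, [a * a for a in c1])

print(raw_interaction, centered_interaction)  # about 0.83 before, 0.0 after
print(raw_square, centered_square)            # about 0.997 before, 0.0 after
```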

14/ When is it not required to standardize variables?

Standardizing binary variables makes their coefficients hard to interpret, because a binary variable cannot meaningfully be increased by one standard deviation.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python #pythoncode #AI

15/ The simplest solution is to leave binary variables coded as 0/1 and to standardize the continuous variables by dividing them by two standard deviations, which puts them on roughly the same scale as a balanced 0/1 variable.

#DataScience #MachineLearning #DeepLearning #100DaysOfMLCode #Python
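A short sketch of the two-standard-deviation rule. The data and the helper name are hypothetical, and centering before dividing is an assumption added here (it does not change the spread). After rescaling, the continuous variable has a standard deviation of 0.5, the same as a 0/1 variable that is split evenly.

```python
import statistics

def scale_by_two_sd(values):
    """Center a continuous variable and divide it by two standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Centering is an assumption of this sketch; the rule itself is the division.
    return [(v - mean) / (2 * sd) for v in values]

is_smoker = [0, 1, 0, 1, 1, 0]     # binary: left coded as 0/1
age = [23, 35, 41, 52, 60, 29]     # continuous: rescaled

age_scaled = scale_by_two_sd(age)

print(statistics.pstdev(is_smoker))   # 0.5
print(statistics.pstdev(age_scaled))  # 0.5 up to floating-point rounding
```

With both variables on this scale, their regression coefficients can be compared more directly.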


16/ For regular tips and techniques on #DeepLearning, #ComputerVision and #MachineLearning follow me @rohanpaul_ai
