2/ It is important to standardize variables before running cluster analysis, because clustering techniques depend on measuring the distance between the observations we're trying to cluster. Without it, a variable measured on a larger scale dominates that distance.
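A minimal sketch of the distance problem, using made-up numbers (income and age for three hypothetical people). The helper names `zscore` and `nearest_to_first` are illustrative, not from any library. It shows that the nearest neighbour of the first person changes when income is expressed in different units, while the z-scores do not change at all.

```python
import numpy as np

# Hypothetical data: [income, age] for three people.
X_dollars = np.array([
    [50_000.0, 25.0],
    [51_000.0, 60.0],
    [58_000.0, 26.0],
])
# Same people, income expressed in thousands of dollars.
X_thousands = X_dollars / np.array([1000.0, 1.0])

def zscore(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def nearest_to_first(X):
    """Row index of the observation closest (Euclidean) to row 0."""
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    return int(np.argmin(d)) + 1

# Raw distances depend on the unit chosen for income.
print(nearest_to_first(X_dollars))    # person 1: income dominates the distance
print(nearest_to_first(X_thousands))  # person 2: now age dominates

# Standardized values are identical whichever unit we started from.
print(np.allclose(zscore(X_dollars), zscore(X_thousands)))
```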
4/ It is critical to standardize variables before Principal Component Analysis, because PCA gives more weight to variables with high variance than to variables with low variance. #DataScience #MachineLearning #DeepLearning #100DaysOfMLCode
5/ In effect, the results of the analysis depend on the units used to measure each variable. Standardizing the raw values gives every variable equal variance, so high-variance variables are no longer assigned a higher weight.
6/ You should standardize variables before using k-nearest neighbors with a Euclidean distance measure. Here, standardization makes all variables contribute equally to the distance.
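A rough illustration of the PCA point, assuming two independent simulated features on very different scales (the feature names and distributions are invented for the example). PCA is done here by hand, as an eigendecomposition of the covariance matrix, so the sketch needs only NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical, independent features: one in centimetres, one in dollars.
height_cm = rng.normal(170, 10, n)
salary = rng.normal(50_000, 10_000, n)
X = np.column_stack([height_cm, salary])

def first_pc(X):
    """Return the leading eigenvector of the covariance matrix and the
    share of total variance it explains."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()

# Raw data: the first component is essentially the salary axis.
pc_raw, share_raw = first_pc(X)

# Standardized data: both features get a comparable say.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
pc_std, share_std = first_pc(Z)

print(pc_raw, share_raw)
print(pc_std, share_std)
```

On the raw data the first component simply picks up the variable with the biggest numbers. After standardizing, the explained-variance share falls back toward one half, which is what two unrelated features should give.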
8/ It is necessary to standardize variables before using Lasso and Ridge regression. Lasso regression puts a constraint on the size of the coefficients associated with each variable.
9/ However, the size of each coefficient depends on the magnitude of its variable, so the penalty falls unevenly on variables measured on different scales. Once the variables are centered, the intercept drops out of the model. This applies equally to ridge regression.
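To make the scale dependence concrete, here is a sketch using the closed-form ridge solution on simulated data. The helper `ridge_fit` and the penalty value `lam=10` are assumptions for illustration; Lasso has no closed form, so ridge stands in for both. The same variable is fed in twice, once as-is and once in a unit 1000 times larger, and the fitted effect is converted back to the original unit for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 * x1 + 3.0 * x2 + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge slopes on centered data:
    beta = (X'X + lam * I)^-1 X'y. Centering removes the intercept."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

X_a = np.column_stack([x1, x2])           # x2 in its original unit
X_b = np.column_stack([x1, x2 / 1000.0])  # x2 in a unit 1000x larger

# Effect of x2 per ORIGINAL unit under each parameterization.
effect_a = ridge_fit(X_a, y, lam=10.0)[1]
effect_b = ridge_fit(X_b, y, lam=10.0)[1] / 1000.0

# With no penalty (plain least squares) the unit does not matter.
ols_a = ridge_fit(X_a, y, lam=0.0)[1]
ols_b = ridge_fit(X_b, y, lam=0.0)[1] / 1000.0

print(effect_a, effect_b)  # mildly shrunk vs. shrunk almost to zero
print(ols_a, ols_b)        # these two agree
```

The rescaled variable needs a coefficient 1000 times larger to say the same thing, so the penalty crushes it. Standardizing first puts every coefficient on the same footing.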
10/ In regression analysis, we can rank the importance of the independent variables by the absolute value of their standardized coefficients, in descending order.
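A small sketch of that ranking on simulated data (the coefficients 2.0 and 0.01 and the helper names are made up for the example). The raw slope of `x2` looks tiny only because `x2` is measured on a large scale; the standardized slopes reverse the order.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(0, 1, n)      # small scale
x2 = rng.normal(0, 1000, n)   # large scale
y = 2.0 * x1 + 0.01 * x2 + rng.normal(0, 1, n)

def ols_slopes(X, y):
    """Least-squares slopes (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

def zscore(a):
    """Center to mean 0 and scale to standard deviation 1, column-wise."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

X = np.column_stack([x1, x2])
raw = ols_slopes(X, y)                   # slopes in original units
std = ols_slopes(zscore(X), zscore(y))   # standardized coefficients

# Indices of the variables, most important first.
ranking = np.argsort(-np.abs(std)).tolist()
print(raw, std, ranking)
```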
11/ In regression analysis, when an interaction term is created from two variables that are not centered on 0, some amount of collinearity is induced. Centering the variables first addresses this potential problem.
12/ In simple terms, when non-standardized variables interact, a big X1 makes X1*X2 bigger on an absolute scale irrespective of X2, so X1 and X1*X2 end up correlated. #DataScience #MachineLearning #DeepLearning #100DaysOfMLCode
13/ In regression analysis, it is also helpful to standardize a variable when you include power terms such as X². Centering greatly reduces the collinearity between X and X².
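The interaction and power-term points can be checked with a quick simulation. This sketch assumes two independent, roughly symmetric variables with means far from 0 (the distribution parameters are arbitrary) and compares correlations before and after centering.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical predictors with means far from zero.
x1 = rng.normal(50, 5, n)
x2 = rng.normal(50, 5, n)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Uncentered: X1 is strongly correlated with X1*X2 and with X1^2.
raw_inter = corr(x1, x1 * x2)
raw_square = corr(x1, x1 ** 2)

# Centered: the same terms are close to uncorrelated with X1.
x1c = x1 - x1.mean()
x2c = x2 - x2.mean()
cen_inter = corr(x1c, x1c * x2c)
cen_square = corr(x1c, x1c ** 2)

print(raw_inter, raw_square)
print(cen_inter, cen_square)
```

For a skewed variable, the centered X and X² would still be somewhat correlated, so centering helps rather than guarantees zero collinearity.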
15/ The simplest solution for binary variables: do not standardize them, but code them as 0/1, and then standardize all the continuous variables by dividing by two standard deviations.
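A sketch of the two-standard-deviation scaling on toy data. The helper `scale_two_sd` is a hypothetical name, and centering before dividing is an assumption on my part, since the tweet only mentions the division. The reason for two standard deviations: a balanced 0/1 variable has a standard deviation of 0.5, so the rescaled continuous variable ends up with the same spread.

```python
import numpy as np

def scale_two_sd(x):
    """Center x and divide by two standard deviations (assumed recipe)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (2.0 * x.std())

# Toy data: one continuous and one binary predictor.
age = np.array([23.0, 35.0, 41.0, 52.0, 60.0, 29.0])
smoker = np.array([0, 1, 0, 0, 1, 1])  # left as 0/1, not standardized

age_scaled = scale_two_sd(age)

print(age_scaled.std())  # 0.5 by construction
print(smoker.std())      # 0.5 for a balanced binary variable
```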
3/ If the p-value from the test is less than some significance level (e.g. α = .05), then we can reject the null hypothesis and conclude that the time series is stationary.
"roc_auc_score" is defined as the area under the ROC curve, which is the curve having False Positive Rate on the x-axis and True Positive Rate on the y-axis at all classification thresholds.