2/ L1 & L2 regularization add constraints to the optimization problem. The curve H0 is the hypothesis (the set of weight vectors that fit the data equally well). The solution to this constrained problem is the point where H0 meets the constraint region.
3/ Regularization, in statistics and machine learning, is used to add extra information to a problem in order to solve it in a better way.
4/ Now, in the case of L2 regularization, in most cases the hypothesis is tangent to the L2 constraint boundary ||w||_2 = c (a circle in 2D). The point of tangency generally has both nonzero x1 and x2 components.
5/ On the other hand, in L1, due to the shape of the ||w||_1 ball (a diamond in 2D), the viable solutions tend to land on the corners, which lie on a single axis - in the case above, x1 - so x2 = 0.
6/ This means that the solution has eliminated the role of x2, leading to sparsity. In other words, with L1 the solution is far more likely to hit a corner of the constraint region than with L2, and L1 penalizes small coefficients more heavily than L2 does, which pushes them exactly to zero and leads to sparsity.
7/ Extend this to higher dimensions and you can see why L1 regularization leads to solutions of the optimization problem where many of the variables have a value of exactly 0. In other words, L1 regularization leads to sparsity.
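To see this in practice, here is a minimal sketch (scikit-learn assumed; the thread names no library) comparing Lasso (L1) and Ridge (L2) on synthetic data where only a few features actually matter:

```python
# Sketch: L1 (Lasso) zeroes out coefficients, L2 (Ridge) only shrinks them.
# scikit-learn is an assumption; the thread does not name a library.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 50 features, but only 5 actually influence the target.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("L1 coefficients exactly 0:", np.sum(lasso.coef_ == 0))  # most of them
print("L2 coefficients exactly 0:", np.sum(ridge.coef_ == 0))  # typically none
```

Lasso typically zeroes out most of the 45 uninformative coefficients, while Ridge shrinks them toward zero without ever reaching it.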
3/ If the p-value from the test is less than some significance level (e.g., α = 0.05), then we can reject the null hypothesis and conclude that the time series is stationary. (For a unit-root test such as ADF, the null hypothesis is that the series is non-stationary.)
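As a sketch, assuming the test in question is the Augmented Dickey-Fuller test from statsmodels (the thread does not name it):

```python
# Sketch assuming the Augmented Dickey-Fuller test (not named in the thread).
# ADF's null hypothesis: the series has a unit root, i.e. is non-stationary.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
series = rng.normal(size=500).cumsum()   # a random walk: non-stationary

stat, p_value, *_ = adfuller(series)
print(f"ADF statistic: {stat:.3f}, p-value: {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: the series is stationary.")
else:
    print("Fail to reject H0: the series is non-stationary.")
```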
2/ It is important to standardize variables before running cluster analysis, because cluster analysis techniques depend on measuring the distance between the observations we're trying to cluster; a variable on a much larger scale would otherwise dominate the distance, and therefore the clusters.
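A minimal sketch (scikit-learn assumed) of why this matters: without scaling, a variable measured in the tens of thousands swamps one measured in the tens.

```python
# Sketch (scikit-learn assumed): standardize before k-means so that income
# (tens of thousands) does not dominate age (tens) in the distance metric.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
age = rng.normal(40, 10, size=100)              # scale: tens
income = rng.normal(60_000, 15_000, size=100)   # scale: tens of thousands
X = np.column_stack([age, income])

X_scaled = StandardScaler().fit_transform(X)    # mean 0, std 1 per column

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels[:10])
```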