2/ In linear regression, without the bias term your solution has to go through the origin. That is, when all of your features are zero, your predicted value would also have to be zero.
4/ Hence, adding a bias weight that does not depend on any of the features allows the hyperplane described by your learned weights to more easily fit data that doesn't pass through the origin.
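A minimal sketch of this point, assuming a single feature and toy data that I made up from y = 2x + 5 (the data and the helper names `fit_no_bias`, `fit_with_bias`, and `sse` are illustrative, not from any library). It fits ordinary least squares once without a bias and once with one, then compares the squared error:

```python
# Fit y = w*x (no bias) and y = w*x + b (with bias) by ordinary least squares,
# using the closed-form solutions for a single feature.

def fit_no_bias(xs, ys):
    # Minimizes sum((y - w*x)^2); the fitted line is forced through the origin.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def fit_with_bias(xs, ys):
    # Minimizes sum((y - (w*x + b))^2); b shifts the line away from the origin.
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    w = cov / var
    b = y_mean - w * x_mean
    return w, b

def sse(xs, ys, w, b=0.0):
    # Sum of squared errors of the line y = w*x + b on the data.
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical, noise-free toy data that does NOT pass through the origin.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 5.0 for x in xs]

w0 = fit_no_bias(xs, ys)
w1, b1 = fit_with_bias(xs, ys)

print("no bias:   w =", w0, "SSE =", sse(xs, ys, w0))
print("with bias: w =", w1, "b =", b1, "SSE =", sse(xs, ys, w1, b1))
```

On this toy data the model with a bias recovers the slope and the shift, while the model without one has to tilt the line to compensate and still predicts 0 at x = 0.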
5/ Think of a neural network as a good function approximator. In the case of classification, we are interested in approximating a decision boundary in the form of a hyperplane.
7/ If the expected value of your outcome variable is not 0, then we also need to estimate the bias weight (i.e., the shift from 0) along with the feature weights.
3/ If the p-value from the test is less than some significance level (e.g., α = .05), then we can reject the null hypothesis of non-stationarity and conclude that the time series is stationary.
2/ It is important to standardize variables before running cluster analysis. This is because cluster analysis techniques depend on measuring the distance between the observations we're trying to cluster, so a variable on a larger scale would otherwise dominate that distance.
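A small sketch of why scale matters, assuming Euclidean distance and z-score standardization (the income/age observations and the helper names `euclidean` and `standardize` are hypothetical, chosen only for illustration):

```python
import math
import statistics

# Hypothetical observations: (annual income in dollars, age in years).
data = [
    (30000.0, 25.0),
    (30500.0, 60.0),
    (31000.0, 26.0),
    (60000.0, 40.0),
]

def euclidean(a, b):
    # Straight-line distance between two observations.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def standardize(rows):
    # Z-score each column: subtract the column mean, divide by the
    # (population) standard deviation of that column.
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

scaled = standardize(data)

# Distances from observation 0 to observations 1 and 2, before and after scaling.
raw_01 = euclidean(data[0], data[1])
raw_02 = euclidean(data[0], data[2])
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print("raw:    d(0,1) =", raw_01, " d(0,2) =", raw_02)
print("scaled: d(0,1) =", scaled_01, " d(0,2) =", scaled_02)
```

On the raw values, income differences swamp age, so observation 0 looks nearer to observation 1 despite a 35-year age gap. After standardizing, both variables contribute on a comparable scale and observation 2 becomes the nearer one.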