Batch Gradient Descent (BGD) is an optimization algorithm commonly used in ML and optimization problems to minimize a cost function or maximize an objective function.
It is a type of gradient descent (GD) algorithm that updates the model parameters by taking the average gradient over the entire training dataset at each iteration.
Here's how the BGD algorithm works:
1) Initialize the model parameters: Start by initializing the model parameters, such as weights and biases, with random values.
2) Compute the cost function: Evaluate the cost function or the objective function on the entire training dataset. The cost function measures the error or the difference between the predicted values of the model and the actual values in the training dataset.
3) Compute the gradient: Calculate the gradient of the cost function with respect to each model parameter. The gradient represents the direction and magnitude of the steepest ascent or descent of the cost function.
4) Update the parameters: Adjust each model parameter by subtracting a fraction of the gradient from the current parameter value. The fraction is determined by the learning rate, which controls the step size taken in each iteration. The update equation for a parameter θ is: θ = θ - learning_rate * gradient
5) Repeat steps 2-4: Iterate steps 2 to 4 until a stopping criterion is met. This criterion can be a maximum number of iterations, reaching a certain level of convergence, or other conditions specific to the problem (see the code sketch after these steps).
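To make the steps concrete, here is a minimal Python/NumPy sketch of BGD applied to simple linear regression; the synthetic data, learning rate, and iteration count are illustrative assumptions, not values from the text above.

```python
import numpy as np

# Illustrative data: y = 3x + 2 plus noise (assumed for this example)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

# 1) Initialize the parameters (weight and bias) with random values
w, b = rng.normal(), rng.normal()
learning_rate = 0.1

for step in range(500):                      # 5) repeat until the stopping criterion (fixed iteration count here)
    y_pred = w * X[:, 0] + b                 # model prediction
    error = y_pred - y
    cost = np.mean(error ** 2)               # 2) cost (MSE) evaluated on the ENTIRE training dataset

    # 3) gradients of the cost, averaged over all training samples
    grad_w = 2 * np.mean(error * X[:, 0])
    grad_b = 2 * np.mean(error)

    # 4) parameter update: theta = theta - learning_rate * gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, final cost={cost:.5f}")  # w and b should approach 3 and 2
```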
The key point of BGD is that the gradient is calculated using the entire training dataset at each iteration. This makes it computationally expensive for large datasets but ensures a more accurate estimate of the gradient. When the cost function is convex, it also guarantees convergence to the global minimum (or maximum).
Despite its advantages, BGD has some limitations.
It requires the entire training dataset to be loaded into memory, which can be challenging for large datasets.
Additionally, BGD may converge slowly if the cost function is non-convex or has many local minima.
To overcome the memory and computational limitations of BGD, variations such as Stochastic Gradient Descent (SGD) and Mini-Batch Gradient Descent (MBGD) are often used.
SGD updates the parameters using only one random training sample at a time, while MBGD updates the parameters using a small subset (mini-batch) of the training dataset. These variations offer faster convergence and can handle larger datasets more efficiently.
Mini-batch gradient descent strikes a balance between the robustness of SGD and the efficiency of BGD.
SGD is an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs. It’s an inexact but powerful technique.
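As a rough sketch of how these variants differ from BGD, the loop below estimates the gradient from a random mini-batch instead of the full dataset (setting batch_size = 1 reduces it to plain SGD); the data and hyperparameters are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=1000)

w, b = 0.0, 0.0
learning_rate = 0.05
batch_size = 32                      # set batch_size = 1 to get plain SGD

for epoch in range(20):
    order = rng.permutation(len(X))  # shuffle the samples once per epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        error = (w * xb + b) - yb
        # gradient estimated from the mini-batch only, not the whole dataset
        w -= learning_rate * 2 * np.mean(error * xb)
        b -= learning_rate * 2 * np.mean(error)

print(f"w={w:.3f}, b={b:.3f}")       # should approach w≈3, b≈2
```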
A saddle point (or minimax point) is a point on the surface of the graph of a function where the slopes (derivatives) in orthogonal directions are all zero (a critical point), but which is not a local extremum of the function.
Figure: A saddle point (in red) on the graph of z = x² − y² (a hyperbolic paraboloid).
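A quick numerical check of this example: for z = x² − y², both partial derivatives vanish at the origin, yet z increases along the x-axis and decreases along the y-axis, so (0, 0) is a critical point but not a local extremum. The snippet below simply illustrates that fact.

```python
def z(x, y):
    return x**2 - y**2

# Gradient is (2x, -2y), so it is (0, 0) at the origin: a critical point.
eps = 1e-3
print(z(eps, 0) > z(0, 0))   # True: z increases along the x direction
print(z(0, eps) < z(0, 0))   # True: z decreases along the y direction
# Neither a local minimum nor a local maximum -> a saddle point.
```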
Topic -- Principal Component Analysis (PCA), Part 1
PCA is a statistical technique for analyzing all the dimensions (features) of a dataset and reducing them as much as possible while preserving as much of the original information (variance) as possible.
Using the principal component method of factor analysis, you can visualize multi-dimensional data in 2 or 3 dimensions on any platform.
Step-by-step explanation of Principal Component Analysis (a code sketch follows the steps):
1) STANDARDIZATION
2) COVARIANCE MATRIX COMPUTATION
3) FEATURE VECTOR
4) RECAST THE DATA ALONG THE PRINCIPAL COMPONENT AXES
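As a minimal NumPy sketch of these four steps; the random data and the choice to keep two components are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))                 # illustrative data: 200 samples, 5 features

# 1) Standardization: zero mean, unit variance for each feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)

# 3) Feature vector: eigenvectors of the covariance matrix,
#    sorted by decreasing eigenvalue (explained variance)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:2]]   # keep the top 2 principal components

# 4) Recast the data along the principal component axes
X_pca = X_std @ feature_vector
print(X_pca.shape)                            # (200, 2): reduced from 5 dimensions to 2
```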