Here's the lowdown: the output of a neural network is calculated using the weights of the edges that connect the nodes in the network.
So, you gotta find the optimal values of weights to minimize the final error on training data examples.
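To make that concrete, here's a tiny NumPy sketch of a single neuron's output (the numbers and names are mine, just for illustration):

```python
import numpy as np

# One toy neuron: output = activation(weighted sum of inputs).
x = np.array([0.5, -1.2])          # input sample
w = np.array([0.8, 0.3])           # edge weights: these are what training tunes
b = 0.1                            # bias term

z = np.dot(w, x) + b               # weighted sum over the incoming edges
output = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation
print(output)
```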
1οΈβ£ We start by assigning random values to all weights in the network
2οΈβ£ Then, for every input sample, we perform a feedforward operation to calculate the final output and the prediction error.
3οΈβ£ This predicted error is propagated backwards through the network, where it's used to adjust the weights.
4οΈβ£ We repeat this process until the error falls below the desired threshold.
👇👇 This is where backpropagation comes in 👇👇
A simple four-step process is at the heart of it all: 1) Error calculation 2) Propagation 3) Weight adjustment 4) Repeat
Let's go into more detail:
1) Error calculation
The error of the network's output is calculated using a loss function.
This error is the signal that gets propagated backwards through the network in the next step.
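The thread doesn't pin down a specific loss, so as a minimal sketch, here's mean squared error, one common choice:

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Mean squared error: average squared gap between
    # predictions and targets.
    return np.mean((y_pred - y_true) ** 2)

print(mse_loss(np.array([0.9, 0.2]), np.array([1.0, 0.0])))  # -> 0.025
```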
2) Propagation
The error is propagated backwards through the network by applying the chain rule, which gives the gradient of the error with respect to each weight.
These gradients measure how much each weight contributed to the error, and they drive the weight adjustment in the next step.
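Here's a minimal sketch of the chain rule at work for a single sigmoid neuron (same toy setup as above; variable names are mine):

```python
import numpy as np

# Forward pass for one sigmoid neuron (same toy setup as before).
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
b = 0.1
y_true = 1.0

z = np.dot(w, x) + b
y_pred = 1.0 / (1.0 + np.exp(-z))

# Backward pass: chain rule, one factor per link in the chain.
d_error_d_pred = 2.0 * (y_pred - y_true)  # d(error)/d(prediction), for squared error
d_pred_d_z = y_pred * (1.0 - y_pred)      # derivative of the sigmoid
d_z_d_w = x                               # d(weighted sum)/d(weights)

# Gradient of the error with respect to each weight.
grad_w = d_error_d_pred * d_pred_d_z * d_z_d_w
print(grad_w)
```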
3) Weight adjustment
An optimization algorithm (e.g. gradient descent) uses those gradients to nudge each weight in the direction that shrinks the error, improving the network's performance.
The updated weights are then used in the next feedforward pass.
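The update itself is basically one line. A sketch of a plain gradient descent step, with placeholder gradient values standing in for the ones computed during propagation:

```python
import numpy as np

lr = 0.1                          # learning rate: step size, a hyperparameter you tune
w = np.array([0.8, 0.3])          # current weights
grad_w = np.array([0.05, -0.12])  # placeholder gradients from the propagation step

# Gradient descent: move each weight a small step *against* its gradient.
w = w - lr * grad_w
print(w)
```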
4) Repeat
Backpropagation is repeated many times to improve the network's performance.
Each pass involves calculating the output, the error, and the gradients, then adjusting the weights.
By repeating this process, the network learns from the data and its performance improves.
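Putting the four steps together, here's a minimal sketch of the full loop for a single sigmoid neuron on toy data (hyperparameters are arbitrary; a real network stacks many layers, but the loop looks the same):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 samples, 2 features; the label is 1 when the
# features sum to something positive (a learnable pattern).
X = rng.normal(size=(8, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = rng.normal(size=2)  # random initial weights
b = 0.0
lr = 1.0                # arbitrary learning rate

for epoch in range(5000):
    # Feedforward: predictions for every sample.
    z = X @ w + b
    y_pred = 1.0 / (1.0 + np.exp(-z))

    # 1) Error calculation (mean squared error).
    error = np.mean((y_pred - y) ** 2)

    # 4) Repeat until the error falls below a threshold.
    if error < 0.01:
        break

    # 2) Propagation: chain rule gives the gradient per weight.
    d_z = 2.0 * (y_pred - y) * y_pred * (1.0 - y_pred) / len(y)
    grad_w = X.T @ d_z
    grad_b = d_z.sum()

    # 3) Weight adjustment: one gradient descent step.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"stopped after {epoch} epochs, error = {error:.4f}")
```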
If you found this thread helpful, go back up and retweet it and don't forget to follow me @DataScienceHarp for more deep learning fundamentals in your feed.
Skip connections are a common feature in modern CNN architectures. They create an alternative path for the gradient to flow through, which can help the model learn faster.
In a neural network, the gradient measures how much a change in one part of the network affects the output. During training, we use the gradient to update the network so it gets better at recognizing patterns in the data.
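Tying those two ideas together, here's a minimal sketch of a residual (skip-connection) block; the shapes and names are illustrative, not from any particular architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # The ordinary path: two weight layers with an activation between.
    h = relu(W1 @ x)
    f = W2 @ h
    # The skip connection: add the input straight onto the output,
    # giving the gradient a shortcut path during backprop.
    return relu(f + x)

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(4, 4))
W2 = rng.normal(size=(4, 4))
print(residual_block(x, W1, W2))
```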