Since we are dealing with large values (the input numbers and the result of their multiplication), normalizing them keeps the gradients from going out of whack.
That's what the "log" is for: it makes those values smaller and closer to each other.
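Here's a tiny illustration of the effect (the numbers are made up, just to show the scale difference):

```python
import numpy as np

# Two made-up factors and their product: the raw values span several orders of magnitude
a, b = 1_234.0, 5_678.0
raw = np.array([a, b, a * b])

scaled = np.log(raw)  # the log squashes them into a much narrower range

print(raw)     # roughly 1.2e+03, 5.7e+03, 7.0e+06
print(scaled)  # roughly 7.1, 8.6, 15.8 -- much closer together, friendlier gradients
```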
Here is a full Python 🐍 implementation of a neural network from scratch in less than 20 lines of code!
It shows how it can learn 5 logic functions. (But it's powerful enough to learn much more.)
An excellent exercise in learning how feedforward and backpropagation work!
A quick rundown of the code:
▫️ X → input
▫️ layer → hidden layer
▫️ output → output layer
▫️ W1 → set of weights between X and layer
▫️ W2 → set of weights between layer and output
▫️ error → how far off our prediction is after each epoch (see the sketch right after this list)
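Here's a rough sketch of what a network like that can look like, using the names from the rundown. It's my reconstruction rather than the exact code from the thread: the bias column in X, the hidden-layer size, the choice of AND / OR / NAND / NOR / XOR as the five logic functions, and the number of epochs are all assumptions.

```python
import numpy as np

# Every combination of two binary inputs, plus a constant 1 acting as a bias term (an assumption)
X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

# One target column per logic function: AND, OR, NAND, NOR, XOR (my guess at the five functions)
y = np.array([[0, 0, 1, 1, 0],
              [0, 1, 1, 0, 1],
              [0, 1, 1, 0, 1],
              [1, 1, 0, 0, 0]])

np.random.seed(42)
W1 = 2 * np.random.rand(3, 8) - 1   # weights between X and the hidden layer
W2 = 2 * np.random.rand(8, 5) - 1   # weights between the hidden layer and the output

for epoch in range(50_000):
    # Feedforward (the sigmoid is written inline, as in the original code)
    layer = 1 / (1 + np.exp(-(X @ W1)))
    output = 1 / (1 + np.exp(-(layer @ W2)))

    # Backpropagation: the derivative of sigmoid(x) is sigmoid(x) * (1 - sigmoid(x))
    error = y - output
    d_output = error * output * (1 - output)
    d_layer = (d_output @ W2.T) * layer * (1 - layer)
    W2 += layer.T @ d_output
    W1 += X.T @ d_layer

print(output.round(2))  # should end up close to y
```

After training, the rounded predictions should match the target table; if they don't, more epochs or a different random seed usually does it.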
I'm using a sigmoid as the activation function. You'll recognize it by this formula:
sigmoid(x) = 1 / (1 + exp(-x))
It would have been nicer to extract it as a separate function, but then the code wouldn't be as compact 😉
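For the record, here's what the extracted version could look like, together with the derivative that backpropagation reuses:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into the (0, 1) range
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # Expects s = sigmoid(x); the derivative works out to s * (1 - s)
    return s * (1 - s)
```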