This is the formula for the Binary Cross Entropy Loss. This loss function is commonly used for binary classification problems.
It may look super confusing, but I promise you that it is actually quite simple!
Let's go step by step 👇
The Cross-Entropy Loss function is one of the most used losses for classification problems. It tells us how well a machine learning model classifies a dataset compared to the ground truth labels.
The Binary Cross-Entropy Loss is a special case when we have only 2 classes.
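Written out (with Y_i the label of sample i and Ŷ_i the predicted probability), the formula is:

$$
\mathcal{L}_{\text{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\Big[\,Y_i \log \hat{Y}_i + (1 - Y_i)\log\big(1 - \hat{Y}_i\big)\Big]
$$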
👇
The most important part to understand is the term inside the sum - this is the core of the whole formula!
Here, Y denotes the ground-truth label, while Ŷ is the probability predicted by the classifier.
Let's look at a simple example before we talk about the logarithm... 👇
Imagine we have a bunch of photos and we want to classify each one as being a photo of a bird or not.
All photos are manually labeled so that Y=1 for all bird photos and Y=0 for the rest.
The classifier (say a NN) outputs a probability of the photo containing a bird, like Ŷ=0.9
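Plugging this example into the core term (using the natural logarithm):

$$ Y \log \hat{Y} = 1 \cdot \log(0.9) \approx -0.105 $$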
👇
Now, let's look at the logarithm.
Since Ŷ is a number between 0 and 1, log Ŷ will be a negative number that increases towards 0 as Ŷ approaches 1.
Let's take an example of a bird photo (Y=1):
▪️ Classifier predicts 99% bird, so we get log(0.99) ≈ -0.01
▪️ Classifier predicts 5% bird, so we get log(0.05) ≈ -3
That's weird 🤔
For a loss, we want a value close to 0 if the classifier is right and a large value when the classifier is wrong. In the example above it was the opposite!
Fortunately, this is easy to fix - we just multiply the value by -1 and can then interpret it as an error 🤷‍♂️
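A quick sanity check in Python (using the natural logarithm, as in the numbers above):

```python
import math

# Raw log of the predicted bird probability for a bird photo (Y=1)
print(math.log(0.99))   # ≈ -0.01 -> classifier is right
print(math.log(0.05))   # ≈ -3.0  -> classifier is wrong

# Multiply by -1 so the value behaves like an error
print(-math.log(0.99))  # ≈ 0.01 -> small loss when right
print(-math.log(0.05))  # ≈ 3.0  -> large loss when wrong
```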
👇
If the photo is labeled as not being a bird, then we have Y=0, and so the whole term becomes 0.
That's why we have the second part - the negative case. Here we just take 1-Y and 1-Ŷ for the probabilities. We are interested in the probability of the photo not being a bird.
👇
Combining both we get the error for one data sample (one photo). Note that one of the terms will always be 0, depending on how the photo is labeled.
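Written out for one sample, with the two possible labels plugged in (same Ŷ=0.9 as before):

$$ \ell = -\big[\,Y \log \hat{Y} + (1-Y)\log(1-\hat{Y})\,\big] $$
$$ Y=1:\quad \ell = -\log(0.9) \approx 0.105 \quad\text{(second term is 0)} $$
$$ Y=0:\quad \ell = -\log(1-0.9) \approx 2.303 \quad\text{(first term is 0)} $$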
This is also the case when we have more than 2 classes and use one-hot encoding!
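For C classes with a one-hot label vector, all terms but one vanish in exactly the same way:

$$ \ell = -\sum_{c=1}^{C} Y_c \log \hat{Y}_c = -\log \hat{Y}_{\text{true class}} $$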
OK, almost done with that part 😅
Now you should have a feeling for how the core of the formula works - but why do we use a logarithm?
I won't go into detail, but let's just say this is a common way to formulate optimization problems in math - the logarithm turns products into sums.
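One way to see it (writing p_θ(Y|X) for the probability the model assigns to the correct label): if the samples are independent, maximizing the product of their likelihoods is equivalent to minimizing the sum of their negative log-likelihoods - which is exactly the shape of this loss:

$$ \max_\theta \prod_{i=1}^{N} p_\theta(Y_i \mid X_i) \;\iff\; \min_\theta \; -\sum_{i=1}^{N} \log p_\theta(Y_i \mid X_i) $$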
Now the rest 👇
We know how to compute the loss for one sample, so now we just take the mean over all samples in our dataset (or minibatch) to compute the loss.
Remember - we need to multiply everything by -1 so that we can flip the sign and interpret the value as a loss (low is good, high is bad).
👇
Where to find it in your ML framework?
The Cross-Entropy Loss is sometimes also called Log Loss or Negative Log-Likelihood (NLL).
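A couple of places it shows up - a minimal sketch; check your framework's docs for whether it expects probabilities or raw logits:

```python
import torch
from sklearn.metrics import log_loss  # scikit-learn's name for cross-entropy

y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.1]  # predicted probabilities for the positive class

# scikit-learn works directly on probabilities
print(log_loss(y_true, y_prob))

# PyTorch: BCELoss expects probabilities, BCEWithLogitsLoss expects raw logits
bce = torch.nn.BCELoss()
print(bce(torch.tensor(y_prob), torch.tensor(y_true, dtype=torch.float32)))
```

In Keras, the equivalent is tf.keras.losses.BinaryCrossentropy.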
And if it is easier for you to read code than formulas, here is a simple implementation and two examples of a good (low loss) and a bad classifier (high loss).
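Here is one way such an implementation can look as a minimal NumPy sketch (the eps clipping is just there to avoid log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over all samples."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_sample.mean()

y_true = np.array([1, 0, 1, 0])

# Good classifier: confident and correct -> low loss
good = np.array([0.95, 0.05, 0.90, 0.10])
print(binary_cross_entropy(y_true, good))  # ≈ 0.08

# Bad classifier: confident but wrong -> high loss
bad = np.array([0.10, 0.90, 0.20, 0.80])
print(binary_cross_entropy(y_true, bad))   # ≈ 1.96
```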
I regularly post threads like this about machine learning and self-driving cars.
This is actually interesting! There are two very different ways to arrive at the same formula. One is what you mention using the log loss, but the other comes from information theory. They happen to be the same in this context.
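A sketch of the information-theory route: the cross-entropy between the true label distribution p = (Y, 1-Y) and the predicted distribution q = (Ŷ, 1-Ŷ) is

$$ H(p, q) = -\sum_x p(x)\log q(x) = -\big[\,Y\log\hat{Y} + (1-Y)\log(1-\hat{Y})\,\big] $$

which is exactly the per-sample term you also get from the log-likelihood derivation.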
ROC curves are based on the True Positive Rate (also known as Recall or Sensitivity) and the False Positive Rate. So, if you have an imbalanced dataset, the ROC curve may not tell you that your classifier largely ignores the underrepresented class.
Here are some insights I found particularly interesting 👇
"Neural networks are parallel computers"
That is why they are so powerful - you can train a generic computer to solve your problem. This is also the driver behind Software 2.0 - neural networks are becoming more and more capable of solving all kinds of problems.
"Neural networks perform well on tasks that humans can perform very quickly"
Humans don't think much when listening, observing or performing simple tasks.
This means that a neural network can be trained to be good at it as well: NLP, computer vision and reinforcement learning.
My setup for recording videos for my machine learning course 🎥
A lot of people asked about my setup the other day, so here is a short thread on that. It's nothing fancy, but it does a good job 🤷‍♂️
Details 👇
Hardware ⚙️
▪️ MacBook Pro (2015 model) - screen sharing and recording
▪️ iPhone XS - using the back camera for video recording
▪️ Omnidirectional external mic - connected to the iPhone
▪️ Highly professional camera rig - books, mostly about cooking and travel 😄
👇
Software 💻
▪️ OBS Studio - recording of the screen and the camera image
▪️ EpocCam - use your iPhone as a webcam. You can connect the iPhone either over WiFi or with a cable.
▪️ Google Slides - for presentations
▪️ Jupyter notebooks and Google Colab - for experimenting with code