Machine Learning Formulas Explained! πŸ‘¨β€πŸ«

This is the formula for the Binary Cross-Entropy Loss:

L = -1/N Ξ£ [Y log(ΕΆ) + (1 - Y) log(1 - ΕΆ)]

(the sum runs over all N samples in the dataset). This loss function is commonly used for binary classification problems.

It may look super confusing, but I promise you that it is actually quite simple!

Let's go step by step πŸ‘‡
The Cross-Entropy Loss function is one of the most used losses for classification problems. It tells us how well a machine learning model classifies a dataset compared to the ground truth labels.

The Binary Cross-Entropy Loss is a special case when we have only 2 classes.

πŸ‘‡
The most important part to understand is the term inside the sum - Y log(ΕΆ) + (1 - Y) log(1 - ΕΆ) - this is the core of the whole formula!

Here, Y denotes the ground-truth label, while ΕΆ is the probability predicted by the classifier.

Let's look at a simple example before we talk about the logarithm... πŸ‘‡
Imagine we have a bunch of photos and we want to classify each one as being a photo of a bird or not.

All photos are manually labeled so that Y=1 for all bird photos and Y=0 for the rest.

The classifier (say a NN) outputs a probability of the photo containing a bird, like ΕΆ=0.9

πŸ‘‡
Now, let's look at the logarithm.

Since ΕΆ is a number between 0 and 1, log ΕΆ will be a negative number that gets closer to 0 as ΕΆ approaches 1.

Let's take an example of a bird photo (Y=1):
β–ͺ️ Classifier predicts 99% bird, so we get log(0.99) β‰ˆ -0.01
β–ͺ️ Classifier predicts 5% bird, so we get log(0.05) β‰ˆ -3
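
You can check these values yourself - the formula uses the natural logarithm (a quick sketch):

```python
import math

# Natural log of the predicted bird probability for a bird photo (Y=1)
print(math.log(0.99))  # ~ -0.01 -> confident and correct
print(math.log(0.05))  # ~ -3.0  -> confident but wrong
```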

That's weird πŸ‘‡
For a loss, we want a value close to 0 if the classifier is right and a large value when the classifier is wrong. In the example above it was the opposite!

Fortunately, this is easy to fix - we just multiply the value by -1 and can then interpret it as an error πŸ€·β€β™‚οΈ

πŸ‘‡
If the photo is labeled as not being a bird, then we have Y=0 and the whole first term Y log(ΕΆ) becomes 0.

That's why we have the second part - the negative case. Here we just take 1-Y and 1-ΕΆ instead, because now we are interested in the probability of the photo not being a bird.

πŸ‘‡
Combining both we get the error for one data sample (one photo). Note that one of the terms will always be 0, depending on how the photo is labeled.

This is actually the case when we have more than 2 classes as well, as long as we use one-hot encoding!
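
If reading code is easier for you, the loss for a single sample could look like this (a minimal sketch; bce_single is my own name for it):

```python
import math

def bce_single(y, y_hat):
    """Loss for one sample: y is the label (0 or 1), y_hat the predicted probability."""
    # Exactly one of the two terms is non-zero, because y is either 0 or 1
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

print(bce_single(1, 0.99))  # bird photo, confident and correct -> ~0.01
print(bce_single(0, 0.99))  # not a bird, but classifier says bird -> ~4.6
```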

OK, almost done with that part πŸ‘‡
Now, you should have a feeling of how the core of the formula works, but why do we use a logarithm?

I won't go into detail, but this comes from maximum likelihood estimation - a common way to formulate optimization problems in math. The logarithm turns products of probabilities into sums, which are much easier to handle.
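
A quick sketch of the product-to-sum trick:

```python
import math

p = [0.9, 0.8, 0.95]  # probabilities the model assigns to the correct labels

# Multiplying many probabilities quickly underflows; summing logs does not
print(math.log(math.prod(p)))       # log of the product ...
print(sum(math.log(x) for x in p))  # ... equals the sum of the logs
```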

Now the rest πŸ‘‡
We know how to compute the loss for one sample, so now we just take the mean over all samples in our dataset (or minibatch) to compute the loss.

Remember - we multiply everything by -1 to flip the sign, so we can interpret the value as a loss (low is good, high is bad).

πŸ‘‡
Where to find it in your ML framework?

The Cross-Entropy Loss is sometimes also called Log Loss or Negative Log-Likelihood (NLL).

β–ͺ️ PyTorch - torch.nn.BCELoss (and torch.nn.NLLLoss for the multi-class case)
β–ͺ️ TensorFlow - tf.keras.losses.BinaryCrossentropy and CategoricalCrossentropy
β–ͺ️ Scikit-learn - sklearn.metrics.log_loss
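
For example, with scikit-learn (a quick sketch):

```python
from sklearn.metrics import log_loss

y_true = [1, 0, 1]
y_pred = [0.95, 0.05, 0.90]  # predicted probabilities of the positive class

print(log_loss(y_true, y_pred))  # ~0.07
```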
And if it is easier for you to read code than formulas, here is a simple implementation and two examples of a good (low loss) and a bad classifier (high loss).
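
A minimal version could look like this (my own sketch):

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """Mean BCE: y_true holds labels (0 or 1), y_pred predicted probabilities."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        total += y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return -total / len(y_true)

# Good classifier - confident and correct -> low loss
print(binary_cross_entropy([1, 0, 1], [0.95, 0.05, 0.90]))  # ~0.07

# Bad classifier - confident but wrong -> high loss
print(binary_cross_entropy([1, 0, 1], [0.10, 0.90, 0.20]))  # ~2.07
```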
I regularly post threads like this on topics like machine learning and self-driving cars.

Follow me @haltakov for more!
Yes, this is a good point! The input to the loss needs to be probabilities!

For classifiers that don't necessarily output probabilities (for example a NN with ReLU activations), you usually add a softmax (or sigmoid) layer.

Or use torch.nn.CrossEntropyLoss in PyTorch, which applies the softmax internally (torch.nn.BCEWithLogitsLoss is the binary counterpart).
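
For example, for the binary case (a small sketch):

```python
import torch

# Raw, unnormalized scores (logits) from a network - not probabilities yet
logits = torch.tensor([2.0, -1.0, 0.5])
labels = torch.tensor([1.0, 0.0, 1.0])

# BCEWithLogitsLoss applies the sigmoid internally (numerically stable)
loss = torch.nn.BCEWithLogitsLoss()(logits, labels)
print(loss.item())
```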

This is actually interesting! There are two very different ways to arrive at the same formula. One is the log loss / maximum likelihood view you mention; the other comes from information theory. They happen to coincide in this context.

machinelearningmastery.com/cross-entropy-…
