Santiago · 4 Feb · 21 tweets · 5 min read
I built a model to predict whether you'll be involved in a crash next time you get in a car.

And it's 99% accurate!

Allow me to show you... 👇
Here is the model:

👇
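The original tweet attached the model as an image. A minimal Python sketch of the idea (the function name is made up):

def will_you_crash(trip):
    # Ignore everything about the trip and always predict "no crash."
    return False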
The National Safety Council reports that the odds of being in a car crash in the United States are 1 in 102.

That's a 0.98% probability of being involved in a crash.

Therefore, my silly model is accurate about 99% of the time: it is only wrong for the 0.98% of trips that do end in a crash!

See? I wasn't joking before.
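
You can check the arithmetic with a quick simulation. A sketch, assuming NumPy and scikit-learn; the 100,000 trips are made up, and only the 0.98% rate comes from the statistic above:

import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.0098).astype(int)  # ~0.98% of trips end in a crash
y_pred = np.zeros_like(y_true)                       # the model: always predict "no crash"

print(accuracy_score(y_true, y_pred))                # ~0.99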

👇
By now, it is probably clear that "accuracy" is not always a good way to measure a model's predictive capability.

A model can be very accurate... and still give you no useful information at all.

Like right now.

👇
Determining whether you'll be in a car crash is an "imbalanced classification problem."

There are two classes: you crash, or you don't. And one of these represents the overwhelming majority of data points.

Takeaway: Accuracy is not a great metric for this type of problem.

👇
Car crashes are a bit too morbid, so here are a few more problems that can be framed as imbalanced classification tasks:

▫️ Detecting fraudulent transactions
▫️ Classifying spam messages
▫️ Determining whether a patient has cancer

👇
We already saw the trick: we can build a "highly accurate" model by classifying every credit card transaction as not fraudulent.

An accurate model, but not a useful one.

How do we properly measure the model's effectiveness if accuracy doesn't work for us?

👇
We care about the *positive* samples (the transactions that are indeed fraudulent), and we want to maximize our model's ability to find them.

In statistics, this metric is called "recall."

[Recall: the ability of a classification model to identify all relevant samples]

👇
A more formal way to define recall:

▫️ recall = TP / (TP + FN)

▫️ True Positives (TP): Fraudulent transactions that our model detected.

▫️ False Negatives (FN): Fraudulent transactions that our model missed.

👇
Imagine that we try again to solve the fraud problem with the (useless) function sketched below.
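
The original tweet attached the function as an image; plausibly something like:

def is_fraudulent(transaction):
    # Nothing is ever fraudulent, no matter what the transaction looks like.
    return False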

We are classifying every instance as negative, so we are going to end up with 0 recall:

▫️ recall = TP / (TP + FN) = 0 / (0 + FN) = 0
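
We can sanity-check this with scikit-learn (a sketch; the labels are made up):

from sklearn.metrics import recall_score

# Toy labels: 1 = fraudulent, 0 = normal. Only 2 of 10 transactions are fraudulent.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [0] * 10  # the useless model: every transaction is "not fraudulent"

print(recall_score(y_true, y_pred))  # 0.0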

👇
That's something!

Using "recall" as our metric, we now know that our model is completely useless.

Since it's 0, we can conclude that the model can't detect any fraudulent transactions.

Ok, we are done!

Or are we?

👇
How about if we change the model to the function sketched below?
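
The original tweet again attached the function as an image; plausibly:

def is_fraudulent(transaction):
    # Everything is fraudulent now.
    return True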

Now we classify every transaction as fraudulent, so we maximize True Positives, and our False Negatives drop to 0:

▫️ recall = TP / (TP + FN) = TP / (TP + 0) = 1

Well, that seems good, doesn't it? 🙂

👇
A recall of 1 is indeed excellent, but again, it just tells part of the story.

Yes, our model now detects every fraudulent transaction, but it also misclassifies every normal transaction!

Our model is not very *precise*.

👇
As you probably guessed, "precision" is the other metric that goes hand in hand with "recall."

[Precision: the ability of a classification model to identify only relevant samples]

👇
A more formal way to define precision:

▫️ precision = TP / (TP + FP)

▫️ True Positives (TP): Fraudulent transactions that our model detected.

▫️ False Positives (FP): Normal transactions that our model misclassified as fraudulent.

👇
Let's compute the precision of our latest model (the one that classifies every transaction as fraudulent):

▫️ TP = just a few transactions, so a small number
▫️ FP = almost every transaction (all the normal ones), so a large number

▫️ precision = TP / (TP + FP) = small / large ≈ 0
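
The same sanity check with scikit-learn and the made-up labels from before:

from sklearn.metrics import precision_score

# Toy labels: 1 = fraudulent, 0 = normal. Only 2 of 10 transactions are fraudulent.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1] * 10  # the new model: every transaction is "fraudulent"

print(precision_score(y_true, y_pred))  # 0.2 here; with realistically rare fraud, this heads toward 0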

👇
The precision calculation wasn't very rigorous, but hopefully it's clear that the result will be very close to 0.

So we went from one extreme to the other!

Can you see the relationship?

As we increase the precision of our model, we decrease the recall, and vice versa.
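
One way to see the tradeoff: score each transaction with a model, then sweep the decision threshold. A toy sketch with synthetic data (everything below is made up for illustration):

import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(42)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% of transactions are fraudulent
scores = 0.3 * y_true + 0.7 * rng.random(10_000)  # noisy, overlapping "fraud scores"

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# As the threshold goes up, precision climbs while recall drops.
for i in range(0, len(thresholds), len(thresholds) // 5):
    print(f"threshold={thresholds[i]:.2f}  precision={precision[i]:.2f}  recall={recall[i]:.2f}")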

👇
Alright, so now we know a few things about imbalanced classification problems:

▫️ Accuracy is not that useful.
▫️ We want a high recall.
▫️ We want high precision.
▫️ There's a tradeoff between precision and recall.

There's one more thing that I wanted to mention.

👇
There may be cases where we want to find a good balance between precision and recall.

For this, we can use a metric called the "F1 Score":

▫️ F1 = 2 * (precision * recall) / (precision + recall)

[F1 Score: the harmonic mean of precision and recall]

👇
The F1 Score gives equal weight to both precision and recall and punishes extreme values.

This means that either one of the dummy functions we discussed before will show a very low F1 Score!

My models suck, and they won't fool the F1 Score.
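
A quick check with scikit-learn and the same made-up labels as before (zero_division=0 silences the warning for the all-negative model, whose precision is undefined):

from sklearn.metrics import f1_score

# Toy labels: 1 = fraudulent, 0 = normal. Only 2 of 10 transactions are fraudulent.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

print(f1_score(y_true, [0] * 10, zero_division=0))  # 0.0 for the all-negative model
print(f1_score(y_true, [1] * 10))                   # ~0.33 here; close to 0 when fraud is truly rare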

👇
So that's it for this story.

If you want to keep reading about metrics, here is an excellent, more comprehensive thread about different metrics used in machine learning (and the inspiration for this thread):
