Santiago
Jan 29 · 15 tweets · 3 min read
When I started with machine learning, I always made the same mistake:

I confused a couple of metrics that look very similar but are entirely different.

Let's fix that for you.

2. When we train a machine learning model, we need to compute how different our predictions are from the expected results.

For example, if we predict a house's price as $150,000, but the correct answer is $200,000, our "error" is $50,000.
3. There are multiple ways we can compute this error, but two common choices are:

• RMSE — Root Mean Squared Error
• MAE — Mean Absolute Error

These have different properties that will shine depending on the problem you want to solve.
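As a minimal sketch of how both metrics are computed, here are the two textbook definitions in plain Python. The function names and the house prices are made up for illustration.

```python
import math

def rmse(targets, predictions):
    """Root Mean Squared Error: square each difference, average, take the root."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))

def mae(targets, predictions):
    """Mean Absolute Error: average of the absolute differences."""
    absolute = [abs(t - p) for t, p in zip(targets, predictions)]
    return sum(absolute) / len(absolute)

# Hypothetical house prices (targets) and model predictions, in dollars.
targets = [200_000, 310_000, 120_000]
predictions = [150_000, 300_000, 130_000]

print("RMSE:", rmse(targets, predictions))
print("MAE: ", mae(targets, predictions))
```

On this made-up data the RMSE comes out larger than the MAE, because the single $50,000 miss weighs more once it is squared.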
4. Remember:

The optimizer will use this error to adjust the model. We want to set up the right incentives, so the model learns.
5. Let's focus on a critical difference between these two metrics:

Remember the "squared" portion of the RMSE.

It means that you are "squaring" the difference between the prediction and the expected value.

Why is this relevant?
6. Squaring the difference "penalizes" larger values.

If you expect a prediction to be 2, but you get 10, the squared error that RMSE works with will be (2 - 10)² = 64.

However, if you get 5, the error will be (2 - 5)² = 9.

Do you see how it penalizes larger errors?
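The same arithmetic as a small sketch, with the plain miss next to its squared version. The helper name `squared_error` is just for illustration.

```python
target = 2  # the expected value from the example

def squared_error(target, prediction):
    """Squared difference between the expected value and the prediction."""
    return (target - prediction) ** 2

rows = []
for prediction in (5, 10):
    miss = abs(target - prediction)
    rows.append((prediction, miss, squared_error(target, prediction)))

for prediction, miss, squared in rows:
    print(f"prediction={prediction} miss={miss} squared error={squared}")
```

A miss of 8 is less than three times a miss of 3, but its squared error is about seven times larger.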
7. MAE doesn't have the same property.

The error increases proportionally with the difference between predictions and target values.

Understanding this is important to decide which metric is better for each case.
8. Predicting a house's price is a good example where $10,000 off is twice as bad as $5,000.

We don't necessarily need to rely on RMSE here, and MAE may be all we need.
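A quick sketch of what "twice as bad" means for each metric, using the two misses from the house example. The variable names are illustrative.

```python
small_miss = 5_000   # dollars off
large_miss = 10_000  # dollars off

# Absolute error grows in proportion to the miss.
absolute_ratio = abs(large_miss) / abs(small_miss)

# Squared error grows with the square of the miss.
squared_ratio = large_miss ** 2 / small_miss ** 2

print("absolute error ratio:", absolute_ratio)
print("squared error ratio: ", squared_ratio)
```

With absolute error, doubling the miss doubles the penalty. With squared error, doubling the miss quadruples it.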
9. But predicting the pressure of a tank may work differently.

While 5 psi off may be within the expected range, 10 psi off may be a complete disaster.

Here 10 is much worse than just two times 5, so RMSE may be a better approach.
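To make this concrete, here is a sketch with two hypothetical models predicting tank pressure. The readings are invented: model A is always 5 psi off, while model B is perfect half of the time and 10 psi off the rest.

```python
import math

def rmse(errors):
    """Root Mean Squared Error computed from a list of errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

def mae(errors):
    """Mean Absolute Error computed from a list of errors."""
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical errors in psi (prediction minus true pressure).
errors_a = [5, -5, 5, -5]    # always a little off
errors_b = [0, 0, 10, -10]   # usually right, sometimes far off

print("MAE  A:", mae(errors_a), " B:", mae(errors_b))
print("RMSE A:", rmse(errors_a), " B:", rmse(errors_b))
```

MAE rates both models the same, while RMSE rates model B as worse because of its 10 psi misses. If those misses are the ones that matter, RMSE is the metric that surfaces them.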
10. Although there's more to RMSE and MAE, I have always found this mental model helpful to understand how they work.

Three days ago, I asked the attached question.

We should be ready now to answer it.
11. Looking at Option 1, we already know it is a correct answer.

RMSE penalizes larger differences between predictions and expected results.
12. Looking at both formulas, RMSE has extra squaring and square root operations, so it can't be faster to compute than MAE.

Option 2 is, therefore, not correct.
13. Option 3 states that RMSE is indifferent to the direction of the error, but MAE isn't.

This is not correct: MAE uses the absolute value of the error, so both negative and positive values will end up being the same.
14. Option 4 states that MAE is indifferent to the direction of the error, but RMSE isn't.

This is not correct either.

RMSE squares the error, so both negative and positive values will end up being the same.
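A small sketch to check that neither metric cares about the direction of the error: one prediction overshoots the target by 10 and another undershoots it by 10. The numbers are illustrative.

```python
target = 100.0
over = 110.0   # prediction 10 above the target
under = 90.0   # prediction 10 below the target

# Absolute error (the building block of MAE) drops the sign.
abs_over = abs(target - over)
abs_under = abs(target - under)

# Squared error (the building block of RMSE) drops the sign too.
sq_over = (target - over) ** 2
sq_under = (target - under) ** 2

print("absolute:", abs_over, abs_under)
print("squared: ", sq_over, sq_under)
```

Both pairs match, so overshooting and undershooting by the same amount count the same under either metric.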
15. In summary, the only correct answer to this question is Option 1.

