Santiago

Follow @svpino

Jan 29 • 15 tweets • 3 min read

When I started with machine learning, I always made the same mistake:

I confused a couple of metrics that look very similar but are entirely different.

Let's fix that for you.

↓

2. When we train a machine learning model, we need to compute how different our predictions are from the expected results.

For example, if we predict a house's price as $150,000, but the correct answer is $200,000, our "error" is $50,000.

3. There are multiple ways we can compute this error, but two common choices are:

• RMSE — Root Mean Squared Error
• MAE — Mean Absolute Error

These have different properties that will shine depending on the problem you want to solve.
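A minimal sketch of both metrics in plain Python may help make the definitions concrete. The function names `mae` and `rmse` and the sample house prices are illustrative assumptions, not something taken from the thread.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |prediction - target|."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the average squared difference."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical targets and predictions, in dollars.
targets = [200_000, 300_000]
predictions = [150_000, 310_000]

house_mae = mae(targets, predictions)
house_rmse = rmse(targets, predictions)

print(house_mae)             # 30000.0
print(round(house_rmse, 2))  # 36055.51
```

With one $50,000 miss and one $10,000 miss, RMSE lands above MAE because the larger miss carries more weight once it is squared.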

4. Remember:

The optimizer will use this error to adjust the model. We want to set up the right incentives, so the model learns.

5. Let's focus on a critical difference between these two metrics:

Remember the "squared" portion of the RMSE.

It means that you are "squaring" the difference between the prediction and the expected value.

Why is this relevant?

6. Squaring the difference "penalizes" larger values.

If you expect a prediction to be 2, but you get 10, the squared error that RMSE is built on will be (2 - 10)² = 64.

However, if you get 5, the error will be (2 - 5)² = 9.

Do you see how it penalizes larger errors?

7. MAE doesn't have the same property.

The error increases proportionally with the difference between predictions and target values.

Understanding this is important to decide which metric is better for each case.
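The numbers from the example (expected value 2, predictions 5 and 10) can be checked side by side. This is only a sketch: the variable names are made up for illustration, and it looks at a single sample's error rather than a full metric over a dataset.

```python
expected = 2
predictions = [5, 10]

# Per-sample terms: MAE averages the absolute errors,
# RMSE averages the squared errors before taking the root.
absolute_errors = [abs(expected - p) for p in predictions]
squared_errors = [(expected - p) ** 2 for p in predictions]

# How much worse is the second prediction than the first under each view?
abs_ratio = absolute_errors[1] / absolute_errors[0]
sq_ratio = squared_errors[1] / squared_errors[0]

print(absolute_errors, squared_errors)          # [3, 8] [9, 64]
print(round(abs_ratio, 2), round(sq_ratio, 2))  # 2.67 7.11
```

The absolute error grows in step with the miss (about 2.7 times larger), while the squared error grows about 7 times larger for the same pair of predictions.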

8. Predicting a house's price is a good example where $10,000 off is twice as bad as $5,000.

We don't necessarily need to rely on RMSE here, and MAE may be all we need.

9. But predicting the pressure of a tank may work differently.

While 5 psi off may be within the expected range, 10 psi off may be a complete disaster.

Here 10 is much worse than just two times 5, so RMSE may be a better approach.
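To see how the two metrics would rank models in a case like the tank, here is a small sketch with two hypothetical sets of errors in psi. Both sets are invented for illustration: one model is always 5 psi off, the other is perfect half of the time and 10 psi off the rest.

```python
import math

def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Hypothetical prediction errors in psi for two models.
steady_errors = [5, 5, 5, 5]   # always 5 psi off
spiky_errors = [0, 0, 10, 10]  # perfect twice, 10 psi off twice

print(mae(steady_errors), mae(spiky_errors))  # 5.0 5.0
print(rmse(steady_errors))                    # 5.0
print(round(rmse(spiky_errors), 2))           # 7.07
```

MAE cannot tell these two models apart, while RMSE flags the one with the 10 psi misses as worse. That is the behavior you would want if large misses are disproportionately costly.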

10. Although there's more to RMSE and MAE, I have always found this mental model helpful to understand how they work.

Three days ago, I asked the attached question.

We should be ready now to answer it.

11. Looking at Option 1, we already know it is a correct answer.

RMSE penalizes larger differences between predictions and expected results.

12. Looking at both formulas, RMSE requires extra squaring and square root operations, so it can't be faster to compute than MAE.

Option 2 is, therefore, not correct.

13. Option 3 states that RMSE is indifferent to the direction of the error, but MAE isn't.

This is not correct: MAE uses the absolute value of the error, so negative and positive errors of the same size count the same.

14. Option 4 states that MAE is indifferent to the direction of the error, but RMSE isn't.

This is not correct either.

RMSE squares the error, so negative and positive errors of the same size count the same.
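A quick check of Options 3 and 4, assuming a hypothetical target of 10 that one model over-predicts by 3 and another under-predicts by 3. The helper names and values are illustrative only.

```python
import math

def mae(y_true, y_pred):
    """Mean of |prediction - target|; the sign is dropped by abs()."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root of the mean squared difference; the sign is dropped by squaring."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

targets = [10, 10]
over_predictions = [13, 13]   # error of +3 on every sample
under_predictions = [7, 7]    # error of -3 on every sample

print(mae(targets, over_predictions), mae(targets, under_predictions))    # 3.0 3.0
print(rmse(targets, over_predictions), rmse(targets, under_predictions))  # 3.0 3.0
```

Both metrics return the same value whether the model overshoots or undershoots, so neither one is sensitive to the direction of the error.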

15. In summary, the only correct answer to this question is Option 1.

By the way, I write practical tips, break down complex concepts, and regularly publish short quizzes to keep you on your toes.

Follow me @svpino, and let's do this together!


