Santiago
Jan 24 · 10 tweets · 2 min read
I built a machine learning model that predicts whether your car will crash today.

And it's 99% accurate!

Here is the secret: ↓
This thread is the answer to this question.
Before getting into the details, let's jump right into the source code of my model:
The National Safety Council reports that your odds of being in a car crash in the United States are less than 1%.

This means that even my dumb function that always returns False is very accurate!
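The model's source code was shared as an image and didn't survive this unroll. Based only on the description above (a function that always returns False), here is a minimal Python sketch of such a "model" and its accuracy. The function name `will_crash_today` and the 1,000-day sample with 5 crashes are hypothetical, chosen only to mirror the "less than 1%" odds.

```python
from typing import Sequence


def will_crash_today(driver_data=None) -> bool:
    """Hypothetical stand-in for the 'model': ignores its input, never predicts a crash."""
    return False


def accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for 1,000 days, of which only 5 (0.5%) had a crash.
y_true = [True] * 5 + [False] * 995
y_pred = [will_crash_today() for _ in y_true]

print(f"Accuracy: {accuracy(y_true, y_pred):.1%}")  # Accuracy: 99.5%
```

The function never looks at the data, yet the score looks excellent.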
The problem here is probably obvious by now:

Accuracy is not a good metric when you face a very imbalanced problem.

You can achieve very high accuracy even with a model that does absolutely nothing useful.
Some examples of imbalanced problems:

• Detecting fraudulent transactions
• Classifying spam messages
• Determining if a patient has cancer
Depending on the problem and what you care about, the following three metrics will give you much more information than accuracy:

• Recall
• Precision
• F1-Score
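As a rough sketch (Python, with hypothetical helper names), here is how the three metrics can be computed from counts of true positives (TP), false positives (FP), and false negatives (FN). Returning 0.0 when a denominator is zero is an assumption on my part; libraries handle that edge case in different ways.

```python
# Hypothetical helpers; returning 0.0 on a zero denominator is an assumed convention.


def precision(tp: int, fp: int) -> float:
    """Of all the days we predicted a crash, how many actually had one?"""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Of all the days that actually had a crash, how many did we catch?"""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical model that flags 10 days as risky and catches 4 of the 5 crashes.
tp, fp, fn = 4, 6, 1
print(precision(tp, fp))          # 0.4
print(recall(tp, fn))             # 0.8
print(round(f1(tp, fp, fn), 3))   # 0.533
```

The F1-Score only gets high when both precision and recall are high, which is why it is often used as a single summary number.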
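As an example, recall is defined as TP / (TP + FN), where TP (true positives) counts the crashes the model correctly predicted and FN (false negatives) counts the crashes it missed.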

For our car crash problem, the recall of the dummy solution is 0, since it never predicts a positive example (TP = 0).

Clearly, that solution is garbage.
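To see that zero concretely, here is a small Python sketch that counts TP and FN directly from labels and predictions for an always-False predictor. The data (1,000 days, 5 crashes) and the function name `recall_from_predictions` are hypothetical, made up for illustration.

```python
from typing import Sequence


def recall_from_predictions(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Recall = TP / (TP + FN), computed from labels and predictions (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # crashes we predicted
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # crashes we missed
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical data: 5 crashes in 1,000 days; the dummy model never predicts one.
y_true = [True] * 5 + [False] * 995
y_pred = [False] * len(y_true)

print(recall_from_predictions(y_true, y_pred))  # 0.0
```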
Getting back to the original question, Option 4 is the answer.

The other three choices are all helpful for imbalanced problems.
By the way, I write practical tips, break down complex concepts, and regularly publish short quizzes to keep you on your toes.

Follow me @svpino, and let's do this together!


More from @svpino

Jan 21
Occam's Razor:

Given two solutions with similar characteristics, the simplest and most direct one is usually the right answer.

This thread answers the following question:
Option 3 is probably the simplest one to tackle first.

It talks about "the speed of the training process" and relates it to overtraining and overcomplicating results.

A quick training process doesn't necessarily reduce complexity. This option is not correct.
Jan 21
Three deep learning myths:

1. A lot of math is needed
2. A lot of data is needed
3. An expensive computer is needed

If these are preventing you from starting, reconsider.

(Hat tip to the FastAI Course.)
Data Structures and Algorithms are an underrated set of skills for any software professional.

They are definitely very important!

That being said, I don't think they are absolute requirements for deep learning work.

Understanding the mathematical underpinnings of anything you do will definitely open doors for you.

However, stating that you can't do deep learning unless you understand all of the math involved is not a serious statement.

Jan 16
Using more features from your data never comes for free.

Let's talk about dimensionality.

2. Two days ago I asked this question.

Let's now analyze each option, starting with Option 3 (probably the easiest one to discard).
3. Option 3 states that when we cut down the number of features, we need to "make up the difference" by adding more data.

Removing features reduces the number of dimensions in our data.

It concentrates the samples we have in a lower-dimensional space.
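A back-of-the-envelope way to picture this, as a Python sketch under hypothetical numbers (1,000 samples, each feature split into 10 equal bins): the number of grid cells grows as 10^d, so the same samples spread ever more thinly as features are added, and removing features does the opposite.

```python
def samples_per_cell(n_samples: int, n_features: int, bins_per_feature: int = 10) -> float:
    """Average samples per grid cell when every feature is split into equal bins.

    The binning scheme is a hypothetical illustration, not a measure from the thread.
    """
    n_cells = bins_per_feature ** n_features
    return n_samples / n_cells


for d in (1, 2, 3, 4):
    print(d, samples_per_cell(1_000, d))
# 1 100.0
# 2 10.0
# 3 1.0
# 4 0.1
```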
Jan 14
The complexity of turning a Jupyter notebook into a production system is frequently underestimated.

Having a model that performs great on a test set is not the end of the road but just the beginning.

Fortunately, there's something for you here!

2. The productionization of machine learning systems is one of the most critical topics in the industry today.

There's been a lot of progress, and it's getting better, but for the most part, we are just at the beginning of this road.
3. Not only is the space still immature, but it's also very fragmented.

Talk to three different teams, and it's very likely they all use different tools and processes and focus on different aspects of the lifecycle of their systems.
Jan 11
Many machine learning courses that target developers want you to start with algebra, calculus, probabilities, ML theory, and only then—if you haven't quit already—you may see some code.

I want you to know there's another way.

2. For me, there's no substitute for seeing things working, trying them out myself, hitting a wall, fixing things, and seeing the results.

A hands-on approach engages me in a way pages of theory never will.

And I know many of you reading this are wired just like me.
3. I feel that driving a car is a good analogy.

While understanding some basics is necessary to start driving, you don't need to read the entire manual before jumping behind the wheel.

As long as you practice in empty parking lots and backroads, you'll be fine.
Jan 8
Do you really understand AI?

Only 16% of adults in the United States got a passing grade in a survey created by the Allen Institute for Artificial Intelligence.

Here are the 5 most interesting questions.

Would you get them right?

AI can translate sentences into another language at the level of a human translator.
AI technology can analyze chest X-Rays with equal or better accuracy than a resident-level radiologist.