Pau Labarta Bajo
Sep 13 • 14 tweets • 4 min read
How do you know if your Machine Learning model is "good enough" in a real-world project? 🤔

Let me explain ↓
A Machine Learning model in the real world is either:

✅ Good enough. In this case, the model is deployed, and its predictions are used and add value to the business every day.

❌ Not good enough. You need to improve your model, and if that is not possible, the project is canceled.
When developing a model, you use standard metrics to measure how good it is.

Examples:
→ Mean Squared Error → regression problems.
→ Accuracy → classification problems.

These are the metrics you care about in Kaggle.
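For reference, here is a minimal sketch of how these two standard metrics are computed. It uses plain Python and made-up toy data; the function names and the example values are illustrative assumptions, not something from a specific library or project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up toy data, for illustration only.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
acc = accuracy(
    ["left", "stop", "straight", "right"],
    ["left", "stop", "straight", "left"],
)
print(f"MSE: {mse:.3f}, accuracy: {acc:.2f}")
```

Both numbers describe how well predictions match labels, but neither says anything yet about what the predictions are worth to the business.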

However, in real-world projects, these are insufficient.
In real-world ML, you need to translate these "abstract" metrics into "business" metrics that everyone in the team/company understands.

And the goodness of your ML model is judged based on these "business" metrics.

The value of the business metric under the status quo (the current system or process) is the baseline you need to beat.
#example

Imagine you work at Tesla, building the next generation of self-driving cars.

You wanna build a better version of the autopilot system, which decides in real time what the car should do next. For example:

1. go straight ⬆️
2. turn left ⬅️
3. turn right ➡️
4. stop ✋
Imagine you have labeled historical data that you can use to train your ML model: in this case, a classifier with 4 possible outcomes.

And you manage to build a model with 99% accuracy.

The question is: is this accuracy "good enough", or not?
To answer this, you need to translate this abstract "accuracy" into something meaningful for the business.

For example, "probability of a car crash".
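To see why the 99% figure cannot be read directly as a crash probability, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not in the thread and are purely illustrative: each driving decision is independent, each one is wrong with probability 1 - accuracy, and a trip involves 100 decisions. The function name is hypothetical too.

```python
def prob_at_least_one_error(accuracy: float, n_decisions: int) -> float:
    """Probability that at least one of n independent decisions is wrong.

    Assumes every decision is correct with the same probability `accuracy`
    and that decisions are independent (a simplification for illustration).
    """
    return 1.0 - accuracy ** n_decisions


# Hypothetical trip with 100 decisions and a 99%-accurate classifier.
p = prob_at_least_one_error(accuracy=0.99, n_decisions=100)
print(f"P(at least one wrong decision in 100): {p:.2f}")  # about 0.63
```

A wrong decision is not necessarily a crash, and real decisions are not independent, so this number is not the business metric either. It only shows that "99% accurate" and "safe" are different statements.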
To greenlight your new autopilot system, the team needs to ensure that the implied "probability of a car crash" is

→ lower than the current system's (baseline 1)

→ lower than the probability of a crash when a human drives the car (baseline 2)
To map your "abstract" accuracy of 99% into a meaningful "probability of a car crash", you place your ML agent in a simulation engine and let it drive for as long as possible.

Every time the agent crashes, you record it. The share of simulated drives that end in a crash is your crash metric.
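A minimal sketch of that measurement loop, assuming the simulator can be wrapped in a function that runs one drive and returns True if the agent crashed. The names `estimate_crash_rate` and `make_toy_episode` are hypothetical, and the toy episode below is only a random stand-in for a real simulation engine.

```python
import random
from typing import Callable


def estimate_crash_rate(run_episode: Callable[[], bool], n_episodes: int) -> float:
    """Run the simulation n_episodes times and return the fraction that crashed."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    crashes = sum(1 for _ in range(n_episodes) if run_episode())
    return crashes / n_episodes


def make_toy_episode(true_crash_prob: float, seed: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a simulator: crashes at random with a fixed probability."""
    rng = random.Random(seed)

    def run_episode() -> bool:
        return rng.random() < true_crash_prob

    return run_episode


# Illustrative numbers only: a toy agent that crashes in about 2% of drives.
crash_rate = estimate_crash_rate(make_toy_episode(0.02, seed=0), n_episodes=10_000)
print(f"estimated crash rate: {crash_rate:.4f}")
```

The more drives you simulate, the less noisy the estimate, which is why you let the agent drive as much as possible.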
You compare this metric with the 2 baselines, and decide the model is "good enough" when

(your_system_crash < baseline 1) AND (your_system_crash < baseline 2)

If either one of these inequalities does not hold, the model is NOT good enough, and you need to work on it further.
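The rule above can be sketched as a small check. The function name and the crash rates used here are made up for illustration; real values would come from your simulation and from the two baselines.

```python
def is_good_enough(model_crash: float, baseline_current: float, baseline_human: float) -> bool:
    """True only if the new model beats BOTH baselines (strictly lower crash metric)."""
    return model_crash < baseline_current and model_crash < baseline_human


# Made-up crash rates, for illustration only.
print(is_good_enough(model_crash=0.010, baseline_current=0.020, baseline_human=0.015))  # True
print(is_good_enough(model_crash=0.018, baseline_current=0.020, baseline_human=0.015))  # False
```

In the second call the model beats the current system but not the human driver, so it still fails the check.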
To sum up,

→ Real-world ML models are ultimately evaluated in terms of business metrics.

→ An ML model is "good enough" when its implied business metric beats the status quo.
Wanna get more real-world ML content?

Subscribe to my newsletter and get my eBook for FREE

"How to become a freelance data scientist"

which has specific advice to help you become a freelance data scientist ↓
datamachines.xyz/subscribe/
Wanna build a real-world ML project and rock your next job interview?

Check the NEW mentorship program ↓
datamachines.xyz/data-science-m…
That's all for now folks.

I hope you find this content useful for your next ML project 💪🏽

Wanna connect? ↓
Follow me @paulabartabajo_

Wanna help?
Like/Retweet the first tweet below to spread the wisdom
↓↓↓

