Santiago
Feb 17 · 8 tweets · 2 min read
This is how I split machine learning projects:

1. Project scoping
2. Data definition and preparation
3. Model training and error analysis
4. Deployment, monitoring, and maintenance

Here are 33 questions that most people forget to ask.

"Project scoping":

• What problem are we trying to solve?
• Why does it need to be solved?
• Do we truly need machine learning for this?
• What constraints do we have?
• What are the risks?
• What's the best approach to solving this?
• How do we measure progress?

Still under "Project scoping":

• What does success look like?
• How is our solution going to impact people?
• What could go wrong with our solution?
• What's the simplest version we could build?

"Data definition and preparation":

• What data do we need?
• How are we going to get it?
• How frequently does it change?
• Do we trust the source?
• How is this data biased?
• Can we improve it somehow?
• How are we going to clean it?
• How are we going to augment it?

"Model training and error analysis":

• What's a good baseline?
• What's a good starting point?
• Has anyone solved this before?
• How are we going to test the model?
• Are the results good enough?
• Are we solving the problem?
• How can we improve the results?
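
One way to make "What's a good baseline?" and "Are the results good enough?" concrete is to compare the model against the dumbest possible predictor. The sketch below is only an illustration: the function names, the toy labels, and the 0.02 minimum gain are all assumptions, not something from this thread.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority_label)
    return majority_label, hits / len(test_labels)


def beats_baseline(model_accuracy, baseline_accuracy, min_gain=0.02):
    """True when the model clears the baseline by at least min_gain.

    min_gain is an arbitrary illustrative margin; pick one that fits the problem.
    """
    return model_accuracy - baseline_accuracy >= min_gain


# Toy, heavily imbalanced labels invented for illustration only.
train = ["negative"] * 95 + ["positive"] * 5
test = ["negative"] * 19 + ["positive"] * 1

label, baseline = majority_baseline_accuracy(train, test)
print(f"Always predicting {label!r} scores {baseline:.2f}")
print("Model at 0.96 beats baseline:", beats_baseline(0.96, baseline))
```

On imbalanced data like this, a model that looks strong on accuracy may barely improve on the baseline, which is a hint to pick a different metric or revisit the problem.
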
"Deployment, monitoring, and maintenance":

• Where do we host this?
• How much do we need to scale?
• What metrics should we monitor?
• What results do we expect?
• How is the model doing?
• How do we keep the model up to date?
• What's our rollback strategy?
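
A minimal sketch of how "What results do we expect?", "How is the model doing?", and "What's our rollback strategy?" can connect: write down the expected score, track recent scores, and flag a rollback when the gap gets too large. The function name, the tolerance, and the sample scores are hypothetical; a real system would likely use a more careful rule.

```python
def should_roll_back(recent_scores, expected_score, tolerance=0.05, min_samples=3):
    """Return True when the recent average falls below expected_score - tolerance.

    All thresholds here are illustrative assumptions, not recommendations.
    """
    if len(recent_scores) < min_samples:
        return False  # not enough evidence to act on yet
    average = sum(recent_scores) / len(recent_scores)
    return average < expected_score - tolerance


# Made-up daily accuracy values for a deployed model.
daily_accuracy = [0.91, 0.90, 0.84, 0.82, 0.80]

print(should_roll_back(daily_accuracy[:3], expected_score=0.90))
print(should_roll_back(daily_accuracy[-3:], expected_score=0.90))
```

The first call looks at the early days, when the average is still within tolerance; the second looks at the last three days, when it has dropped below it.
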
Every question opens a new set of possibilities and discoveries.

The more you ask, the better your system will be.

Follow me @svpino, and I'll help you stay curious, one thread at a time, right on your Twitter timeline.

If you build machine learning systems professionally, what questions would you recommend that others start asking?

What questions lead to interesting discoveries with the potential to change the project's outcome?
