Santiago
21 Apr, 10 tweets, 2 min read
Creating a good machine learning model is really sexy. That's what's different and where everyone focuses all of their attention.

But machine learning is much more than that.

A thread with a few thoughts about the real job.

1/9
Machine learning engineers spend a lot of time designing and training new models, but this is just a small fraction of their job.

2/9
In reality, dealing with data and operationalizing models is much more time-consuming and sometimes even harder and more involved than creating the models in the first place.

3/9
The ultimate goal of any project is to provide value, and a model is just a piece of the entire puzzle.

Making that piece useful involves pulling together many different skills that machine learning practitioners bring to the table.

4/9
Let's see some of the things you should expect to find on every project:

1. Define the business case for the problem you need to solve.

2. Determine the success criteria you’ll evaluate to understand whether your solution offers the expected value.
5/9
3. Determine which data you will use based on its availability and usefulness.

4. Come up with a plan to remediate any biases in the existing data.

5. Build a pipeline to capture, analyze, transform, and manage that data.

6/9
6. Design, train, validate, and test any models you need to solve the problem.

7. Glue together and deploy models and components into a comprehensive solution.

8. Assess any biases in the final solution and come up with ways to remediate them.

7/9
9. Monitor the solution to identify whether the model is performing as expected.

10. Design and implement a retraining pipeline to keep the model up to date.
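
Steps 9 and 10 can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not a production monitoring setup: `MonitorConfig`, `baseline_accuracy`, `tolerance`, and `needs_retraining` are hypothetical names, and accuracy on freshly labeled data is just one possible signal to watch.

```python
from dataclasses import dataclass


@dataclass
class MonitorConfig:
    """Hypothetical settings for a simple accuracy check."""
    baseline_accuracy: float  # accuracy measured when the model was validated (assumed known)
    tolerance: float = 0.05   # assumed acceptable drop before retraining is flagged


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if not labels:
        raise ValueError("need at least one labeled example")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def needs_retraining(predictions, labels, config):
    """Return True when live accuracy falls below baseline minus tolerance."""
    live = accuracy(predictions, labels)
    return live < config.baseline_accuracy - config.tolerance


config = MonitorConfig(baseline_accuracy=0.90)
healthy = needs_retraining([1, 0, 1, 1], [1, 0, 1, 1], config)
degraded = needs_retraining([1, 0, 0, 0], [1, 1, 1, 1], config)
print(healthy, degraded)
```

In a real project, a check like this would run on a schedule and kick off the retraining pipeline instead of printing a flag.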

8/9
The list is not comprehensive, but it shows the breadth required to complete a valuable solution that users of the model can directly benefit from.

9/9
Originally, I posted this thread as an article in Medium. Follow me at svpino.medium.com if you enjoy this type of content.

You can also subscribe to my free newsletter for a somewhat fresh perspective on practical machine learning every Friday: digest.underfitted.io.

More from @svpino

22 Apr
A 13-tweet introduction to one of the most basic structures used in machine learning: a tensor.

Understanding how tensors work is fundamental. They aren't complex but working with them may get confusing if you don't understand all the pieces.

Let's solve that today.

↓ 1/13
Three primary attributes define a tensor:

• Rank: Number of axes.
• Shape: Number of dimensions per axis.
• Data type: Type of data contained in it.

↓ 2/13
The rank of a tensor refers to the tensor's number of axes.

Examples:

• Rank of a matrix is 2.
• Rank of a vector is 1.
• Rank of a scalar is 0.

↓ 3/13
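
A small sketch of the three attributes, using plain nested Python lists as stand-ins for tensors. The helper `describe` is a hypothetical name introduced for illustration, and it assumes regular, non-empty nesting. In NumPy, the corresponding array attributes are `ndim`, `shape`, and `dtype`.

```python
def describe(tensor):
    """Return (rank, shape, data type name) for a regularly nested list.

    Assumes every sublist at the same depth has the same length
    and that no list is empty.
    """
    shape = []
    leaf = tensor
    while isinstance(leaf, list):
        shape.append(len(leaf))  # size of the current axis
        leaf = leaf[0]           # descend into the next axis
    return len(shape), tuple(shape), type(leaf).__name__


scalar = 7
vector = [1, 2, 3]
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    rank, shape, dtype = describe(value)
    print(f"{name}: rank={rank}, shape={shape}, dtype={dtype}")
```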
Read 14 tweets
21 Apr
700 people have watched "How To Get Started With Machine Learning." 86 have rated it.

Let's celebrate!

• You can buy the course today for $7.
• $0 if you don't like it.
• Back to $15 tomorrow.

gum.co/kBjbC/only7

If you can't afford it, keep reading:

For every copy I sell today, I'll give away one for free.

To apply for the free copy, reply below with why you think this course will help you.

I'll prioritize the best stories I read.

If you want to support my content, like/retweet this thread, so more people see it.
So far, 8 copies sold, and 8 free copies shared.

Thanks for the continuous support! It helps tremendously!
Read 7 tweets
20 Apr
The backbone of my end-to-end machine learning setup:

• A 48-page Field Notes
• Python
• NumPy, Pandas, Matplotlib, OpenCV
• Scikit-Learn, XGBoost
• TensorFlow
• Google Colab, Jupyter, VSCode
• Docker, Flask
• AWS SageMaker
I personally don't use C/C++.

That doesn't mean it's not useful. I know plenty of people in the industry who rely on C/C++ to do their work.

It just means that I personally haven't needed it.

There are a lot of satellite tools that I have to use depending on the project. Kinesis, Airflow, SQS... the list is endless.

I just tried to list the core of what I need, and it rarely varies.

Read 5 tweets
20 Apr
Yesterday, @PrasoonPratham posted a step-by-step guide to solve the Titanic challenge on Kaggle.

I thought it'd be fun to engineer some features that can help build an even better model.

Here are some ideas worth considering.

↓ 1/10
Attached you can find the original set of input variables that come with the data to solve the problem.

We are going to transform some of these into features that should help our model produce better results.

This is what Feature Engineering is all about.

↓ 2/13
Keep in mind that these are just hypotheses that you'll have to try and validate.

Some of these suggestions might not improve the results or could even make the model perform worse.

This is an exercise to try and think creatively about the data we are getting.

↓ 3/13
Read 15 tweets
19 Apr
Is 10 twice as bad as 5? Sometimes it is, but sometimes it's even worse.

This is the question I always ask myself when deciding how to penalize my models.

Read on for more details and a couple of examples:

↓ 1/11
When we are training a machine learning model, we need to compute how different our predictions are from the expected results.

For example, if we predict a house's price as $150,000, but the correct answer is $200,000, our "error" is $50,000.

↓ 2/11
There are multiple ways we can compute this error, but two common choices are:

• RMSE — Root Mean Squared Error
• MAE — Mean Absolute Error

Both of these have different properties that will shine depending on the problem you want to solve.

↓ 3/11
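
To make the difference concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration: both sets of predictions have the same MAE, but the one with a single large miss has twice the RMSE, because squaring penalizes big errors more heavily.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative values only.
y_true = [100, 100, 100, 100]
evenly_off = [105, 105, 105, 105]  # every prediction misses by 5
one_big_miss = [100, 100, 100, 120]  # three exact, one misses by 20

print(mae(y_true, evenly_off), rmse(y_true, evenly_off))
print(mae(y_true, one_big_miss), rmse(y_true, one_big_miss))
```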
Read 12 tweets
18 Apr
Have you upgraded your project to Python 🐍 3.9 yet?

Read on for some of the new syntax and built-in features in Python that you don't want to miss.

1/5
1. You can now merge dictionaries by using a new operator "|".

See PEP 584 for more information: python.org/dev/peps/pep-0….

↓ 2/5
2. There's another new operator "|=". This one will let you update a dictionary.

See PEP 584 for more information: python.org/dev/peps/pep-0….

↓ 3/5
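
A short sketch of both operators, assuming Python 3.9 or later. The dictionary names and values are made up for illustration. When a key appears on both sides, the value from the right-hand dictionary wins.

```python
defaults = {"lr": 0.01, "epochs": 10}
overrides = {"epochs": 20, "batch_size": 32}

# "|" builds a new dictionary and leaves both operands untouched.
merged = defaults | overrides

# "|=" updates the left-hand dictionary in place.
settings = dict(defaults)  # copy so that defaults stays unchanged
settings |= overrides

print(merged)
print(settings)
print(defaults)
```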
Read 6 tweets