Santiago
2 Nov, 11 tweets, 2 min read
Here is the story of one of those hidden issues with machine learning models that books don't tell you about.

This happened in real life: ↓
Imagine you are building a computer vision model.

It goes something like this:

1. Load a dataset of images
2. Train a model with those images
3. Export the final model

Pretty standard stuff.
To make it more specific, let's imagine that you are using OpenCV to load the images from disk.

Something like the attached screenshot.

Nothing fancy here, right?
Your model trains. Your model is excellent. You are ready to deploy your model.

You take it and host it somewhere.

Life is good!

Except...
Your model starts making predictions... and they are terrible.

You weren't expecting this at all.

What's going on? You validated and tested your model, and it did great. Why would predictions be all over the place?

It took a few days to figure this out.
If you have done this before, I'm sure you have a checklist you'd suggest going through.

I checked off every item from mine, and still nothing!

I wasn't expecting the problem that we found.
Here is something that I didn't tell you before:

The operating system we used to train our model was different from the system where we deployed it.

But why would this be a problem?
Turns out that OpenCV relies on codec libraries to load compressed images like JPEG files, and those libraries can differ from one operating system or build to the next.

So, whenever you call "cv2.imread()", the decoding is done by whichever codec that environment provides.

Different codecs return slightly different pixel values.
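
One way to check this for yourself is to decode the same file in both environments and compare the resulting pixels. A minimal sketch, assuming the decoded images are 8-bit NumPy arrays such as those returned by `cv2.imread()`; the helper names `pixel_fingerprint` and `max_pixel_difference` are hypothetical:

```python
import hashlib

import numpy as np


def pixel_fingerprint(pixels: np.ndarray) -> str:
    """Hash the shape, dtype, and raw bytes of a decoded image."""
    digest = hashlib.sha256()
    digest.update(str(pixels.shape).encode())
    digest.update(str(pixels.dtype).encode())
    digest.update(np.ascontiguousarray(pixels).tobytes())
    return digest.hexdigest()


def max_pixel_difference(a: np.ndarray, b: np.ndarray) -> int:
    """Largest absolute per-channel difference between two decodings."""
    if a.shape != b.shape:
        raise ValueError("images have different shapes")
    # Widen to a signed type first so the subtraction cannot wrap around.
    return int(np.abs(a.astype(np.int16) - b.astype(np.int16)).max())
```

If the fingerprint of the same file differs between two machines, the decoders disagree. To see by how much, the array can be saved on one machine with `np.save`, loaded on the other with `np.load`, and passed to `max_pixel_difference` together with the local decoding.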
We were training a model with a set of values and using a different set to make predictions.

The differences were enough to screw up those predictions.

All of this because the operating systems were different!
Yes, of course, Windows was involved.

Linux-based systems seem to handle images the same way, at least in our tests.

The solution: Make sure your training and prediction environments are the same. ← This one now goes into my checklist.
