Let's talk a little bit about machine learning in the real world.
A seemingly simple classification problem that turns ugly quickly.
Hopefully, this gives you an idea of what it takes to put some of the pieces together.
Grab your ☕️, and let's do this thing!
🧵👇
Let's imagine a system where you could sell your stuff.
You submit a bunch of pictures of an object, and the system recommends a price range in which the object could be sold.
Let's focus on classifying the object from the pictures.
An image classification problem sounds simple enough. There are a thousand examples out there!
Unfortunately, getting value from these systems requires a lot of careful consideration.
Let me throw a bunch of ideas at you. This 🧵 is messy, just like a potential solution to the problem.
You'll send a bunch of pictures of an object. For example, images of your bicycle 🚴.
We need to build a model that classifies that object using all of the pictures.
Let's assume there are 100 possible classes of objects that you can submit.
The first thing we need to keep in mind: customers can send more than one picture.
Some of these pictures will be good. Some of them will be bad.
A "bad" picture will not help us predict the class. Imagine, for instance, a close-up of the tires of the bicycle.
Let's assume we build a classification model that, given a picture, returns the corresponding category (one of the 100 predetermined classes.)
We can use this model with each one of the input pictures.
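A rough sketch of that step. The `classify_image` function and its signature are made up for illustration; assume the real model is a 100-class image classifier behind it:

```python
from typing import List, Tuple

# Assumption: a trained 100-class image classifier wrapped in a function.
def classify_image(image) -> Tuple[str, float]:
    """Hypothetical model call: returns (predicted_class, confidence_score)."""
    raise NotImplementedError("stand-in for the real classification model")

def classify_all_pictures(images: List) -> List[Tuple[str, float]]:
    # One prediction per picture the customer submitted.
    return [classify_image(img) for img in images]
```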
But what happens if we give the model a bad picture?
Imagine we get 2 pictures from the customer. One good picture and one bad picture. We run both of them through the classification model.
The good picture returns the appropriate class. The bad picture returns the wrong class.
How do we know which class to pick?
We need to avoid showing bad pictures to the model. Duh!
How do we do this?
We could create a new model that classifies pictures as "good" or "bad" and only show the classification model the ones that come out as "good."
This new model will help reduce the garbage that we send to the classification model.
But of course, we can't ensure the results will be perfect. We will occasionally send bad pictures to the classification model, and we'll have the same problem that we had before.
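In code, that filter could sit right in front of the classifier. `is_good_picture` is another hypothetical model call, and `classify_image` comes from the sketch above:

```python
def is_good_picture(image) -> bool:
    """Hypothetical binary model: True if the picture is usable for classification."""
    raise NotImplementedError("stand-in for the good/bad picture model")

def classify_usable_pictures(images):
    # Only pictures that pass the quality filter reach the classification model.
    usable = [img for img in images if is_good_picture(img)]
    return [classify_image(img) for img in usable]
```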
More to consider:
Our classification model will return a prediction for each input picture.
We need to determine which result to select as the class of the object.
One way to do this is by "voting": we pick the class predicted for most of the pictures.
What happens if we have a tie? For example, what class should we return if one picture returns BICYCLE and the other returns TIRE?
We need to use some heuristic to break that tie. For example, the specific score returned by the classification model.
There may be better ways to do this. For example, how likely are we to see a BICYCLE versus a TIRE?
We can factor that likelihood into the voting.
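One way to put the voting, the score tie-break, and the prior together. The prior values here are made-up numbers, purely for illustration:

```python
from collections import defaultdict

# Assumption: how likely we are to see each class at all (made-up values).
CLASS_PRIOR = {"BICYCLE": 0.05, "TIRE": 0.001}

def vote(predictions, prior=CLASS_PRIOR):
    """predictions: list of (class_name, score) pairs, one per picture."""
    totals = defaultdict(float)
    for class_name, score in predictions:
        # Weight each picture's vote by the model's score and the class prior.
        totals[class_name] += score * prior.get(class_name, 1e-3)
    return max(totals, key=totals.get)

# One good and one bad picture: the prior breaks the near-tie in favor of BICYCLE.
print(vote([("BICYCLE", 0.92), ("TIRE", 0.90)]))  # BICYCLE
```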
Another idea: did the customer provide any sort of text description? We can use it as well!
What happens if the customer sends pictures of an object outside the 100 classes that we support?
Unfortunately, our model will squeeze pictures into one of the 100 classes, no matter what.
This is known as an "out-of-distribution" sample.
Dealing with out-of-distribution samples is painful.
There are several ideas that you could pursue:
▫️ Similarity between the pictures and our training data (see the sketch after this list).
▫️ The likelihood of a customer sending a specific object.
▫️ Additional signals that could help.
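Here's a toy version of the first idea: compare a picture's embedding with the embeddings of our training data, and treat anything that isn't close enough as out-of-distribution. The `embed` function and the threshold value are assumptions:

```python
import numpy as np

def embed(image) -> np.ndarray:
    """Hypothetical feature extractor (e.g., a pre-trained CNN backbone)."""
    raise NotImplementedError("stand-in for the real embedding model")

def is_out_of_distribution(image, train_embeddings: np.ndarray, threshold: float = 0.7) -> bool:
    # Cosine similarity between the picture and every training example.
    e = embed(image)
    e = e / np.linalg.norm(e)
    train = train_embeddings / np.linalg.norm(train_embeddings, axis=1, keepdims=True)
    best_similarity = float(np.max(train @ e))
    # Nothing in the training set looks like this picture -> out-of-distribution.
    return best_similarity < threshold
```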
Another idea? Adding a human in the middle of all of this.
If our model isn't confident enough in a prediction at some point, we could route the request to a person to make the final decision.
This is powerful stuff.
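The routing logic itself can be tiny. The threshold and the review queue here are made up:

```python
CONFIDENCE_THRESHOLD = 0.8  # assumption: tune this on real traffic

def send_to_review_queue(request_id: str) -> None:
    """Hypothetical hand-off to a human reviewer."""
    print(f"Request {request_id} routed to human review")

def final_decision(predicted_class: str, confidence: float, request_id: str):
    if confidence >= CONFIDENCE_THRESHOLD:
        return predicted_class
    # Not confident enough: a person makes the final call.
    send_to_review_queue(request_id)
    return None  # pending human review
```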
In the end, our "simple" classification system becomes a combination of several models/steps working together.
They become a "pipeline," where each model's result feeds into the next one until we get to the ultimate prediction.
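Gluing the sketches from this thread together, the pipeline could look roughly like this (`TRAIN_EMBEDDINGS` is an assumed, precomputed array of training embeddings):

```python
def predict_object_class(images, request_id):
    # 1. Drop the bad pictures.
    usable = [img for img in images if is_good_picture(img)]
    if not usable:
        send_to_review_queue(request_id)
        return None

    # 2. Flag objects that don't look like anything we were trained on.
    if all(is_out_of_distribution(img, TRAIN_EMBEDDINGS) for img in usable):
        send_to_review_queue(request_id)
        return None

    # 3. Classify every usable picture and vote on the result.
    predictions = [classify_image(img) for img in usable]
    winner = vote(predictions)

    # 4. If the winning score is too low, a human makes the final call.
    confidence = max(score for cls, score in predictions if cls == winner)
    return final_decision(winner, confidence, request_id)
```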
Hey, but don't fret!
We can do this shit together!
Follow me, and every week I'll help you navigate this thing: doing it, failing at it, learning from it, and fixing it.
And for even more in-depth machine learning stories, make sure you head over to digest.underfitted.io.
🦕
This is a great question! The answer, however, depends on the specific use case.
If tires are important, and we can only determine the price based on a close-up, we might need to ask customers to send those pictures separately.
This week, I'll be on Twitter Spaces with amazing company!
We'll be talking about some cool machine learning techniques. Each of us will cover a different one.
Save the date, and join us right from the Twitter app on your phone.
We are planning to record this session, but it's going to be hacky: we'll be recording the screen of an iPhone and trying some other weird tricks to get the audio out.
Not the best process, but we will try to get clean audio out of this.
If everything works, @haltakov will make the audio available (likely in the form of a podcast.) Where and how are details that we'll share when we know.
If the audio comes out too crappy, we will probably not bother because it won't be useful for you anyway.
People don't like to use things that change the syntax of their code because "it becomes less readable." I know people who complain vehemently about slicing in Python.