Let's talk about how you can build your first machine learning solution.
(And let's make sure we piss off half the industry in the process.)
Grab that ☕️, and let's go! 🧵
Contrary to popular belief, your first attempt at deploying machine learning should not use TensorFlow, PyTorch, Scikit-Learn, or any other fancy machine learning framework or library.
Your first solution should be a bunch of if-then-else conditions.
Regular, ol' conditions make for a great MVP solution to a machine learning wannabe system.
Pair those conditions with a human, and you have your first system in production!
Conditions handle what they can. Humans handle the rest.
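Here's a minimal sketch of what "conditions plus a human" could look like in code. Everything in it is a made-up illustration: the refund-request scenario, the `Request` fields, the thresholds, and the function names are assumptions, not part of any real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical example: triaging refund requests.
# The fields and thresholds are illustrative assumptions.
@dataclass
class Request:
    amount: float
    account_age_days: int


# A rule returns a decision, or None when it has no opinion.
Rule = Callable[[Request], Optional[str]]


def small_amount(req: Request) -> Optional[str]:
    # Cheap requests are approved automatically.
    return "approve" if req.amount <= 20 else None


def new_account_large_amount(req: Request) -> Optional[str]:
    # Large requests from brand-new accounts are rejected automatically.
    if req.account_age_days < 1 and req.amount > 500:
        return "reject"
    return None


RULES: List[Rule] = [small_amount, new_account_large_amount]


def decide(req: Request, human_queue: List[Request]) -> str:
    """Apply the rules in order; anything they can't decide goes to a human."""
    for rule in RULES:
        decision = rule(req)
        if decision is not None:
            return decision
    human_queue.append(req)
    return "needs_human"


queue: List[Request] = []
print(decide(Request(amount=10, account_age_days=100), queue))
print(decide(Request(amount=1000, account_age_days=0), queue))
print(decide(Request(amount=100, account_age_days=50), queue))
print(len(queue), "request(s) waiting for a human")
```

The rules only answer when they are sure. Anything else lands in the queue, so the system is useful from day one even when the rules cover a small slice of the traffic.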
"But, wait! Conditions? How in the world are we going to get any results with that?"
I'm glad you asked. There are three possibilities:
1. Conditions are all you'll ever need.
2. Conditions give you a mediocre baseline.
3. There's nothing you can predict with conditions.
Turns out that when all you have is a hammer, everything starts looking like a nail. Avoid this trap.
Do you really need machine learning to solve a problem?
Ask yourself this question 20 times before moving on. You'll be surprised at what you find.
The first rule of machine learning: you may not need machine learning.
Google said it best.
Sometimes you need machine learning to get good results, but a few conditions can give you a lot of benefits.
Pair this with a human, and you have a solid system.
For example, can you predict invalid samples without using machine learning at all? Let humans handle the rest.
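A rough sketch of catching invalid samples with plain checks might look like this. The sample schema (`text` and `age` keys) and the limits are hypothetical, picked only to show the shape of the idea.

```python
from typing import Dict, List, Tuple


def invalid_reasons(sample: Dict) -> List[str]:
    """Return the reasons a sample is invalid (empty list means it passes)."""
    reasons = []

    text = sample.get("text")
    if not isinstance(text, str) or not text.strip():
        reasons.append("empty text")
    elif len(text) > 10_000:  # assumed limit
        reasons.append("text too long")

    age = sample.get("age")
    if not isinstance(age, (int, float)) or not 0 <= age <= 120:  # assumed range
        reasons.append("age out of range")

    return reasons


def split(samples: List[Dict]) -> Tuple[List[Tuple[Dict, List[str]]], List[Dict]]:
    """Separate samples the checks reject from those humans still need to see."""
    rejected, for_humans = [], []
    for sample in samples:
        reasons = invalid_reasons(sample)
        if reasons:
            rejected.append((sample, reasons))
        else:
            for_humans.append(sample)
    return rejected, for_humans


samples = [
    {"text": "hello", "age": 30},
    {"text": "   ", "age": 30},
    {"text": "hi", "age": -5},
]
rejected, for_humans = split(samples)
print(len(rejected), "rejected by the checks,", len(for_humans), "left for humans")
```

Keeping the reasons next to each rejected sample makes it easy for a human to spot-check the rules later and tighten or loosen them.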
And of course, there's the case where there's nothing you can do with simple conditions.
This is usually the case when dealing with unstructured data (images, videos, audio).
But you can still follow the same "simplicity" principle.
Find out what the low-hanging fruit is and focus on that. Then let humans deal with the hard cases.
Building a model that finds what's wrong in a circuit board is much harder than finding images that aren't circuit boards at all.
Do that instead.
Imagine you need humans reviewing 1,000 pictures every day to decide which are broken circuit boards.
20% of those images aren't even circuit boards.
Your model can trim those images. Now your humans only have to deal with 80% of the load.
You just saved 20% of their time!
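The arithmetic, as a quick back-of-the-envelope sketch. The numbers above assume the filter catches every non-board image. The `recall` parameter below is my own addition to show what happens when it doesn't, and the sketch ignores the case where the filter wrongly throws away a real circuit board.

```python
def remaining_review_load(
    images_per_day: int,
    non_board_fraction: float,
    recall: float = 1.0,
) -> int:
    """Images humans still review after the filter removes non-boards.

    recall is the share of non-board images the filter actually catches
    (an assumption added for illustration; 1.0 means a perfect filter).
    """
    removed = images_per_day * non_board_fraction * recall
    return round(images_per_day - removed)


perfect = remaining_review_load(1000, 0.20)
imperfect = remaining_review_load(1000, 0.20, recall=0.9)
print(perfect)
print(imperfect)
```

Even a filter that misses some of the junk still takes a real chunk of work off the reviewers.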
The power of this idea is in the approach, not in any specific technique.
Try to get an end-to-end solution as soon as possible.
Every time you frame the problem with a human in the loop, you give yourself a huge advantage!
If you try building a system capable of replacing humans from day 1, you'll have a long road ahead.
Grow to that, but don't start there.
As long as you can avoid the "do-the-best" mentality, you should. And most times, you can.
The pragmatic approach that delivers with minimal headaches:
"Build the simplest system that provides value under human supervision."
Everything else is just gravy.
Building machine learning that works outside a notebook is hard. It sucks for me, and probably for you too.
Follow me if you want some company doing this thing!
As soon as the office closes, I come here every day to tell you what I learned. That might help!
I agree with you here. That's the most important point.
I use Google Sheets because it's in the cloud, and it's convenient for me. I don't have Microsoft Office installed, and as long as spreadsheets aren't crazy large, Google has what I need.