Supervised Learning is probably the most common class of problems that we have all heard about.
We start with a dataset of examples and their corresponding labels (or answers).
Then we teach a model the mapping between those examples and their corresponding labels.
[2 / 19]
The goal of these problems is for a model to generalize from the examples that it sees to later answer similar questions.
There are two main types of Supervised Learning:
▫️ Classification → We predict a class label
▫️ Regression → We predict a numerical label
[3 / 19]
A Supervised Learning Classification example:
Given a dataset with pictures of dogs and their corresponding breed, build a model that determines the breed of a new picture of a dog.
Notice how the goal is to predict a class label (the breed of the dog).
[4 / 19]
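To make the classification idea concrete, here is a minimal sketch. Real image classification needs far more machinery; the two numeric features (weight and height) and the scikit-learn nearest-neighbor model below are illustrative assumptions, not how you'd actually handle pictures.

```python
# Hypothetical toy features standing in for "pictures of dogs":
# [weight_kg, height_cm]. Labels are breed names (the class labels).
from sklearn.neighbors import KNeighborsClassifier

X = [[30, 55], [32, 58], [8, 25], [7, 23]]
y = ["labrador", "labrador", "dachshund", "dachshund"]

# Learn the mapping between examples and their labels.
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

# Predict the class label for a new, unseen dog.
prediction = model.predict([[31, 56]])[0]
```

The model generalizes from the examples it saw to answer a similar question about a dog it has never seen.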
A Supervised Learning Regression example:
Given the characteristics of a group of houses and their market value, build a model that determines the value of a new house.
Notice how the goal is to predict a numerical label (the value of the house).
[5 / 19]
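The house example can be sketched the same way. The features (square meters and bedrooms) and the values are made up for illustration; the point is that the label we predict is now a number, not a class.

```python
# Hypothetical house features: [square_meters, bedrooms],
# with market values in thousands as the numerical labels.
from sklearn.linear_model import LinearRegression

X = [[50, 1], [80, 2], [120, 3], [200, 4]]
y = [100, 160, 240, 400]

model = LinearRegression()
model.fit(X, y)

# Predict the value of a new house from its characteristics.
value = model.predict([[100, 2]])[0]
```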
Unsupervised Learning is about finding relationships in data.
There are no labels involved in this process. We aren't directly teaching the algorithm through labeled examples. We are expecting it to learn from the data itself.
[6 / 19]
An example of Unsupervised Learning:
Given a list of prospective customers, group them into different segments so your marketing department can reach out to them.
Here the algorithm will determine different groups for your customers based on existing relationships.
[7 / 19]
Clustering is the most common example of Unsupervised Learning.
You have probably heard of k-Means as one of the most popular clustering algorithms. Here, "k" represents the number of clusters we want to find.
[8 / 19]
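A quick k-Means sketch tying this back to the customer-segmentation example. The customer features here (age, yearly spend) and the choice of k = 2 are illustrative assumptions; there are no labels anywhere, only the data itself.

```python
# Unlabeled, made-up customer features: [age, yearly_spend].
from sklearn.cluster import KMeans

X = [[22, 100], [25, 120], [60, 900], [62, 950]]

# "k" is the number of clusters we want to find: here, k = 2.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
```

The algorithm assigns each customer to one of the two groups based purely on the relationships it finds in the data.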
Reinforcement Learning is pretty cool:
An agent interacts with the environment collecting rewards. Based on those observations, the agent learns which actions will optimize the outcome (either maximizing rewards or minimizing penalties).
[9 / 19]
An example of Reinforcement Learning:
A robot learning its way from point A to point B in a warehouse by walking and exploring the different paths between the two locations.
Every time the robot gets stuck, it is penalized. When it reaches the goal, it is rewarded.
[10 / 19]
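The robot example can be sketched with tabular Q-learning on a tiny 1-D "warehouse corridor." Every number here (states, rewards, learning rate) is an illustrative assumption; the real problem would be far richer.

```python
# Q-learning sketch: an agent walks a corridor of states 0..4,
# gets a small penalty for each step (wandering), and a reward
# for reaching state 4 (point B). It learns which action to take.
import random

random.seed(0)

n_states, actions = 5, [-1, +1]          # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

for _ in range(200):                     # training episodes
    s = 0
    while s != n_states - 1:
        if random.random() < epsilon:    # explore occasionally
            a = random.choice(actions)
        else:                            # otherwise act greedily
            a = max(actions, key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else -0.1
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions) - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-terminal state
# should be "move right" (toward the goal).
policy = {s: max(actions, key=lambda x: Q[(s, x)]) for s in range(n_states - 1)}
```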
But of course, AlphaZero (Chess) and AlphaGo (Go) are probably two of the most popular Reinforcement Learning implementations.
DeepMind is the company behind all of this research. Check out their website for some really cool articles.
[11 / 19]
In Semi-Supervised Learning, we get a lot of data but only a few labels. Sometimes, even the labels we have are not completely correct.
The goal is to build a solution that takes advantage of all the data we have, including the unlabeled data.
[12 / 19]
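One common way to use the unlabeled data is self-training (pseudo-labeling): train on the few labels you have, label the rest with the model's own predictions, then retrain on everything. This sketch uses made-up 1-D data and a nearest-neighbor model purely for illustration; it is one possible approach, not the only one.

```python
# Self-training sketch: lots of data, only a few labels.
from sklearn.neighbors import KNeighborsClassifier

X_labeled = [[0.0], [0.2], [4.0], [4.2]]
y_labeled = [0, 0, 1, 1]
X_unlabeled = [[0.1], [4.1], [3.9]]      # no labels for these

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_labeled, y_labeled)

# Pseudo-label the unlabeled examples, then retrain on all the data.
pseudo = list(model.predict(X_unlabeled))
model.fit(X_labeled + X_unlabeled, y_labeled + pseudo)

prediction = model.predict([[3.8]])[0]
```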
A few days ago I posted a thread about Active Learning, a semi-supervised approach.
Check it out if you are looking for more information about one possible way to approach this problem.
Let's talk about how you can build your first machine learning solution.
(And let's make sure we piss off half the industry in the process.)
Grab that ☕️, and let's go! 🧵
Contrary to popular belief, your first attempt at deploying machine learning should not use TensorFlow, PyTorch, Scikit-Learn, or any other fancy machine learning framework or library.
Your first solution should be a bunch of if-then-else conditions.
Regular ol' conditions make for a great MVP solution to a wannabe machine learning system.
Pair those conditions with a human, and you have your first system in production!
Conditions handle what they can. Humans handle the rest.
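A sketch of what that looks like in practice. The domain (support tickets), the rules, and the keywords are all illustrative assumptions; the pattern is the point.

```python
# Rules-first "MVP": conditions handle what they can,
# everything else is routed to a human.

def classify(ticket: str) -> str:
    text = ticket.lower()
    # Conditions handle what they can...
    if "refund" in text or "money back" in text:
        return "billing"
    if "password" in text or "can't log in" in text:
        return "account"
    # ...humans handle the rest.
    return "needs-human-review"

result = classify("I want my money back")
fallback = classify("The app feels slow lately")
```

When this system is in production, the human-reviewed cases double as labeled training data for the model you build later.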
I use Google Spreadsheets because it's in the cloud, and it's convenient for me. I don't have Microsoft Office installed, and as long as spreadsheets aren't crazy large, Google has what I need.
Here are the best 10 machine learning threads I posted in February.
They range from beginner-friendly content to deeper dives into specific machine learning concepts and techniques.
I'd love to hear which one is your favorite!
🧵👇
Having to pick only 10 threads is painful. I always struggle to decide what should stay out of the list.
This, however, is a great incentive when I'm writing the content: I have to compete against myself to make sure what I write ends up being part of the list!
[2 / 13]
[Thread 1]
An explanation about three of the most important metrics we use: accuracy, precision, and recall.
More specifically, this thread shows what happens when we focus on the wrong metric using an imbalanced classification problem.
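The core of that point fits in a few lines. With made-up numbers (95 negatives, 5 positives), a model that always predicts the majority class looks great on accuracy while completely failing on recall:

```python
# Imbalanced problem: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100                  # "always predict negative"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the positive class: how many positives did we catch?
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(t == 1 for t in y_true)
```

95% accuracy, 0% recall: the model never catches a single positive case.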
For the first time yesterday, I set up a project using a Development Container in Visual Studio Code and it immediately hit me:
✨ This is the way going forward! 🤯
If you haven't used this yet, here are some thoughts.
👇
The basic idea: you can run your entire development environment inside a container.
Every time you open your project, @code prepares and runs your container.
[2 / 7]
There are several advantages to this:
First of all, your entire team will run exactly the same environment, regardless of their preferred operating system, folder structure, existing libraries, etc.
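As a rough sketch, a minimal `.devcontainer/devcontainer.json` might look like this. The project name, base image, and extension ID are illustrative assumptions; swap in whatever your project actually needs.

```json
{
  "name": "my-project",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  }
}
```

Commit this file to the repo, and everyone who opens the project gets the same container, regardless of their local setup.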