Seriously though, how the heck can a computer recognize what's in an image?
Grab a coffee ☕️, and let's talk about one of the core ideas that makes this possible.
(I'll try to stay away from the math, I promise.)
👇
If you are a developer, spend a few minutes trying to think about a way to solve this problem:
→ Given an image, you want to build a function that determines whether it shows a person's face.
2/
It gets overwhelming fast, right?
What are you going to do with all of these pixels?
3/
Alright, you get the idea: this is a hard problem to solve, and we can't just develop our way out of it.
So let's talk about machine learning.
More specifically, let's talk about Convolutional Neural Networks.
4/
Well, I'm skipping like 300 layers of complexity here.
We should start talking about neural networks and build from that idea, but that'll be boring, and I'm sure you've heard of them before.
If you want a refresher, here is an amazing video:
5/
Fully connected networks are cool, but convolutional layers transformed the field.
I want to focus on them, so next time somebody mentions "convolution," you know exactly what's going on.
6/
Before getting too technical, let's try to break down the problem in a way that makes the solution a little bit more intuitive.
Understanding an image's contents is not about individual pixels but about the patterns formed by nearby pixels.
7/
For instance, think about Lena's picture attached here.
You get a bunch of pixels that together form the left eye. Another bunch that makes up the right eye. You have the nose, mouth, eyebrows, etc.
Put them together, and you get her face.
8/
Wave your magic wand and imagine you could build a function specializing in detecting each part of the face.
In the end, you run every function, and if you can find every piece, you would flag the image as being a face.
Easy, right?
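To make the "magic wand" idea concrete, here is a minimal sketch. Everything in it is hypothetical: `is_face`, the detector names, and the always-true/always-false stand-ins are placeholders, because writing the real detectors is exactly the hard part.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # grayscale pixels, one list per row
Detector = Callable[[Image], bool]


def is_face(image: Image, detectors: Dict[str, Detector]) -> bool:
    """Flag the image as a face only if every part detector fires."""
    return all(detect(image) for detect in detectors.values())


# Hypothetical stand-ins for the specialized functions we wish we had.
always_found: Detector = lambda image: True
never_found: Detector = lambda image: False

blank_image: Image = [[0] * 4 for _ in range(4)]

all_parts = {
    "left_eye": always_found,
    "right_eye": always_found,
    "nose": always_found,
    "mouth": always_found,
}
missing_mouth = {**all_parts, "mouth": never_found}

face_found = is_face(blank_image, all_parts)              # every part fires
face_without_mouth = is_face(blank_image, missing_mouth)  # one part is missing
print(face_found, face_without_mouth)
```

The combining logic is trivial. All the difficulty hides inside the detectors.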
9/
But how do we find an eye in a picture?
Well, we could keep breaking the problem into smaller pieces.
There are lines, circles, colors, and patterns that together make up an eye. We could build more functions that detect each one of those separately.
10/
See where I'm going here?
We could build hundreds of functions, each one specializing in a specific portion of the face. Then have them look at the entire picture.
We can then put them together like a giant puzzle to determine whether we are looking at a face.
🙃
11/
I'm happy with that idea because I think it makes sense!
But building hundreds of little functions looking for individual patterns in an image is still a huge hurdle.
😬
Where do we start?
12/
Enter the idea of a "filter," a small square matrix that we will move across the image from top left to bottom right.
Every time we do this, we compute a value using a "convolution" operation.
13/
Look at this picture.
A convolution operation multiplies the filter and the input image patch element by element, then sums the products into a single value (in other words, a dot product between the two).
After doing this, we move the filter over one position and do it again.
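If code reads easier than pictures, here is a minimal sketch of that single step. The patch and filter values are made up for illustration (they are not the ones from the picture), and `convolve_patch` is just a name I picked.

```python
# Hypothetical 3x3 image patch and 3x3 filter; the values are made up.
patch = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def convolve_patch(patch, kernel):
    """Multiply matching cells and add everything up into one number."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


value = convolve_patch(patch, kernel)
print(value)  # 5
```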
14/
Here is the first convolution operation.
It produces a single value (0.2).
After doing this, we will convolve the filter with the next patch from the image and repeat this until we cover the whole picture.
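Here is a rough sketch of that sliding loop in plain Python. It assumes no padding and a step of one pixel, and the image and filter values are made up. Strictly speaking, this computes a cross-correlation (the filter is not flipped), which is what deep learning libraries usually mean by "convolution."

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and collect the results."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for top in range(out_h):          # move down one row at a time
        row = []
        for left in range(out_w):     # move right one column at a time
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 image and 2x2 filter.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

result = convolve2d(image, kernel)
print(result)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

Notice the output is a little smaller than the input: a 2x2 filter only fits in 3x3 positions on a 4x4 image.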
Ok, this is as much math as I want you to endure.
15/
Here's what's cool about this: convolving an image with different filters will produce different outputs!
The attached code uses the filter2D() function from OpenCV to convolve an image with two different filters.
Notice how one of the pictures shows all the horizontal edges, while the other only shows the vertical edges.
Pretty cool, huh?
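The attached code isn't reproduced here, so this is a dependency-free sketch of the same idea on a tiny synthetic image. The two Prewitt-style filters are my assumption, not necessarily the ones from the original code, and `convolve2d` is a hand-rolled stand-in. Unlike OpenCV's filter2D(), it doesn't pad the borders, so the output shrinks.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for left in range(len(image[0]) - kw + 1)
        ]
        for top in range(len(image) - kh + 1)
    ]


# Assumed Prewitt-style filters: one reacts to horizontal edges, one to vertical edges.
horizontal_kernel = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]
vertical_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

# Synthetic 6x6 image: dark top half, bright bottom half (one horizontal edge).
image = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]

horizontal_response = convolve2d(image, horizontal_kernel)
vertical_response = convolve2d(image, vertical_kernel)

print(horizontal_response)  # [[0, 0, 0, 0], [3, 3, 3, 3], [3, 3, 3, 3], [0, 0, 0, 0]]
print(vertical_response)    # all zeros: this image has no vertical edges
```

Same image, two filters, two completely different outputs.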
17/
Even better: since we are convolving each filter with the entire input image, we can detect features regardless of where they are located!
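This is a crucial characteristic of Convolutional Neural Networks. Smart people call it "translation invariance" (strictly speaking, the convolution itself is translation equivariant: shift the input, and the output shifts with it).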
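A quick way to see this: drop the same pattern at two different spots of an empty image and check where the filter reacts the most. This is a toy sketch. The X-shaped pattern, the 8x8 canvas, and the helper names `place` and `strongest_response` are all made up for illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for left in range(len(image[0]) - kw + 1)
        ]
        for top in range(len(image) - kh + 1)
    ]


def place(pattern, size, top, left):
    """Return a blank size x size image with `pattern` pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


def strongest_response(feature_map):
    """Return (value, row, col) for the largest value in the filter's output."""
    return max(
        (value, r, c)
        for r, row in enumerate(feature_map)
        for c, value in enumerate(row)
    )


# Hypothetical X-shaped pattern; the filter is simply a copy of it.
pattern = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
kernel = pattern

top_left = strongest_response(convolve2d(place(pattern, 8, 0, 0), kernel))
elsewhere = strongest_response(convolve2d(place(pattern, 8, 4, 3), kernel))

print(top_left)   # (5, 0, 0)
print(elsewhere)  # (5, 4, 3)
```

The filter fires just as strongly in both cases, and the location of the peak follows the pattern around the image.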
18/
Quick summary so far:
▫️ We have a bunch of filters
▫️ Each one worries about a specific pattern
▫️ We convolve them with the input image
▫️ They can detect patterns wherever they are
Do you see where this is going?
19/
The functions that we talked about before are just different filters that highlight different patterns from our image!
We can then combine the outputs of these filters to find larger patterns and uncover whether we have a face.
20/
One more thing: how do we come up with the values that we need for each filter?
Horizontal and vertical edges aren't a big deal, but we will need much more than that to solve our problem.
21/
Here is where the magic happens!
Our network will learn the values of the filters during training!
We'll show it many faces, and the network will come up with useful filters that will help detect faces.
🤯
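To get a feel for what "learning a filter" means, here is a toy sketch, not how a real face detector is trained. The assumptions: we hide a known vertical-edge filter, show the learner random images (pixel values centered around zero) together with the outputs of the hidden filter, and let plain gradient descent on the squared error adjust a filter that starts at all zeros. A real network learns from labeled images through backpropagation across many layers, and every value below (learning rate, number of steps, image size) is made up.

```python
import random


def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for left in range(len(image[0]) - kw + 1)
        ]
        for top in range(len(image) - kh + 1)
    ]


random.seed(0)

# Hypothetical "ground truth" filter. The learner only ever sees its outputs.
target_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

# 20 random 5x5 "images" with pixel values in [-1, 1].
images = [
    [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
    for _ in range(20)
]
targets = [convolve2d(image, target_kernel) for image in images]

kernel = [[0.0] * 3 for _ in range(3)]  # start knowing nothing
learning_rate = 0.5
n_outputs = sum(len(t) * len(t[0]) for t in targets)

for step in range(300):
    grad = [[0.0] * 3 for _ in range(3)]
    for image, target in zip(images, targets):
        prediction = convolve2d(image, kernel)
        for top in range(3):
            for left in range(3):
                error = prediction[top][left] - target[top][left]
                # Gradient of the mean squared error with respect to each filter cell.
                for i in range(3):
                    for j in range(3):
                        grad[i][j] += 2 * error * image[top + i][left + j] / n_outputs
    for i in range(3):
        for j in range(3):
            kernel[i][j] -= learning_rate * grad[i][j]

# The learned values should end up very close to target_kernel.
print([[round(value, 2) for value in row] for row in kernel])
```

Nobody typed the edge filter in by hand: the values came out of the data.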
22/
None of this would be possible without everything you already know about neural networks.
I also didn't talk about other operations that make Convolutional Networks work.
But hopefully, this thread highlights the main idea: convolutions rock!
23/
If you enjoy my attempts to make machine learning a little more intuitive, stay tuned and check out @svpino for more of these threads.
There's no way to tell what specific features the filters will learn.
The expectation is that they'll focus on the face, but they may learn useless features as well.
Hence the importance of validating the results and properly curating the dataset.