Santiago
Feb 9, 2021 · 27 tweets
Seriously though, how the heck can a computer recognize what's in an image?

Grab a coffee ☕️, and let's talk about one of the core ideas that makes this possible.

(I'll try to stay away from the math, I promise.)

👇
If you are a developer, spend a few minutes trying to think about a way to solve this problem:

→ Given an image, you want to build a function that determines whether it shows a person's face.

2/
It gets overwhelming fast, right?

What are you going to do with all of these pixels?

3/
Alright, you get the idea: this is a hard problem to solve, and we can't just develop our way out of it.

So let's talk about machine learning.

More specifically, let's talk about Convolutional Neural Networks.

4/
Well, I'm skipping like 300 layers of complexity here.

We should start talking about neural networks and build from that idea, but that'll be boring, and I'm sure you've heard of them before.

If you want a refresher, here is an amazing video:

5/
Fully connected networks are cool, but convolutional layers transformed the field.

I want to focus on them, so next time somebody mentions "convolution," you know exactly what's going on.

6/
Before getting too technical, let's try to break down the problem in a way that makes the solution a little bit more intuitive.

Understanding an image's contents is not about individual pixels but about the patterns formed by nearby pixels.

7/
For instance, think about Lena's picture attached here.

You get a bunch of pixels that together form the left eye. Another bunch that makes up the right eye. You have the nose, mouth, eyebrows, etc.

Put them together, and you get her face.

8/
Wave your magic wand and imagine you could build a function specializing in detecting each part of the face.

In the end, you run every function, and if they all find their piece, you flag the image as a face.

Easy, right?

9/
But how do we find an eye in a picture?

Well, we could keep breaking the problem into smaller pieces.

There are lines, circles, colors, patterns that together make up an eye. We could build more functions that detect each one of those separately.

10/
See where I'm going here?

We could build hundreds of functions, each one specializing in a specific portion of the face. Then have them look at the entire picture.

We can then put them together like a giant puzzle to determine whether we are looking at a face.

🙃

11/
I'm happy with that idea because I think it makes sense!

But building hundreds of little functions looking for individual patterns in an image is still a huge hurdle.

😬

Where do we start?

12/
Enter the idea of a "filter," a small square matrix that we will move across the image from top left to bottom right.

Every time we do this, we compute a value using a "convolution" operation.

13/
Look at this picture.

A convolution operation is an element-wise multiplication between the filter and the image patch it currently covers. The results are then summed into a single value (a dot product).

After doing this, we move the filter over one position and do it again.

14/
Here is the first convolution operation.

It produces a single value (0.2).

After doing this, we will convolve the filter with the next patch from the image and repeat this until we cover the whole picture.

Ok, this is as much math as I want you to endure.

15/
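To make the sliding-and-summing concrete, here is a minimal NumPy sketch. The `convolve2d` helper, the example image, and the filter are all hypothetical, not the thread's code. (Strictly speaking this is cross-correlation, since the filter isn't flipped, which is exactly what deep learning libraries call "convolution".)

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding).

    At each position, multiply the kernel element-wise with the
    image patch underneath it and sum the result into one value.
    """
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for row in range(out.shape[0]):
        for col in range(out.shape[1]):
            patch = image[row:row + kh, col:col + kw]
            out[row, col] = np.sum(patch * kernel)
    return out

image = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
])
kernel = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
])

# A 2x2 filter over a 4x4 image yields a 3x3 feature map.
print(convolve2d(image, kernel))
```

Each entry of the output is one "stop" of the sliding filter: one element-wise multiply, one sum.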
Here's what's cool about this: convolving an image with different filters will produce different outputs!

The attached code uses the filter2D() function from OpenCV to convolve an image with two different filters.

Code: gist.github.com/svpino/be7ba9b…

16/
Look at the results here.

Notice how one of the pictures shows all the horizontal edges, while the other only shows the vertical edges.

Pretty cool, huh?

17/
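The gist isn't reproduced here, but the effect is easy to recreate. A hypothetical sketch with Sobel-style kernels, using plain NumPy instead of OpenCV's filter2D, and a tiny synthetic image instead of a photo:

```python
import numpy as np

def convolve2d(image, kernel):
    # "Valid" convolution: stride 1, no padding.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Sobel kernels: one responds to horizontal edges, the other to vertical.
horizontal = np.array([[-1, -2, -1],
                       [ 0,  0,  0],
                       [ 1,  2,  1]], dtype=float)
vertical = horizontal.T

# A synthetic image: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

h_edges = np.abs(convolve2d(image, horizontal))
v_edges = np.abs(convolve2d(image, vertical))
```

The horizontal filter fires along the top and bottom of the square and stays silent along its sides; the vertical filter does the opposite.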
Even better: since we are convolving each filter with the entire input image, we can detect features regardless of where they are located!

This is a crucial characteristic of Convolutional Neural Networks. Smart people call it "translation invariance."

18/
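A quick way to see this property (again a hypothetical NumPy sketch, not the thread's code): place the same pattern in two different corners of an image; the filter fires equally strongly at both locations.

```python
import numpy as np

def convolve2d(image, kernel):
    # "Valid" convolution: stride 1, no padding.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

pattern = np.array([[0.0, 1.0],
                    [1.0, 0.0]])

# The same pattern appears at the top-left and bottom-right of the image.
image = np.zeros((6, 6))
image[0:2, 0:2] = pattern
image[4:6, 4:6] = pattern

# Use the pattern itself as the filter: it responds wherever the pattern is.
response = convolve2d(image, pattern)
```

The response map peaks at both occurrences with exactly the same value: the filter doesn't care *where* the pattern appears, only *that* it appears.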
Quick summary so far:

▫️ We have a bunch of filters
▫️ Each one worries about a specific pattern
▫️ We convolve them with the input image
▫️ They can detect patterns wherever they are

Do you see where this is going?

19/
The functions that we talked about before are just different filters that highlight different patterns from our image!

We can then combine the outputs of these filters to detect larger patterns and, ultimately, decide whether we are looking at a face.

20/
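One way to picture "combining filters" is to stack the feature maps produced by the first filters and mix them with a second layer of weights. A hypothetical sketch: adding horizontal-edge and vertical-edge responses highlights corners, where both filters fire at once.

```python
import numpy as np

def convolve2d(image, kernel):
    # "Valid" convolution: stride 1, no padding.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0  # a bright square

horizontal = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
vertical = horizontal.T

# First "layer": two feature maps, one per filter.
maps = np.stack([np.abs(convolve2d(image, horizontal)),
                 np.abs(convolve2d(image, vertical))])

# Second "layer": a 1x1 combination across the two channels. Equal
# weights make it fire hardest where BOTH edge types meet: the corners.
combined = 1.0 * maps[0] + 1.0 * maps[1]
```

The combined map peaks exactly at the square's corners, a "larger pattern" that neither filter detects on its own.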
One more thing: how do we come up with the values that we need for each filter?

Horizontal and vertical edges aren't a big deal, but we will need much more than that to solve our problem.

21/
Here is where the magic happens!

Our network will learn the value of the filters during training!

We'll show it many faces, and the network will come up with useful filters that will help detect faces.

🤯

22/
None of this would be possible without everything you already know about neural networks.

I also didn't talk about other operations that make Convolutional Networks work.

But hopefully, this thread highlights the main idea: convolutions rock!

23/
If you enjoy my attempts to make machine learning a little more intuitive, stay tuned and check out @svpino for more of these threads.
There's no way to tell what specific features the filters will learn.

The expectation is that they'll focus on the face but they may learn useless features as well.

Hence the importance of validating the results and properly curating the dataset.

Great question!

In this particular case, the resulting images have the same dimensions because filter2D() pads the input (using cv2.BORDER_DEFAULT, which reflects pixels at the border).

But you are right: the result of a pure convolution operation will give us smaller dimensions.
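The size arithmetic is easy to check with a hypothetical NumPy sketch: a k×k filter in a pure ("valid") convolution trims k − 1 rows and columns, and padding the borders first restores the original size.

```python
import numpy as np

def convolve2d(image, kernel):
    # Pure ("valid") convolution: no padding, so the output shrinks.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.ones((10, 10))
kernel = np.ones((3, 3))

valid = convolve2d(image, kernel)
print(valid.shape)  # (8, 8): a 3x3 filter removes 2 rows and 2 columns

# Padding the borders first (here with edge reflection, similar in
# spirit to what filter2D does by default) preserves the output size.
padded = np.pad(image, 1, mode="reflect")
same = convolve2d(padded, kernel)
print(same.shape)  # (10, 10)
```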

Speaking about patterns and generalization, here is the natural continuation of this thread:
