One of the neural network architectures that has outpaced traditional algorithms for image recognition is the Convolutional Neural Network (CNN), a.k.a. ConvNet.

Inspired by the brain's visual cortex, CNNs have become a standard building block for many computer vision tasks.

More about CNNs 👇
A CNN is made of 3 main blocks:

◆Convolutional layer(s)
◆Pooling layer(s)
◆Fully connected layer(s)

Let's talk about each block.
1. Convolutional layers

Convolutional layers are the backbone of the whole CNN. They extract features from images using filters.
These filters learn low-level features such as lines and edges, mid-level features such as ears and noses, and high-level features such as a face.

High-level features are what later become useful during image recognition.

Example of low, middle and high-level features 👇
In convolution, we slide the filter over the image. At each position, we multiply the filter values by the pixels underneath them and sum the products; that sum becomes the new pixel.

We repeat the process until the filter has slid over all the image pixels.
Take a look here 👇 to see how convolution is done
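
The sliding, multiplying, and summing described above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: no padding, a stride of 1, a made-up 4x5 image, and a hand-written vertical-edge filter (in a real CNN the filter values are learned). Like most deep learning frameworks, it does not flip the filter, so strictly speaking it is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the new grid."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            # Multiply each filter value by the pixel under it, then sum.
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)  # the sum becomes the new pixel
        output.append(out_row)
    return output


# Hypothetical image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

result = convolve2d(image, vertical_edge)
print(result)  # [[0, 3, 3], [0, 3, 3]]
```

The output is 0 where the filter sees a flat dark region and 3 where it overlaps the edge, which is the sense in which a filter "detects" a feature.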
More about filters/image kernels, I like how this link explains it 👇

setosa.io/ev/image-kerne…
Most popular deep learning frameworks such as TensorFlow allow you to create the convolutional layer in one line of code.

If you're a TensorFlow guy like me, you know this 👇
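
The original screenshot is not reproduced here. As a rough sketch, assuming the Keras API that ships with TensorFlow 2 (`tf.keras`), a convolutional layer could be created and applied like this. The 32 filters, the 3x3 kernel, and the 28x28 grayscale input are illustrative choices, not values taken from the thread.

```python
import tensorflow as tf

# One line creates the layer: 32 filters, each 3x3, followed by a ReLU activation.
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation="relu")

# Assumed input: a batch holding one 28x28 image with a single (grayscale) channel.
images = tf.zeros((1, 28, 28, 1))
feature_maps = conv(images)

# With the default settings (no padding, stride 1), each side shrinks by 2
# and the last dimension equals the number of filters.
print(feature_maps.shape)  # (1, 26, 26, 32)
```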
The output of the convolutional layer is a stack of feature maps, and its depth depends on the number of filters in the layer.

For example, if the layer has 32 filters, you will have 32 feature maps at the output.
The more filters you have, the more feature maps you get, and that's not cool sometimes.

How do we reduce the dimensions of the feature maps while retaining as much of the image's information as possible?

We pool...
2. Pooling layers

Pooling layers are used to compress or shrink the feature maps.

There are various pooling options, but max pooling is commonly used because it preserves the most prominent features. It shrinks each feature map by keeping only the highest value in each pooling window.
Implementing a pooling layer is very simple too.
The output of the pooling layers is a set of reduced feature maps.
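
To make max pooling concrete, here is a minimal plain-Python sketch. It assumes a 2x2 window that moves 2 pixels at a time (a common default) and a made-up 4x4 feature map; the function name `max_pool2d` is hypothetical, not a library call.

```python
def max_pool2d(feature_map, size=2):
    """Keep only the highest value in each non-overlapping size x size window."""
    pooled = []
    for row in range(0, len(feature_map) - size + 1, size):
        pooled_row = []
        for col in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[row + i][col + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Made-up 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 4], [8, 9]]
```

The 4x4 map becomes 2x2, so it holds a quarter of the values while the strongest response in each region survives.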

How does the network make sense of what these features represent?
3. Fully connected layers/Dense layers

At the end of a ConvNet, there is typically a fully connected layer whose job is to match the feature maps produced by the last pooling layer with the label of the original image.
Take an example.

If the input image to a ConvNet shows a human, the high-level features may be a face, which can be enough to recognize a human.
Once the neural network has learned these different levels of features, the features need to be matched to their labels. That is what fully connected layers do.
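
A rough plain-Python sketch of that matching step: flatten the feature maps into one list, take a weighted sum per label, and turn the scores into probabilities with softmax. Everything numeric here is made up for illustration (the feature maps, the two labels, and especially the weights, which a real network would learn during training).

```python
import math


def flatten(feature_maps):
    """Unroll a stack of 2D feature maps into one flat list of values."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output is a weighted sum of all inputs."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        for unit_weights, bias in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two made-up 2x2 feature maps, as they might come out of a pooling layer.
feature_maps = [
    [[6, 4], [8, 9]],
    [[0, 1], [1, 0]],
]
labels = ["human", "not human"]  # hypothetical labels

# Made-up weights: one row of 8 weights per label.
weights = [[0.1] * 8, [-0.1] * 8]
biases = [0.0, 0.0]

features = flatten(feature_maps)
probabilities = softmax(dense(features, weights, biases))
prediction = labels[probabilities.index(max(probabilities))]
print(prediction)
```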
A ConvNet may have multiple blocks of convolutional and pooling layers.

The right number of these layers will depend on the scope of the work at hand and the size of the dataset.
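
To get a feel for what stacking does, this small sketch traces how the data shape changes through two conv + pool blocks. The numbers are assumptions chosen for illustration: a 64x64 RGB input, 3x3 filters with no padding and stride 1, 2x2 pooling, and 32 then 64 filters.

```python
def conv_shape(shape, filters, kernel=3):
    """Shape after a convolution with no padding and stride 1."""
    height, width, _ = shape
    return (height - kernel + 1, width - kernel + 1, filters)


def pool_shape(shape, size=2):
    """Shape after non-overlapping pooling (leftover edge pixels are dropped)."""
    height, width, channels = shape
    return (height // size, width // size, channels)


shape = (64, 64, 3)      # assumed input: 64x64 RGB image
filters_per_block = [32, 64]  # assumed number of filters in each block

for filters in filters_per_block:
    shape = conv_shape(shape, filters)
    print("after conv:", shape)
    shape = pool_shape(shape)
    print("after pool:", shape)

# Number of values handed to the fully connected layers after flattening.
flattened = shape[0] * shape[1] * shape[2]
print("flattened size:", flattened)
```

Each block makes the maps smaller but deeper: here the trace ends at 14x14x64, i.e. 12544 values going into the fully connected layers.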
For more about CNNs, I invite you to check this awesome website: CNN Explainer

poloclub.github.io/cnn-explainer/
To summarize,

Convolutional neural networks are one of the most powerful neural network architectures for image-related tasks.

They are made of three blocks of layers: convolutional, pooling, and fully connected layers.
Some state-of-the-art language architectures such as transformers have also shown good results on image recognition (and research there will keep going),

but as far as we know,

ConvNets are the go-to architecture for image recognition tasks.
As a side note, I am very interested in the intersection of language and vision where, instead of stopping at recognizing an image,

we can also generate a caption for the image using language models such as LSTMs (Long Short-Term Memory).
I shared what I think about LSTMs here 👇

Thank you for reading!

I am actively writing about machine learning techniques, concepts, and ideas.

You can support me by following @Jeande_d
and sharing the first tweet with your friends who are interested in ML content.

More content to come 🙌🏻

Thread by Jean de Nyandwi (@Jeande_d)