One of the neural network architectures that has outpaced traditional algorithms for image recognition is the Convolutional Neural Network (CNN), a.k.a. ConvNet.
Inspired by the brain's visual cortex, CNNs have become a solid choice for many computer vision tasks.
1. Convolutional layers
The convolutional layers are the backbone of the whole CNN. They extract features from the images using filters.
These filters learn low-level features such as lines and edges, mid-level features such as ears and noses, and high-level features such as faces.
High-level features are what later become useful during image recognition.
Example of low, middle and high-level features 👇
In convolution, we slide the filter over the image. At each position, we multiply each filter value by the pixel underneath it and add up the products, and that sum becomes one pixel of the output.
We repeat the process until the filter has slid over all the image pixels.
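To make the multiply-and-sum step concrete, here is a minimal sketch in plain Python. The function name `convolve2d`, the 4x4 image, and the vertical-edge filter are all made up for illustration, and the sketch assumes no padding and a stride of 1, so the output is smaller than the input.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the output grid."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = image_h - kernel_h + 1
    out_w = image_w - kernel_w + 1

    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply each filter value by the pixel under it, then sum the products.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)  # this sum is one pixel of the output
        output.append(out_row)
    return output


# Hypothetical 4x4 grayscale image and a 3x3 vertical-edge filter.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 1, 3],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[-6, 12], [-4, 7]]
```

In a real CNN the filter values are not hand-picked like this; they are learned during training.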
Take a look here 👇 to see how convolution is done
More about filters/image kernels, I like how this link explains it 👇
Most popular deep learning frameworks, such as TensorFlow, allow you to create a convolutional layer in one line of code.
If you're a TensorFlow guy like me, you know this 👇
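A minimal sketch of what that one line can look like with the Keras API in TensorFlow 2. The filter count, kernel size, and the dummy 28x28 grayscale input are assumptions picked for illustration, not values from the original thread.

```python
import tensorflow as tf

# The one-liner: a convolutional layer with 32 filters of size 3x3.
conv_layer = tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation="relu")

# Dummy batch with a single 28x28 grayscale image (batch, height, width, channels).
images = tf.zeros((1, 28, 28, 1))
feature_maps = conv_layer(images)

print(feature_maps.shape)  # (1, 26, 26, 32)
```

With the default `padding="valid"`, the 3x3 filter trims one pixel from each border, which is why 28 becomes 26.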
The output of a convolutional layer is a stack of feature maps, and its depth depends on the number of filters in the layer.
For example, if the layer has 32 filters, you will have 32 feature maps at the output.
The more filters you have, the more feature maps you get, and that's not cool sometimes.
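Here is a small sketch of that arithmetic, using the standard output-size formula for a convolution. The helper name `conv_output_shape` and the 28x28 input are hypothetical, chosen only to show how the number of filters sets the depth of the output.

```python
def conv_output_shape(height, width, kernel_size, num_filters, stride=1, padding=0):
    """Return (height, width, depth) of a convolutional layer's output."""
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # One feature map per filter, so the depth equals the number of filters.
    return (out_h, out_w, num_filters)


shape_32 = conv_output_shape(28, 28, kernel_size=3, num_filters=32)
shape_64 = conv_output_shape(28, 28, kernel_size=3, num_filters=64)

print(shape_32)  # (26, 26, 32)
print(shape_64)  # (26, 26, 64)
```

Doubling the filters doubles the number of values the next layer has to handle, which is where pooling comes in.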
How do we reduce the dimensions of the feature maps while retaining as much of the information in the image as possible?
We pool...
2. Pooling layers
Pooling layers are used to compress or shrink the feature maps.
There are various pooling options, but max-pooling is the common choice for preserving the strongest parts of the feature maps. It reduces their size by keeping only the highest value in each pooling window.
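A minimal plain-Python sketch of max-pooling, assuming non-overlapping 2x2 windows. The function name `max_pool2d` and the 4x4 feature map are made up for illustration.

```python
def max_pool2d(feature_map, pool_size=2):
    """Keep only the highest value in each non-overlapping pool_size x pool_size window."""
    pooled = []
    for row in range(0, len(feature_map) - pool_size + 1, pool_size):
        pooled_row = []
        for col in range(0, len(feature_map[0]) - pool_size + 1, pool_size):
            window = [
                feature_map[row + i][col + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

The 4x4 map shrinks to 2x2, so three quarters of the values are dropped while the strongest response in each region survives.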
Implementing a pooling layer is very simple too.
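For instance, a sketch with the Keras API in TensorFlow 2. The pool size and the dummy input shape (32 feature maps of 26x26) are assumptions for illustration.

```python
import tensorflow as tf

# A max-pooling layer with a 2x2 window (the stride defaults to the pool size).
pool_layer = tf.keras.layers.MaxPooling2D(pool_size=2)

# Dummy batch of 32 feature maps, each 26x26.
feature_maps = tf.zeros((1, 26, 26, 32))
pooled = pool_layer(feature_maps)

print(pooled.shape)  # (1, 13, 13, 32)
```

Height and width are halved, while the number of feature maps stays the same.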
The output of the pooling layers is a set of reduced feature maps.
How does the network make sense of what these features represent?
3. Fully connected layers/Dense layers
At the end of a ConvNet, there is usually a fully connected layer whose job is to match the feature maps produced by the pooling layer with the labels of the original image.
Take an example.
If the input image to a ConvNet is a human, the high-level features may be a face, which can be enough to recognize a human.
Once the neural network has learned these different levels of features, they will need to be matched to their labels. That is what fully connected layers do.
A ConvNet may have multiple blocks of convolutional and pooling layers.
The right number of these layers will depend on the scope of the work at hand and the size of the dataset.
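Putting the three kinds of layers together, here is a hedged sketch of a small ConvNet with the Keras API in TensorFlow 2. The input shape (28x28 grayscale), the two convolution and pooling blocks, the layer sizes, and the 10 output classes are all assumptions for illustration, not a recommended architecture.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                 # assumed: 28x28 grayscale images
    # Block 1: convolution + pooling
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    # Block 2: convolution + pooling
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    # Fully connected layers: match the learned features to labels
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # assumed: 10 classes
])

# Run one dummy image through the untrained network.
predictions = model(tf.zeros((1, 28, 28, 1)))
print(predictions.shape)  # (1, 10)
```

Calling `model.summary()` prints the shape after every layer, which is a handy way to see the feature maps shrink block by block.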
For more about CNNs, I invite you to check out this awesome website: CNN Explainer
Convolutional neural networks are one of the most powerful neural network architectures for image-related tasks.
They are made of three types of layers: convolutional, pooling, and fully connected layers.
Some state-of-the-art language architectures such as transformers have also shown good results on image recognition (and research there is ongoing),
but as far as we know,
ConvNets are the go-to architecture for image recognition tasks.
As a side note, I am very interested in the intersection of language and vision, where instead of recognizing an image and stopping there,
we can also generate a caption for the image using language models such as LSTMs (Long Short-Term Memory).
Machine learning has transformed many industries, from banking, healthcare, production, and streaming to autonomous vehicles.
Here are examples of how that is happening 👇
🔸 A bank or any credit card provider can detect fraud in real time. Banks can also predict whether a customer requesting a loan will pay it back, based on their financial history.
🔸 A medical analyst can diagnose a disease in a handful of minutes, or predict the likelihood or course of a disease or the survival rate (prognosis).
🔸 An engineer in a given industry can detect failures or defects in equipment.