Jean de Nyandwi

2 Dec, 32 tweets, 7 min read

The image you see below is a typical architecture of Convolutional Neural Networks, a.k.a. ConvNets.

ConvNets are a type of neural network architecture that is mostly used in image recognition tasks.

More about ConvNets 🧵🧵

Image credit: CS 231n

Today, it's the norm to use ConvNets for most recognition tasks such as image classification and object detection.

Although vision transformers increasingly outperform ConvNets on many research benchmarks, ConvNets will probably stay with us for a long time.

To motivate why ConvNets are so powerful, let's think about what happens when we use fully connected networks for image data.

Fully connected networks, or densely connected networks, are networks where every unit in a layer is connected to every unit in the next layer.

Fully connected networks require the input image pixels to be flattened into a 1D vector.

For example, if our input image is a boot (from Fashion MNIST) of size 28X28X1, then it must be converted to 784X1.

For CIFAR-10, with images of size 32X32X3, the input 1D vector must be 3072X1.

Flattening the input images poses some challenges to the network.

As a consequence, the network cannot preserve the spatial features of the images. Also, the network ends up with a very large number of parameters, which makes it slower to train and more prone to overfitting.
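To make the parameter problem concrete, here is a rough sketch that counts the weights and biases of one fully connected layer sitting on top of a flattened image. The hidden layer width of 128 units and the 224X224X3 image size are assumptions chosen for illustration; they do not come from the thread.

```python
def flattened_size(shape):
    """Length of the 1D vector obtained by flattening a (height, width, channels) image."""
    height, width, channels = shape
    return height * width * channels


def dense_params(shape, units):
    """Weights plus biases of one fully connected layer applied to the flattened image."""
    return flattened_size(shape) * units + units


# (height, width, channels); the last entry is a hypothetical larger image
images = {
    "Fashion MNIST": (28, 28, 1),
    "CIFAR-10": (32, 32, 3),
    "224X224X3 image": (224, 224, 3),
}

HIDDEN_UNITS = 128  # assumed width of the first hidden layer

for name, shape in images.items():
    print(name, flattened_size(shape), dense_params(shape, HIDDEN_UNITS))
```

Under these assumptions, the first layer alone needs about 100 thousand parameters for Fashion MNIST and over 19 million for a 224X224X3 image.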

Now we know why we rarely see purely fully connected networks used for image recognition. It's their inability to learn spatial features from the images.

And that's exactly why we needed to have a network that works well on images. Why? Visual datasets are so abundant nowadays...

One of the first ConvNet architectures was LeNet-5 by @ylecun. LeNet-5 was used for document recognition in 1998.

yann.lecun.com/exdb/publis/pd…

Most architectures that followed looked much like LeNet-5, except that they were bigger (like AlexNet, which won the ImageNet Challenge in 2012) or introduced some other tweaks.

And it's after 2012 that people became interested in ConvNets and deep learning in general.

Image: LeNet-5

Now that we understand where this thing started, let's discuss the 3 main components of a typical ConvNet:

◆Convolution layers
◆Pooling layers
◆Fully connected layers

1. Convolution layers

Convolution layers are the main part of ConvNets. They contain filters (with learnable weights) for extracting features in images.

As the name implies, ConvNets are named after the convolution operation.

A convolutional layer slides each filter over the input image and, at every position, computes a dot product (elementwise multiplication followed by a sum) between the filter and the patch of pixels beneath it.

The outputs of convolution layers are called activation maps or feature maps.

The image below illustrates the convolution operation well.

Credit: Cs230 Cheatsheet

This one can be much more intuitive.
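For readers without the images, the operation can also be sketched in plain Python. This is a minimal version that assumes a single-channel input, a stride of 1 and no padding; the 5X5 image and the 3X3 filter values are made up for illustration. Like most deep learning libraries, it does not flip the filter, so strictly speaking it is a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and return the feature map."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(image_h - kernel_h + 1):
        row = []
        for j in range(image_w - kernel_w + 1):
            # Dot product between the kernel and the patch it currently covers
            total = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5X5 input and a 3X3 filter give a 3X3 feature map, because the filter fits in only 3 positions along each axis.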

ConvNets can be trained with gradient descent. The magical thing about them is that they automatically learn the filters that suit the input images.

More concretely, during the training, the filter values are updated (just like weights are in fully connected networks).

At each step of the training, and as images are being fed into ConvNets, the values of each filter are updated slowly towards the values that minimize the loss/cost function.
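Here is a toy sketch of that update loop. To keep it short it uses a 1D signal and a single filter of size 2, a mean squared error loss, and a learning rate of 0.1; all of these, including the signal and the "true" filter the model is supposed to recover, are assumptions for illustration. Real ConvNets rely on a framework's automatic differentiation instead of a hand-written gradient.

```python
def conv1d(signal, kernel):
    """Stride-1, no-padding 1D convolution (no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def mse(predictions, targets):
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


signal = [1.0, -1.0, 2.0, 0.5, -0.5, 1.5]
true_kernel = [0.5, -1.0]               # hypothetical filter we want to recover
targets = conv1d(signal, true_kernel)   # outputs the ideal filter would produce

kernel = [0.0, 0.0]                     # start from an uninformed filter
learning_rate = 0.1
initial_loss = mse(conv1d(signal, kernel), targets)

for step in range(200):
    predictions = conv1d(signal, kernel)
    errors = [p - t for p, t in zip(predictions, targets)]
    # Gradient of the loss with respect to each filter value
    grads = [
        2 * sum(errors[i] * signal[i + j] for i in range(len(errors))) / len(errors)
        for j in range(len(kernel))
    ]
    # Move each filter value a small step against its gradient
    kernel = [w - learning_rate * g for w, g in zip(kernel, grads)]

final_loss = mse(conv1d(signal, kernel), targets)
print(initial_loss, final_loss, kernel)
```

The filter starts at zero and drifts towards the values that generated the targets, while the loss shrinks at every step.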

Before we talk about pooling layers, let's talk about the typical hyperparameters of a convolution layer.

A convolution layer has 5 main hyperparameters:

◆The number of filters, which equals the number of output activation maps. There is no firm rule for how many filters you should have, but they are typically doubled layer after layer, like 32, 64, 128, 256.

◆The size of the filter, which is usually (3,3) or (5,5) for most tasks. Small sizes are usually better.

◆The stride, which denotes the number of pixels that the filter shifts after each convolution operation. The default stride is 1, and it works well.

◆Padding: without padding, the pixels at the border of the image are covered by fewer filter positions than the pixels in the center, and the output shrinks after every convolution.

We use padding to preserve those pixels, which often leads to better performance. The most commonly used type is zero padding, where we add zeros around the outer edge of the image.

◆The last important hyperparameter is the activation function. A convolution is a linear operation. We need some form of non-linearity to prevent the network from being a linear classifier.

A commonly used activation in Conv layers is ReLU. It works well and tends to train faster than saturating activations such as sigmoid or tanh.
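The need for a non-linearity can be checked with a tiny sketch. It uses scalar "layers" of the form w * x + b with made-up weights (an assumption to keep the arithmetic readable): two of them stacked collapse into a single linear function, while putting ReLU in between does not.

```python
def linear(x, weight, bias):
    """A one-unit linear layer: weight * x + bias."""
    return weight * x + bias


def relu(x):
    """ReLU keeps positive values and zeroes out negative ones."""
    return max(0.0, x)


def two_linear_layers(x):
    # -3 * (2x + 1) + 0.5 simplifies to -6x - 2.5, which is still one linear function
    return linear(linear(x, 2.0, 1.0), -3.0, 0.5)


def two_layers_with_relu(x):
    # Same weights, but with ReLU applied between the two layers
    return linear(relu(linear(x, 2.0, 1.0)), -3.0, 0.5)


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print([two_linear_layers(x) for x in inputs])     # [9.5, 3.5, -2.5, -8.5, -14.5]
print([two_layers_with_relu(x) for x in inputs])  # [0.5, 0.5, -2.5, -8.5, -14.5]
```

The first output moves by the same amount for every unit step in x, so it is a straight line. The second one is flat for negative inputs and then bends, which a single linear layer cannot reproduce.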

So far we have only been talking about 2D convolution, but there are 1D and 3D convolutions as well.

1D convolution is used in time series and text tasks, whereas 3D convolution is used in video recognition and volumetric data such as medical scans.

That's enough about convolution layers. Most deep learning tools have a concise implementation of convolution.

Refer to this image for its implementation in TensorFlow and PyTorch.

2. Pooling layers

Pooling layers are used for shrinking or downsampling the feature maps produced by convolution layers. This is a crucial thing because, in most tasks, we use a large number of filters, and we increase them layer after layer.

In simple terms, pooling layers reduce the complexity of the network while retaining the best parts of the feature maps.

Using a larger stride in the Conv layers can also downsample the feature maps, but the advantage of pooling is that it has no learnable parameters.
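The trade-off can be sketched with the usual output size formula, floor((n + 2 * padding - k) / stride) + 1. The 32X32 feature map and the 32 input and 64 output channels below are assumed values for illustration.

```python
def output_size(n, k, stride=1, padding=0):
    """Spatial size after sliding a window of size k over an input of size n."""
    return (n + 2 * padding - k) // stride + 1


def conv_params(in_channels, filters, k):
    """Learnable weights plus biases of a convolution layer with k x k filters."""
    return k * k * in_channels * filters + filters


N = 32  # assumed height and width of the incoming feature map

print(output_size(N, k=3, stride=1, padding=0))  # 30: no padding shrinks the map
print(output_size(N, k=3, stride=1, padding=1))  # 32: zero padding keeps the size
print(output_size(N, k=3, stride=2, padding=1))  # 16: a stride-2 convolution halves it
print(output_size(N, k=2, stride=2, padding=0))  # 16: so does 2 by 2 pooling

# The strided convolution has weights to learn, the pooling layer has none
print(conv_params(in_channels=32, filters=64, k=3))  # 18496
```

Both routes halve the feature map, but the strided convolution adds parameters while pooling is a fixed operation.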

There are two main types of pooling: max pooling and average pooling.

Here is how max-pooling is done:

And here is how average pooling is done.

Images: CS 230 (links at the end)
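Both kinds of pooling can also be sketched in a few lines of plain Python. This version assumes a single-channel feature map and a 2 by 2 window with a stride of 2; the feature map values are made up.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map with max or average pooling."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values inside the current window
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```

The 4X4 map shrinks to 2X2 in both cases. Max pooling keeps the strongest activation in each window, while average pooling smooths the window into its mean.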

3. Fully connected layers

We have extracted features in images, downsampled them to reduce network complexity and hence to train faster, but how do we do the actual recognition?

Fully connected layers are used for classification purposes.

Usually, there are between 1 and 3 fully connected layers. The most important thing to care about here is the last layer.

In classification problems, the number of units in the last layer should equal the number of classes when that layer is activated by softmax.
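As a small sketch of why the unit count must match the class count: softmax turns the raw outputs of the last layer into one probability per class. The three raw scores and the class names below are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw last-layer outputs into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["boot", "sneaker", "sandal"]  # hypothetical 3-class problem
logits = [2.0, 1.0, 0.1]                     # one raw score per class

probabilities = softmax(logits)
predicted = class_names[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```

With 3 classes the last layer needs exactly 3 units, and the predicted class is the one with the highest probability.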

This is the end of the thread that was about convolutional neural networks. To summarize, ConvNets are mostly used in image recognition tasks.

They are made of 3 main components:
◆Convolution layers
◆Pooling layers
◆Fully connected layers

Here are references for the illustrations used and for further reading:

◆CNN Explainer: poloclub.github.io/cnn-explainer/

◆CS 230 CNN cheatsheet: stanford.edu/~shervine/teac…

Thanks for reading.

I actively write about machine learning with the goal of simplifying things.

If you found the thread helpful, you are welcome to retweet or share it with anyone who you think might benefit from reading it.

Follow @Jeande_d for more ML ideas!
