Let's do a line-by-line analysis of this deep learning model and truly understand what's going on.
This model identifies handwritten digits. It's one of the classic examples of machine learning applied to computer vision.
🧵👇
First of all, we load the MNIST dataset. This dataset contains 28x28 images showing handwritten digits.
This dataset is so popular that Keras built a utility to load it with a single line of code.
The function returns the dataset split into train and test sets.
[2 / 24]
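The loading code probably looked something like this (a minimal sketch; I'm assuming the tensorflow.keras import path, but the exact code was in the attached screenshot):

```python
from tensorflow import keras

# Keras downloads and caches MNIST for us, already split into train and test.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
```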
x_train and x_test are the train and test sets containing the features: each entry is a 28x28 matrix of pixel values representing one image.
If we print both sets' shapes, we will get 60,000 train images and 10,000 test images.
[3 / 24]
y_train and y_test represent the train and test sets containing the target value: a number between 0 and 9 indicating the digit shown in the corresponding image.
Printing the shape will get us 60,000 and 10,000 values, respectively.
[4 / 24]
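If you print the shapes, you should see something like this:

```python
print(x_train.shape)  # (60000, 28, 28): 60,000 train images of 28x28 pixels
print(x_test.shape)   # (10000, 28, 28): 10,000 test images
print(y_train.shape)  # (60000,): one digit label per train image
print(y_test.shape)   # (10000,): one digit label per test image
```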
When dealing with images, we need a tensor with 4 dimensions: batch size, height, width, and color channels.
x_train.shape is (60000, 28, 28). We are missing the fourth dimension, which should be 1, because these images are grayscale.
reshape() will do the job.
[5 / 24]
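A sketch of that step (note I'm only touching the train set here; the thread preprocesses the test image separately, right before prediction):

```python
# Add the missing channels dimension: (60000, 28, 28) -> (60000, 28, 28, 1).
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1))
```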
If you look at the images' pixels, you'll see that they go from 0 to 255.
We never want to feed the network values that large: they'll make training unstable, sending the weights out of whack.
To avoid this, we normalize the values by dividing them by 255. Now values go from 0 to 1.
[6 / 24]
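Something like this:

```python
# Pixels are uint8 values in [0, 255]; scale them down to floats in [0, 1].
x_train = x_train.astype("float32") / 255
```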
The target values go from 0 to 9.
To make it easier on our network, we are going to one-hot-encode them.
Basically, we will transform a value like 5 into an array of zeros with a single 1 at the position corresponding to the digit (counting from zero):
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
[7 / 24]
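Keras has a built-in helper for this:

```python
# 5 -> [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. Do the same for y_test if you
# want to evaluate the model on the test labels later.
y_train = keras.utils.to_categorical(y_train, num_classes=10)
```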
Let's now define our model.
There are several ways to create a model in Keras. This one is called the "Sequential API."
Basically, our model will be a sequence of layers that we will define one by one.
[8 / 24]
A lot is going on with this first line.
First, we define our model's input shape: a 28x28x1 tensor (height, width, channels.) Notice that we don't specify the batch size here.
This is exactly the shape of each image in our train dataset.
[9 / 24]
Then we define our first layer: a Conv2D layer with 32 filters and a 3x3 kernel.
Basically, this layer will learn 32 different filters, each producing its own representation of the input image.
If interested, I talked more about convolutions here:
[10 / 24]
We also need to define the activation function used for this layer: ReLU.
(I think I talked about ReLU before, but I can't find the thread now.)
Suffice to say: ReLU is very common. You should use ReLU unless you have an excellent reason not to.
[11 / 24]
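Putting tweets 9 to 11 together, that first line probably looked something like this (a sketch; the exact code was in the attached screenshot):

```python
model = keras.Sequential()

# 32 filters, a 3x3 kernel, and a ReLU activation. input_shape is
# (height, width, channels) and deliberately leaves out the batch size.
model.add(
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1))
)
```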
After our Conv2D layer, we are going to do a 2x2 max pooling.
Without getting into too many details: it's very common to find a MaxPooling2D layer right after a Conv2D.
Its goal is to downsample the amount of information collected by the Conv2D.
[12 / 24]
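In code, that's a single extra layer:

```python
# Downsample by taking the max value over each 2x2 window.
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
```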
The Conv2D layer will produce a set of tensors with shape (26, 26, 32):
26x26 because the convolution operation with a 3x3 kernel will discard a pixel on each side of the image (28x28.)
32 is the number of filters that we set up.
[13 / 24]
The MaxPooling2D operation with a pool size of 2x2 will halve each spatial dimension of the Conv2D's output.
This means that we will end up with tensors of shape (13, 13, 32).
[14 / 24]
We are now going to flatten the (13, 13, 32) tensors. Basically, we want everything in a single flat list of values.
The Flatten layer will give us back tensors with shape (5408,).
This "magic" number is the result of 13 * 13 * 32.
[15 / 24]
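You can verify the shape arithmetic by hand (model.summary() will show the same output shapes):

```python
conv_out = 28 - 3 + 1            # 26: a 3x3 kernel discards one pixel on each side
pool_out = conv_out // 2         # 13: 2x2 max pooling halves each dimension
flat = pool_out * pool_out * 32  # 5408: the shape coming out of Flatten
```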
Finally, we will add a couple more Dense layers.
Notice how the output layer has size 10 (one for each of our possible digit values) and a softmax activation.
The softmax ensures we get a probability distribution indicating the most likely digit in the image.
[16 / 24]
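The remaining layers probably looked something like this (the size of the hidden Dense layer is my guess; the thread doesn't say):

```python
model.add(keras.layers.Flatten())

# Hidden Dense layer. 100 units is an assumption; the thread only says
# "a couple more Dense layers."
model.add(keras.layers.Dense(100, activation="relu"))

# Output layer: 10 units (one per digit) with a softmax activation.
model.add(keras.layers.Dense(10, activation="softmax"))
```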
After creating our model, we need to compile it.
Here we are using a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01. You can play with different optimizers to compare the results.
Try Adam and RMSprop, for example.
[17 / 24]
The loss is categorical cross-entropy.
In English: we want to predict a single class for each image.
By adding "accuracy" to the metrics, the training process will record the accuracy as it progresses.
[18 / 24]
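Tweets 17 and 18 describe a single compile() call, something like:

```python
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```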
Finally, we fit our model. This starts training it. A couple of notes:
▫️ We'll use batches of 32 images at a time.
▫️ We'll run 10 total epochs.
When fit() is done, we have a fully trained model! Check the results in one of the attached images.
[19 / 24]
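The fit() call, as a sketch:

```python
# 60,000 samples in batches of 32, repeated for 10 epochs.
model.fit(x_train, y_train, batch_size=32, epochs=10)
```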
Let's now test our model.
We will get a random image from the test set, and we will display it on the screen.
Notice that we want the image to come from the test set, which contains data that our model didn't see during training.
[20 / 24]
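A sketch of that step (I'm assuming matplotlib for the display; the thread doesn't say which library it used):

```python
import random

import matplotlib.pyplot as plt

# Pick a random image from the test set and show it.
index = random.randrange(len(x_test))
plt.imshow(x_test[index], cmap="gray")
plt.show()
```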
We can't forget to reshape and normalize the image just like we did before with the entire train set.
This time, we are just doing it for a single image, the one we will use to test the model.
[21 / 24]
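Something like this:

```python
# Same preprocessing as the train set, but for one image. The leading 1 is
# the batch dimension: predict() expects a batch, even a batch of one.
image = x_test[index].reshape((1, 28, 28, 1)).astype("float32") / 255
```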
Finally, we use the model.predict() function to predict the value of the image.
Remember that the result is a vector of 10 probabilities (one per digit, matching the one-hot encoding), so we will take the argmax value (the position with the highest probability) and that will be our predicted digit.
[22 / 24]
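A sketch of the prediction step:

```python
import numpy as np

prediction = model.predict(image)  # shape (1, 10): one probability per digit
digit = np.argmax(prediction)      # the position with the highest probability
print(f"Predicted digit: {digit}")
```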
My recommendation would be to run it in Google Colab.
[23 / 24]
Holy sh*t, this thread took a lot of work! Hopefully, you were able to follow along.
And speaking of following, if you are looking for a constant stream of machine learning-related information, follow me, and let's do this thing together!
🦕
[24 / 24]
This is a great question!
1,875 is the number of batches processed per epoch.
We have 60,000 training samples and we are feeding the model batches of 32 samples. 60,000 / 32 = 1,875.
25 popular libraries and frameworks for building machine and deep learning applications.
Covering:
▫️ Data analysis and processing
▫️ Visualizations
▫️ Computer Vision
▫️ Natural Language Processing
▫️ Reinforcement Learning
▫️ Optimization
A mega-thread.
🐍 🧵👇
(1 / 25) TensorFlow
TensorFlow is an end-to-end platform for machine learning. It has a comprehensive, flexible ecosystem of tools and libraries to build and deploy machine learning-powered applications.
(2 / 25) Keras
Keras is a highly productive deep learning interface running on top of TensorFlow. It provides essential abstractions and building blocks for developing and shipping machine learning solutions with high iteration velocity.