Santiago
24 Feb, 25 tweets, 9 min read
Let's do a line-by-line analysis of this deep learning model and truly understand what's going on.

This model identifies handwritten digits. It's one of the classic examples of machine learning applied to computer vision.

🧵👇
First of all, we load the MNIST dataset. This dataset contains 28x28 images showing handwritten digits.

This dataset is so popular that Keras built a utility to load it with a single line of code.

The function returns the dataset split into train and test sets.
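
The code screenshots didn't survive the unroll, so here's a minimal sketch of what that one-line loading step looks like:

```python
from tensorflow import keras

# One line to download and load MNIST, already split
# into train and test sets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
```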

[2 / 24]
x_train and x_test represent the train and test sets containing the features: the 28x28 matrix representing the image.

If we print both sets' shapes, we will get 60,000 train images and 10,000 test images.
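
For example:

```python
print(x_train.shape)  # (60000, 28, 28)
print(x_test.shape)   # (10000, 28, 28)
```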

[3 / 24]
y_train and y_test represent the train and test sets containing the target value: a number between 0 and 9 indicating the digit shown in the corresponding image.

Printing the shape will get us 60,000 and 10,000 values, respectively.
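
Again, a quick check:

```python
print(y_train.shape)  # (60000,)
print(y_test.shape)   # (10000,)
```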

[4 / 24]
When dealing with images, we need a tensor with 4 dimensions: batch size, width, height, and color channels.

x_train.shape is (60000, 28, 28). We are missing the fourth dimension, which should be 1, because these images are grayscale.

reshape() will do the work.
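
A sketch of that step. Note that we only transform the train set here; the single test image gets the same treatment later, right before prediction:

```python
# Add the missing channels dimension: (60000, 28, 28) -> (60000, 28, 28, 1)
x_train = x_train.reshape((60000, 28, 28, 1))
```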

[5 / 24]
If you look at the images' pixels, you'll see that they go from 0 to 255.

We never want to work with values that high: they'll send our network's weights out of whack.

To avoid this, we normalize the values by dividing them by 255. Now values go from 0 to 1.
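
In code:

```python
# Convert to floats and scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype("float32") / 255
```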

[6 / 24]
The target values go from 0 to 9.

To make it easier on our network, we are going to one-hot-encode them.

Basically, we will transform a value like 5 into an array of zeros with a single 1 at index 5:

[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
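
Keras ships a utility for this:

```python
# Turn each digit (0-9) into a 10-element one-hot vector.
y_train = keras.utils.to_categorical(y_train, num_classes=10)
```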

[7 / 24]
Let's now define our model.

There are several ways to create a model in Keras. This one is called the "Sequential API."

Basically, our model will be a sequence of layers that we will define one by one.
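
The model code was in an attached image, so here's a sketch of the full architecture as the thread describes it. The size of the hidden Dense layer is my assumption; the thread only specifies the output layer:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                # width, height, channels
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 filters, 3x3 kernel
    layers.MaxPooling2D(pool_size=(2, 2)),         # downsample by half
    layers.Flatten(),                              # (13, 13, 32) -> (5408,)
    layers.Dense(100, activation="relu"),          # hidden size is an assumption
    layers.Dense(10, activation="softmax"),        # one output per digit
])
```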

[8 / 24]
A lot is going on with this first line.

First, we define our model's input shape: a 28x28x1 tensor (width, height, channels). Notice that we don't specify the batch size here.

This is exactly the shape of our train dataset.

[9 / 24]
Then we define our first layer: a Conv2D layer with 32 filters and a 3x3 kernel.

Basically, this layer will learn 32 different filters, each producing its own representation of the input image.

If interested, I talked more about convolutions here:


[10 / 24]
We also need to define the activation function used for this layer: ReLU.

(I think I talked about ReLU before, but I can't find the thread now.)

Suffice it to say: ReLU is very common. You should use ReLU unless you have an excellent reason not to.

[11 / 24]
After our Conv2D layer, we are going to do a 2x2 max pooling.

Without getting into too many details: it's very common to find a MaxPooling2D layer right after a Conv2D.

Its goal is to downsample the amount of information collected by the Conv2D.

[12 / 24]
The Conv2D layer will produce a set of tensors with shape (26, 26, 32):

26x26 because a 3x3 convolution without padding discards one pixel on each side of the 28x28 image: 28 - 2 = 26.

32 is the number of filters that we set up.

[13 / 24]
The MaxPooling2D operation with a pool size of 2x2 will downsample the output of the Conv2D by half.

This means that we will end up with tensors of shape (13, 13, 32).

[14 / 24]
We are now going to flatten the (13, 13, 32) tensors. Basically, we want everything in a single, flat list of values.

The Flatten layer will give us back tensors with shape (5408,).

This "magic" number is the result of 13 * 13 * 32.

[15 / 24]
Finally, we will add a couple more Dense layers.

Notice how the output layer has size 10 (one for each of our possible digit values) and a softmax activation.

The softmax ensures we get a probability distribution indicating the most likely digit in the image.

[16 / 24]
After creating our model, we need to compile it.

Here we are using a Stochastic Gradient Descent (SGD) optimizer with 0.01 as the learning rate. You can play with different optimizers to compare the results.

Try Adam and RMSprop, for example.
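
A sketch of the compile step (the loss and metric are covered in the next tweet):

```python
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```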

[17 / 24]
The loss is categorical cross-entropy: the standard choice when the targets are one-hot-encoded and the output layer is a softmax.

In English: we want to predict a single class for each image.

By adding "accuracy" to the metrics, the training process will record the accuracy as it progresses.

[18 / 24]
Finally, we fit our model. This starts training it. A couple of notes:

▫️ We'll use batches of 32 images at a time.
▫️ We'll run 10 total epochs.

When fit() is done, we have a fully trained model! Check the results in one of the attached images.
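
In code:

```python
# Train for 10 epochs, 32 images per batch.
model.fit(x_train, y_train, batch_size=32, epochs=10)
```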

[19 / 24]
Let's now test our model.

We will get a random image from the test set, and we will display it on the screen.

Notice that we want the image to come from the test set, which contains data that our model didn't see during training.
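
A sketch of this step (using matplotlib for display is my assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

# Pick a random image from the (still raw) test set.
index = np.random.randint(0, x_test.shape[0])
image = x_test[index]

plt.imshow(image, cmap="gray")
plt.show()
```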

[20 / 24]
We can't forget to reshape and normalize the image just like we did before with the entire train set.

This time, we are just doing it for a single image, the one we will use to test the model.
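
Something like this:

```python
# Same preprocessing as the train set, plus a batch dimension of 1.
image = image.reshape((1, 28, 28, 1)).astype("float32") / 255
```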

[21 / 24]
Finally, we use the model.predict() function to predict the value of the image.

Remember that the model outputs a vector of 10 probabilities (one per digit), so we will take the argmax value (the position with the highest probability) and that will be the result.
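
Putting it together:

```python
import numpy as np

prediction = model.predict(image)  # shape (1, 10): probabilities per digit
digit = np.argmax(prediction)      # index of the highest probability
print(digit)
```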

[22 / 24]
If you want the code in a format that you can copy, here it is: gist.github.com/svpino/3cb8367…

My recommendation would be to run it in Google Colab.

[23 / 24]
Holy sh*t, this thread took a lot of work! Hopefully, you were able to follow along.

And speaking about following, if you are looking for a constant stream of machine learning-related information, follow me, and let's do this thing together!

🦕

[24 / 24]
This is a great question!

1,875 is the number of batches processed per epoch.

We have 60,000 training samples and we are feeding the model batches of 32 samples: 60,000 / 32 = 1,875.
