Santiago
29 Oct, 10 tweets, 3 min read
A step-by-step guide to your first Computer Vision problem and 10 questions you should answer after that.

No math and no fancy degrees. If you can read Python, you can do this.

If this is your first time looking at this type of problem, my goal is for you to get familiar with some of the high-level ideas.

There will be some hand-waving, but don't worry about that. Focus on the process and the big pieces.
Here is a @DeepnoteHQ notebook with the code and the entire documentation.

You can open it and run it yourself step by step:

deepnote.com/@svpino/MNIST-…
The notebook is split into three main sections:

1. Loading and transforming the data
2. Building the model
3. Training and testing the model

I want you to get the main idea from each one of these 3 sections.

What happens in each one? How am I accomplishing that?
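
To make the first section a bit more concrete, here is a minimal NumPy sketch of the kind of transformations it covers: adding an extra dimension to the images and rescaling the pixel values. It uses random stand-in arrays instead of the real MNIST data, and the helper name `prepare_images` is hypothetical, so treat it as an illustration rather than the notebook's actual code.

```python
import numpy as np

def prepare_images(images):
    """Add a trailing channel axis and rescale pixels from [0, 255] to [0, 1]."""
    # (n, 28, 28) -> (n, 28, 28, 1): one channel, because the images are grayscale.
    images = images.reshape(images.shape + (1,))
    # Convert to float first so the division keeps the fractional part.
    return images.astype("float32") / 255.0

# Stand-in for MNIST (assumption): 5 random grayscale images of 28x28 pixels.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)

x_prepared = prepare_images(x_train)
print(x_train.shape, "->", x_prepared.shape)
print(x_prepared.min(), x_prepared.max())
```

Try printing the shapes before and after the call and compare them with what you see in the notebook.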
After you are done going through the code, here are 10 questions I want you to think through.

If you aren't sure about a particular question, go back to the code, change it and explore what happens.

If you are stuck, reply to this thread and I'll be happy to help.
Questions:

1. Why do you need to reshape the train and test sets to add an extra dimension to the vectors?

2. Why do you rescale the pixel values of the images to fit between 0 and 1? Try skipping this step and compare the results.
3. The code uses the target values (y_train) as integers. Modify the code to use one-hot encoded targets instead.

4. What's the difference between the "Sparse Categorical Cross-Entropy" loss function that we are using and "Categorical Cross-Entropy"?
5. Why do you use a MaxPooling2D layer right after the Conv2D layer?

6. What happens if we increase the number of filters of the Conv2D layer to 128? What happens if we decrease it to 8?

7. Why do we use a softmax activation on the final layer instead of relu?
8. How does the "momentum" argument affect the SGD optimizer that you are using?

9. How many different parameters will have to be adjusted during the training process of this model?

10. What happens if we modify the batch size that we use to train the model?
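
If you want to poke at questions 3, 4, and 7 outside of the notebook, here is a small NumPy sketch you can play with. The function names mirror the Keras loss names, but they are simplified stand-ins written for illustration, not the library's implementations. The scores and labels are made up (2 samples, 3 classes) and don't come from the notebook.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum keeps np.exp from overflowing.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    """Targets are integer class indices, e.g. 3."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()

def categorical_crossentropy(one_hot, probs):
    """Targets are one-hot vectors, e.g. [0, 0, 0, 1]."""
    return -(one_hot * np.log(probs)).sum(axis=-1).mean()

# Hypothetical raw scores and integer targets (assumptions for the example).
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

# One-hot encode the integer targets: row i has a 1 in column labels[i].
one_hot = np.eye(3)[labels]

probs = softmax(logits)
sparse_loss = sparse_categorical_crossentropy(labels, probs)
dense_loss = categorical_crossentropy(one_hot, probs)

print(probs.sum(axis=-1))
print(sparse_loss, dense_loss)
```

Compare the two printed loss values, then swap `softmax` for a plain relu (`np.maximum(logits, 0)`) and see what happens to the rows of `probs` and to the losses.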
Feel free to reply with your answers or any follow-up questions you might have. I'll try to get through as many questions as I can.

This is me → @svpino. I write about practical machine learning and try to make things as simple as I can.

Stay tuned for more.
