THREAD: How is it possible to train a well-performing, advanced Computer Vision model on the CPU? 🤔

At the heart of this lies one of the most important techniques in modern deep learning: transfer learning.

Let's analyze how it works...

2/ For starters, let's look at what a neural network (NN for short) does.

An NN is like a stack of pancakes, with computation flowing up when we make predictions.

How does it all work?
3/ We show an image to our model.

An image is a collection of pixels. Each pixel is just a bunch of numbers describing its color.

Here is what it might look like for a black-and-white image.
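To make the "pixels are just numbers" idea concrete, here is a minimal sketch (using NumPy; the image values are made up for illustration) of a tiny black-and-white picture as a grid of numbers:

```python
import numpy as np

# A tiny 5x5 grayscale "image": each entry is one pixel.
# 0 = black, 255 = white, values in between would be shades of gray.
# This one contains a vertical white line down the middle.
image = np.array([
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
    [  0,   0, 255,   0,   0],
], dtype=np.uint8)

print(image.shape)  # (5, 5): height x width, one number per pixel
```

A color image would simply carry a few numbers per pixel (e.g. red, green, and blue intensities) instead of one.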
4/ The picture goes into the layer at the bottom.

Each layer performs computation on the image, transforming it and passing it upwards.
5/ By the time the image reaches the uppermost layer, it has been transformed to the point that it now consists of two numbers only.

The outputs of a layer are called activations, and the outputs of the last layer have a special meaning... they are the predictions!
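The flow of activations up the stack can be sketched in a few lines of NumPy. Everything here is a stand-in (random weights, a random "image", two layers), but the shape of the computation is the point: each layer transforms its input, and the last layer's activations are the predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "stack of pancakes": each layer transforms its input and passes it up.
x = rng.random(16)                  # flattened toy image: 16 pixel values

W1 = rng.standard_normal((8, 16))   # layer 1 weights (made up for illustration)
W2 = rng.standard_normal((2, 8))    # last layer: squeezes everything to 2 numbers

h = np.maximum(0, W1 @ x)           # layer 1 activations (ReLU)
logits = W2 @ h                     # last-layer activations = the predictions

print(logits.shape)                 # (2,): one score per class, e.g. cat vs dog
```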
6/ For an NN distinguishing between cats and dogs, when presented with an image of a cat, we want the cat neuron to light up!

We would like it to have a high value, and the other activations in the last layer to be small...
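"Lighting up" is usually made precise with a softmax, which turns the last layer's activations into probabilities. A small sketch, with made-up activation values:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()       # positive values that sum to 1

# Made-up last-layer activations, ordered [cat, dog].
logits = np.array([4.0, 0.5])
probs = softmax(logits)

print(probs.round(3))        # the cat entry dominates: the cat neuron "lit up"
```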

So far so good! But what about transfer learning?
7/ Consider the lower levels of our stack of pancakes! This is where the bulk of the computation happens.

We know that these layers evolve during training to become feature detectors.

What do we mean by that?
8/ One layer may have tiny sliding windows that are good at detecting lines.

A layer above might have windows that construct shapes from these lines.

We might have a window light up when it sees a square, another when it sees a colorful blob.
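These "tiny sliding windows" are convolutional filters. A hedged sketch of the idea (the image and the kernel are both made up): slide a 3x3 window over an image containing a vertical line, and watch it respond strongly at the line's edges.

```python
import numpy as np

def slide(image, kernel):
    """Slide a small window (kernel) over the image, recording its response."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i+kh, j:j+kw] * kernel).sum()
    return out

# Image with a vertical white line down the middle.
image = np.zeros((5, 5))
image[:, 2] = 1.0

# A window that "lights up" on vertical edges (a Sobel-like kernel).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = slide(image, kernel)
print(response)  # strong positive/negative responses at the line's two edges
```

During training, the network learns the values inside these kernels itself; a hand-written edge detector like the one above is just what such a learned filter often ends up resembling.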
9/ As we move up the stack, the features that windows can detect become more complex, building on the work of the layers below.

Maybe one sliding window will combine lines and detect text... maybe another one will learn to detect faces.

Does all of this sound like a hard task?
10/ Absolutely! A network needs to see a lot of pictures to learn all of that.

But, presumably, once we detect all these lower-level features, we can combine them in a plethora of interesting ways? 🤔
11/ We can take all the lines, and the blobs, and the faces, or whatever the lower layers of the network can see, and combine them to predict cats and dogs!

Or trains, planes, and ships. Or blood cell boundaries. Or aneurysms in x-rays. The possibilities are endless!
12/ This is precisely what transfer learning is!

We let researchers and large corporations spend millions of dollars training very complex models.

And then we get to build on top of their work! 😇

But so much for the theory. How does it all work in practice?
13/ In our example, we took a pretrained model trained on a subset of ImageNet consisting of 1.2 million images across 1,000 classes!

The @fastdotai framework downloaded the model for us and removed the top of it (the part responsible for predicting 1 of 1000 classes).
14/ It created a new head for our model, one tailored to the classes in the new dataset.

During training, we kept nearly the entire model frozen and trained only the uppermost part, making use of all the lower-level features being detected.

Ingenious! 😍
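The freeze-the-body, train-the-head recipe can be sketched without any framework. Everything below is a toy stand-in (random weights where the pretrained ImageNet weights would go, a squared-error step instead of a real training loop), but it shows the one thing that matters: only the head is ever updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" body: in real transfer learning these weights come from a model
# trained on ImageNet; here random stand-ins mark where they would plug in.
W_body = rng.standard_normal((8, 16))

# Freshly created head, tailored to the new dataset: 2 classes, not 1,000.
W_head = np.zeros((2, 8))

def forward(x):
    features = np.maximum(0, W_body @ x)   # frozen feature extractor
    return W_head @ features, features

x = rng.random(16)                          # one toy input image
logits, features = forward(x)

# One toy gradient step (squared-error loss): only the head moves.
target = np.array([1.0, 0.0])               # say the image is a "cat"
grad_head = np.outer(logits - target, features)
W_body_before = W_body.copy()
W_head -= 0.01 * grad_head                  # the head learns...
# ...while W_body is simply never updated: that is what "frozen" means.
```

In the thread's fastai setting this is handled for you: something like `vision_learner` builds the new head and `fine_tune` trains it with the body frozen at first. Exact function names vary across fastai versions, so treat this as a sketch of the idea, not the library's API.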
15/ The concept of transfer learning, of utilizing a model trained on one task to perform another one, applies to other scenarios as well, including NLP (models that act on text).

We will hopefully get a chance to explore all of them 🙂
16/ I plan to explain all the concepts in modern AI in a similar fashion, assuming people find this useful 🙂

If you enjoyed this thread, please let me know and help me reach others who might also be interested 😊🙏

And the visualizations of what the layers can detect?
17/ They come from this seminal paper: Visualizing and Understanding Convolutional Networks (arxiv.org/abs/1311.2901)

Next stop - deciphering how it all works in code and finding ways to further improve our model!

Stay tuned for more 🙂

โ€ข โ€ข โ€ข

Missing some Tweet in this thread? You can try to force a refresh
ใ€€

Thread by Radek Osmulski (@radekosmulski)
