The ability to reuse the knowledge of one model and adapt it to solve a different problem is one of the most consequential breakthroughs in machine learning.
Grab your ☕️ and let's talk about this.
🧵👇
A deep learning model is like a Lego set: many pieces connected together, forming one tall structure.
These pieces are layers, and each layer has a responsibility.
Although we don't know exactly what role every layer plays, we know that the closer a layer is to the output, the more specific its job becomes.
The best way to understand what I mean is through an example: a model that will process car images.
The top layer (closer to the input) may focus on low-level details of the image. For example, it could focus on extracting edges from the input image.
The next layer may focus on using these edges to form lines.
The layer after that may use the lines to form shapes.
Then, a new layer may focus on specific car components, and the closer you get to the output, the more specific each layer will be.
The output layer will simply decide whether there's a car in the input image.
This is a simplification, but it hopefully illustrates the idea.
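Here is a rough sketch of that layered structure in Keras. The layer sizes and the comments are just the intuition from above — real models are bigger, and learned roles are never this clean-cut:

```python
from tensorflow import keras
from tensorflow.keras import layers

# A toy "car or no car?" classifier mirroring the Lego analogy.
car_model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),          # the raw car image
    layers.Conv2D(32, 3, activation="relu"),   # early layers: edges
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # then lines and shapes
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),  # later layers: car parts
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),     # output: car or no car?
])
```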
Imagine that we trained our car model with 100k images over a few days. A lot of work!
Now it's time to move on and build a new model. But this time we want a model specific to trucks.
There are a couple of problems with this second model:
1. We might not have that many truck images, so the results might not be good.
2. It would be a shame to waste all the work we did on the car model, because cars and trucks are really similar.
Here is where the magic happens!
Let's get the official definition out of the way: we call this thing you are about to learn "transfer learning."
Transfer learning is a method where we can reuse a model that was developed for one task as the starting point for another task.
Basically, we will reuse most of the stuff our car model learned and transfer it to our truck model.
Think about it: edges, lines, colors, textures... a lot of the knowledge is the same. We will only need to train the new model to recognize the things specific to trucks!
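In Keras, grabbing the reusable part could look something like this. `car_model` is the (now trained) model from the sketch above — in practice you would probably load it from disk instead:

```python
from tensorflow import keras

# Keep everything except the final, car-specific layer.
base = keras.Model(
    inputs=car_model.inputs,
    outputs=car_model.layers[-2].output,  # cut off the car-only head
)
```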
We can disconnect the bottom layers from the car model (the ones closer to the output) because we know they learned things specific to cars.
In their place, we will connect fresh new layers that we will train with truck images.
This way, we can reuse a lot of knowledge and focus on training a small portion of the model on what's specific to the new problem.
Because the hard part is already learned, we won't need nearly as much data, nor as much training time as the car model took!
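Continuing the sketch: freeze the reused layers, bolt on a fresh head, and train only the new part. `base` comes from the previous snippet, and `truck_images` / `truck_labels` are hypothetical arrays standing in for our (much smaller) truck dataset:

```python
from tensorflow import keras
from tensorflow.keras import layers

base.trainable = False  # keep the car model's knowledge intact

truck_model = keras.Sequential([
    base,                                    # everything we reuse
    layers.Dense(64, activation="relu"),     # fresh truck-specific layers
    layers.Dense(1, activation="sigmoid"),   # output: truck or no truck?
])

truck_model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
truck_model.fit(truck_images, truck_labels, epochs=5)
```

Setting `trainable = False` is the key move: it protects the reused knowledge while the new layers learn the truck-specific bits.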
Transfer learning is the reason you and I can build a state-of-the-art computer vision model without collecting thousands of pictures or spending a fortune on infrastructure.
We stand on the shoulders of giants!
Hey! We can do this shit together!
Stay tuned. More threads like this coming your way!