What do you think will happen after we repeat this cycle too many times?
(11 / 19)
Repeat the cycle too many times, and the model will get really good at acing the evaluation.
Slowly, it will start "overfitting" to the validation set.
At some point, we will get excellent scores that don't truly represent the model's actual performance.
(12 / 19)
You can probably imagine the solution: we need a new validation set.
In practice, we add the old validation set to the training data, and we get a new, fresh validation set.
Remember the teacher giving you the previous year's tests for practice? Same thing.
(13 / 19)
There's something else we do.
We take another portion of the data and set it aside. We call this the "test set," and we never look at it during training.
Then we go and train and validate our model until we are happy with it.
(14 / 19)
When we finish, we use the test set for a final, proper evaluation of the model's performance.
The advantage is that the model has never seen this data, neither directly (during training) nor indirectly (during validation).
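Here's a minimal sketch of what that three-way split might look like in code, assuming scikit-learn and a 60/20/20 proportion. The synthetic data, random seed, and variable names are illustrative placeholders, not part of the original thread:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data just so the example runs end to end.
X = np.random.rand(1000, 10)            # 1,000 samples, 10 features
y = np.random.randint(0, 2, size=1000)  # binary labels

# First split: set 20% aside as the test set. We don't touch it
# until the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Second split: carve a validation set out of the remaining 80%.
# 0.25 of 80% is 20% of the original data, giving a 60/20/20 split.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

Training touches only the train arrays, tuning decisions use the validation arrays, and the test arrays come out exactly once at the end.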
(15 / 19)
This is the best evaluation to understand the true capabilities of our model.
Right after we use the test set, we never use it again to test the model. We put it back as part of the training set and find new data to test the model in future iterations.
(16 / 19)
I always felt that splitting the original data into multiple parts was arbitrary until I understood its importance.
Hopefully, this thread helps you see why it matters.
To finish, here are a few more notes about this.
(17 / 19)
1. In practice, the sizes of the train, validation, and test sets vary. A 60% / 20% / 20% split is a reasonable starting point.
2. There are multiple ways to validate a model. Here I explained a simple split, but there are other techniques, like k-fold cross-validation (there's a short sketch of it after these notes).
(18 / 19)
3. The size of the dataset influences the split and the techniques that I presented here. Some may not be possible without enough data.
4. This thread is not a paper or a scientific presentation. I'm aiming to build intuition among those who are learning this stuff.
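Since note 2 mentions k-fold cross-validation, here's a hedged sketch of what it looks like with scikit-learn. The synthetic dataset and the logistic regression model are placeholders I picked for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data and model, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: the data is split into 5 folds, and each fold
# takes a turn as the validation set while the other 4 folds are used
# for training. We get 5 scores instead of a single one.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```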
(19 / 19)
If you enjoy these attempts to make machine learning a little more intuitive, stay tuned and check out @svpino for more of these threads.
I really enjoy hearing from those who tell me that these explanations hit home for them. Thanks for the feedback!
Thanks to @gusthema for the inspiration to write this thread.
• • •
"Is it reasonable for someone to dive into machine learning with a shallow knowledge of math?"
▫️ The short answer is "yes."
▫️ The more nuanced answer is "it depends."
Let me try and unpack this question for you.
🧵👇
You can think about machine learning as a spectrum that goes all the way from pure research to engineering.
The more you move towards a research position, the more you can benefit from your math knowledge. If you move in the other direction, you can get by with less of it.
👇
I have friends who got a Ph.D. and became college professors.
For them, math is an absolute requirement!
Not only are they working on research projects, but they are teaching the next generation of scientists and engineers.