1. Define the problem
2. Assemble a dataset
3. Determine success metrics
4. Decide on evaluation method
5. Prepare the data
6. Establish a baseline
7. Develop a model that beats the baseline
8. Overfit model
9. Regularize model
10. Tune model
Where's model validation in this workflow?
Notice that steps 8, 9, and 10 presume the existence of a mechanism to evaluate the model. This means that model validation is implicitly part of this workflow.
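To make that implicit validation step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy dataset, the helper names (`train_validation_split`, `majority_baseline`, `accuracy`), and the hand-written threshold rule that stands in for a trained model. It holds out part of the data, scores a majority-class baseline (step 6), and scores a candidate model against it (step 7).

```python
import random
from collections import Counter


def train_validation_split(examples, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def majority_baseline(train):
    """Return a 'model' that always predicts the most common training label."""
    most_common_label = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


# Made-up toy data: (message length, label). Not taken from any real dataset.
examples = [(n, "spam" if n > 50 else "ham") for n in range(0, 100, 5)]

train, validation = train_validation_split(examples)
baseline = majority_baseline(train)

# Stand-in for a trained model: a hand-written rule, used only for illustration.
threshold_model = lambda n: "spam" if n > 50 else "ham"

print("baseline accuracy:", accuracy(baseline, validation))
print("model accuracy:   ", accuracy(threshold_model, validation))
```

The same `accuracy` call on the held-out split is what steps 8, 9, and 10 would lean on to tell whether overfitting, regularizing, or tuning actually changed anything.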
"Assembling a dataset" focuses on identifying the data sources we need to solve the problem.
Before deciding on success metrics, we need access to the data that we will be using.
Later, "Preparing the data" focuses on getting that data ready for the model.
Once you know what the dataset looks like, "data analysis" becomes another implicit step, one that supports several of the steps in this workflow.
Before starting, remember that, if you follow me, one of your enemies will be immediately destroyed (and you'll get to read more of these threads, of course.)
And if you don't follow me, well, you just hurt my feelings.
😜
1. Labels
(Also referred to as "y")
The label is the piece of information that we are predicting.
For example:
- the animal that's shown in a picture
- the price of a house
- whether a message is spam or not
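As a rough illustration of the three examples above, the sketch below pairs some made-up inputs with their labels and then pulls the labels out into `y`. The file names, feature names, values, and the `split_features_labels` helper are all hypothetical, chosen only to show the shape of the data.

```python
# Hypothetical toy examples: each one pairs input features with its label.

# The animal shown in a picture (label is one of several categories).
pictures = [
    {"features": "cat_001.jpg", "label": "cat"},
    {"features": "dog_042.jpg", "label": "dog"},
]

# The price of a house (label is a number).
houses = [
    {"features": {"bedrooms": 3, "area_sqft": 1500}, "label": 250_000},
    {"features": {"bedrooms": 2, "area_sqft": 900}, "label": 180_000},
]

# Whether a message is spam or not (label is True/False).
messages = [
    {"features": "You won a free cruise!!!", "label": True},
    {"features": "Lunch at noon?", "label": False},
]


def split_features_labels(examples):
    """Separate the inputs (often called X) from the labels (often called y)."""
    X = [example["features"] for example in examples]
    y = [example["label"] for example in examples]
    return X, y


X_pictures, y_pictures = split_features_labels(pictures)
X_houses, y_houses = split_features_labels(houses)
X_messages, y_messages = split_features_labels(messages)

print(y_pictures)
print(y_houses)
print(y_messages)
```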