5 differences between a machine learning system and the software you are building today:
🧵
1. The Team
Usually, a machine learning system needs the involvement of many different disciplines:
- Data Scientists
- Data Engineers
- Machine Learning Engineers
Plus, the same roles that are needed by a conventional software system.
2. The Development Process
Machine learning is a very experimental process. Creating a model requires a lot of exploration, usually not needed in software development.
3. The Testing Process
Testing a machine learning system is much more involved than testing a regular piece of software.
Here are three steps unique to machine learning:
▫️ Data validation
▫️ Testing model updates
▫️ Model validation
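As a rough illustration of the first step, data validation, here is a minimal sketch in Python. The schema, the column names ("age", "income"), and their ranges are hypothetical and exist only for this example; a real system would define its own checks.

```python
from math import isnan

# Hypothetical schema (an assumption for this sketch):
# column name -> (expected type, minimum, maximum)
SCHEMA = {
    "age": (int, 0, 120),
    "income": (float, 0.0, 1e7),
}

def validate_rows(rows, schema=SCHEMA):
    """Return a list of (row_index, column, problem) tuples."""
    problems = []
    for i, row in enumerate(rows):
        for column, (expected_type, low, high) in schema.items():
            # A column that is absent or None counts as missing.
            if column not in row or row[column] is None:
                problems.append((i, column, "missing"))
                continue
            value = row[column]
            if not isinstance(value, expected_type):
                problems.append((i, column, "wrong type"))
            elif isinstance(value, float) and isnan(value):
                problems.append((i, column, "NaN"))
            elif not (low <= value <= high):
                problems.append((i, column, "out of range"))
    return problems
```

Running a check like this before training or serving means bad records are flagged before they reach the model, rather than showing up later as degraded predictions.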
4. The Deployment Process
In a machine learning system, you are dealing with an entire pipeline, from data collection and model training all the way to automatic model monitoring.
This pipeline is much more complex than a regular CI/CD cycle in software development.
5. Monitoring
Data is constantly changing, and it directly impacts the performance of machine learning systems in production.
Systems decay even without going through any modifications.
This requires constant monitoring to detect and correct drift.
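One simple way to detect drift in a single numeric feature is to compare its distribution at training time against what the system sees in production. The sketch below uses the two-sample Kolmogorov-Smirnov statistic, which is just one possible choice. The function names and the threshold value are assumptions made for this example.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

# Hypothetical threshold; in practice it would be tuned per feature.
DRIFT_THRESHOLD = 0.3

def has_drifted(reference, current, threshold=DRIFT_THRESHOLD):
    """Flag drift when the two distributions differ by more than the threshold."""
    return ks_statistic(reference, current) > threshold
```

A monitoring job could run has_drifted on each feature at a regular interval and trigger an alert or a retraining run when it returns True.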
1. Define the problem
2. Assemble a dataset
3. Determine success metrics
4. Decide on evaluation method
5. Prepare the data
6. Establish a baseline
7. Develop a model that beats the baseline
8. Overfit model
9. Regularize model
10. Tune model
Where's model validation in this workflow?
Notice that steps 8, 9, and 10 presume the existence of a mechanism to evaluate the model. This means that model validation is implicitly part of this workflow.
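To make that evaluation mechanism concrete, here is a minimal sketch of a hold-out validation set together with a baseline check (steps 4, 6, and 7). It assumes a classification task with accuracy as the metric and a majority-class baseline. All function names are hypothetical, and the workflow does not prescribe any of these choices.

```python
from collections import Counter

def split(examples, validation_fraction=0.25):
    """Hold out the last part of the data for validation."""
    cut = int(len(examples) * (1 - validation_fraction))
    return examples[:cut], examples[cut:]

def accuracy(predict, examples):
    """Fraction of (input, label) pairs that predict gets right."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)

def majority_baseline(train):
    """Baseline that always predicts the most common training label."""
    most_common = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: most_common

def beats_baseline(model, train, validation):
    """True if the model scores higher than the baseline on held-out data."""
    baseline = majority_baseline(train)
    return accuracy(model, validation) > accuracy(baseline, validation)
```

The same accuracy function on the same validation set can then be reused to judge whether overfitting, regularizing, and tuning the model (steps 8 to 10) actually help.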
"Assembling a dataset" is about identifying the data sources we will need to solve the problem.
Before we can define success metrics, we need access to the data we will be using.
Before starting, remember that, if you follow me, one of your enemies will be immediately destroyed (and you'll get to read more of these threads, of course.)
And if you don't follow me, well, you just hurt my feelings.
1. Labels
(Also referred to as "y")
The label is the piece of information that we are predicting.
For example:
- the animal that's shown in a picture
- the price of a house
- whether a message is spam or not
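In code, labels usually live alongside the inputs they describe. Here is a minimal sketch using the house-price example. The feature names and numbers are made up purely for illustration.

```python
# Hypothetical house-price examples: each pairs input features with a label.
examples = [
    ({"bedrooms": 3, "area_sqft": 1500}, 250_000),
    ({"bedrooms": 2, "area_sqft": 900}, 160_000),
]

# Separate the inputs from the labels ("y") that we want to predict.
X = [features for features, _ in examples]
y = [label for _, label in examples]
```

Here y holds the price of each house, which is what a model would learn to predict from the corresponding entry in X.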