Greensteam subscribed to the idea of doing #MLOps at a reasonable scale.
Seeing the rapidly growing number of customers (= ML experiments), they decided to build their MLOps stack from scratch and solve all the core problems around it.
Here are some of the issues → solutions:
- 1000s of Jupyter notebooks → git
- Managing dependencies and reproducibility → @Docker
- Unit tests (in some parts of the model code) that didn’t actually test anything → smoke tests (see the sketch after this list)
- Different linter versions showing different results locally and in Jenkins → code checks moved into Docker
- Catching errors in parts of the code that unit tests didn’t cover → mypy (static type checking)
- Testing models for multiple datasets of different clients in different scenarios → @argoproj
- Monitoring the results of models, trained on multiple datasets, with different parameters and metrics (and comparing all those model versions) → neptune.ai
- Training a separate model for each vessel type with constantly growing time-series datasets → @FastAPI
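A minimal sketch of what such a smoke test can look like (pytest style; the scikit-learn model here is an illustrative stand-in for the real training code, which isn’t shown in the thread):

```python
# Smoke test sketch: don't assert exact metrics, just check that the
# pipeline trains and predicts end to end without blowing up.
# LinearRegression is a stand-in, not Greensteam's actual model.
import numpy as np
from sklearn.linear_model import LinearRegression

def test_training_smoke():
    rng = np.random.default_rng(0)
    X, y = rng.random((32, 4)), rng.random(32)   # tiny synthetic dataset
    model = LinearRegression().fit(X, y)         # stand-in for the real training step
    preds = model.predict(X)
    assert preds.shape == (32,)                  # pipeline ran end to end
    assert np.isfinite(preds).all()              # and produced sane output
```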
Models aren’t intelligent enough to adjust to a changing world unless they’re constantly retrained & updated
You need to monitor them, detect data drift & update the data
To detect data drift, run distribution tests that measure how your feature distributions shift, using metrics like these (code sketch after the lists):
> Basic statistical metrics you could use to test drift between historical and current features are:
- mean/average value,
- standard deviation,
- minimum and maximum values,
- correlation.
> For continuous features, you can use divergence and distance tests such as:
- Kullback–Leibler divergence,
- Kolmogorov–Smirnov statistic (widely used),
- Population Stability Index (PSI),
- Hellinger distance,
- and so on.
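A minimal sketch of two of these tests on a synthetic feature, using SciPy and NumPy (the PSI helper and the 0.2 rule of thumb are common conventions, not a standard library function):

```python
# Compare a historical (training-time) feature to current production data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
historical = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training-time feature
current = rng.normal(loc=0.3, scale=1.0, size=10_000)     # production feature (shifted)

# Kolmogorov–Smirnov: max distance between the two empirical CDFs
ks_stat, p_value = stats.ks_2samp(historical, current)
print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3g}")

def psi(expected, actual, bins=10):
    """Population Stability Index over quantile bins of the expected sample."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf                  # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

print(f"PSI: {psi(historical, current):.3f}")  # rule of thumb: > 0.2 = significant shift
```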
@LukawskiKacper is joining us next week on #MLOps Live to share his experience and advice on implementing vector search – AMA.
Kacper has almost 15 years of experience in data engineering, ML, and software design. As the founder of @AiEmbassy, he has also been actively taking part in AI discussions, especially on similarity learning, vector search, and solving social issues by applying ML methods.
Jump on the live with us to ask him anything about:
- Using vector search vs neural search to build search engines
- Evaluating and comparing vector search engines (both open-source and paid solutions)
- Optimizing the speed and effectiveness of vector search apps
- And more
Step 1/ Challenges faced by the BioAI team while building DeepChain (platform for protein design):
> Experiment logs all over the place
With logs scattered across documents & files, experiments become difficult to manage. Engineers & researchers would spend a long time hunting for results rather than doing the actual research.
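For context, a minimal sketch of what logging everything to one central place looks like, assuming the neptune.ai 1.x client (the project name and token are placeholders):

```python
# All params and metrics for a run land in one tracked place
# instead of scattered files. Project/token are placeholders.
import neptune

run = neptune.init_run(project="workspace/deepchain", api_token="...")
run["parameters"] = {"lr": 1e-3, "batch_size": 64}

for epoch in range(3):
    loss = 1.0 / (epoch + 1)        # stand-in for a real training loop
    run["train/loss"].append(loss)  # every metric goes to the same run

run.stop()
```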
–––
Great @pytorchlightnin + Hydra template (clean and scalable) to kickstart any deep learning project, by @ukashxukash (and some other contributors).
Main ideas behind it:
-Predefined structure: clean & scalable so that work can easily be extended
-Rapid Experimentation: thanks to Hydra command-line superpowers (see the sketch after this list)
-Little Boilerplate: thanks to automating pipelines with config instantiation
-Main Configs: specify default training configuration
-Experiment Configs: override chosen hyperparameters
-Workflow: comes down to 4 simple steps
-Experiment Tracking: @TensorBoard, @weights_biases, neptune.ai, @Cometml, @MLflow, @CSVLogger
-Logs: all logs are stored in a dynamically generated folder structure
-& more
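A minimal sketch of the Hydra entry point such a template is built around, assuming a configs/train.yaml exists (the paths and config keys are illustrative):

```python
# Hydra composes the full run config from YAML files; the script
# itself stays almost boilerplate-free.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="configs", config_name="train", version_base=None)
def main(cfg: DictConfig) -> None:
    # the composed config describes the whole run: model, data, trainer, logger
    print(OmegaConf.to_yaml(cfg))
    # objects are then built from config, e.g.
    # model = hydra.utils.instantiate(cfg.model)

if __name__ == "__main__":
    main()
```

Overriding chosen hyperparameters then happens straight from the command line, e.g. `python train.py model.lr=0.001` (the exact keys depend on the template’s configs).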
“#MLOps standard industry best practices” don’t apply to most #ML teams’ reality.
Why?
Those who write and share best practices are doing ML at a hyper scale.
Those who read and re-share them are doing ML at a reasonable scale.
Companies like Google, Netflix, Uber, and Airbnb are doing an awesome job for the community by sharing their blogs, white papers, and open-sourcing their tools.
But whatever they do, it is shaped (and biased) by THEIR MLOps problems.
Most companies don’t have their problems.
They would love to have their problems, but they don’t.
They operate at a smaller scale & face different challenges.
And they are the biggest part of the ML industry.
They want to know the best way to do MLOps at their scale, with their resources & limitations.