To train a Machine Learning model later, you need enough historical data (features, targets) in your Feature Store.
Run the feature script for a range of past dates to backfill enough training data.
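The backfill loop can be sketched like this. `compute_and_store_features` is a hypothetical stand-in for your actual feature script (fetch raw data, compute features and targets, push them to the Feature Store):

```python
# Backfill sketch: run the feature script once per past date to
# populate the Feature Store with (features, targets) for training.
from datetime import date, timedelta

def compute_and_store_features(day: date) -> None:
    # stand-in for the real feature script
    print(f"backfilling features for {day}")

def backfill(start: date, end: date) -> list:
    """Run the feature script for every date in [start, end]."""
    processed = []
    day = start
    while day <= end:
        compute_and_store_features(day)
        processed.append(day)
        day += timedelta(days=1)
    return processed

processed = backfill(date(2023, 1, 1), date(2023, 1, 7))
```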
Step 4: Model training script
1 → fetches historical (features, targets) from the Feature Store.
2 → trains and evaluates the best ML model possible for this data, e.g. XGBoost's XGBRegressor.
3 → stores the trained model in the Model Registry.
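Here is a minimal sketch of those 3 steps. The Feature Store and Model Registry clients are stand-ins (a hard-coded dataset and a plain dict), and a toy mean-predictor stands in for a real model like `xgboost.XGBRegressor`:

```python
# Training-script sketch: fetch -> train & evaluate -> register.

def fetch_training_data():
    # 1) fetch historical (features, targets) from the Feature Store
    #    (hard-coded stand-in data)
    X = [[1.0], [2.0], [3.0], [4.0]]
    y = [2.0, 4.0, 6.0, 8.0]
    return X, y

class MeanModel:
    """Toy stand-in for a real regressor (e.g. xgboost.XGBRegressor)."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean_ for _ in X]

def train_and_evaluate(X, y):
    # 2) train and evaluate (here: mean absolute error on the train set)
    model = MeanModel().fit(X, y)
    mae = sum(abs(p - t) for p, t in zip(model.predict(X), y)) / len(y)
    return model, mae

def save_to_registry(model, registry):
    # 3) store the trained model in the Model Registry (dict stand-in)
    registry["demand_model"] = model
    return registry

X, y = fetch_training_data()
model, mae = train_and_evaluate(X, y)
registry = save_to_registry(model, {})
```

In a real project you would split a validation set off the historical data and pick the model with the best validation error before registering it.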
Step 5: Automate execution of the feature script
Create a GitHub action to automatically run the feature script (from step 1) every hour.
GitHub Actions gives you serverless compute to run your code on a schedule. For free.
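A workflow sketch for the hourly run (file and script names are placeholders; adjust to your repo):

```yaml
# .github/workflows/feature_pipeline.yml
name: feature-pipeline
on:
  schedule:
    - cron: '0 * * * *'   # every hour, at minute 0
  workflow_dispatch:       # also allow manual runs
jobs:
  run-feature-script:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: python feature_script.py
```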
Beautiful.
Step 6: Create a web app to show model predictions
Streamlit is a powerful Python library to develop and deploy web data apps.
Your app
1 → loads the model from the *Model Registry* and the latest features from the *Feature Store*,
2 → computes model predictions and shows them on a beautiful UI.
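The app's core logic, minus the UI, looks roughly like this. `load_model` and `load_features` are hypothetical stand-ins for your Model Registry / Feature Store clients; in the real Streamlit app you would finish with something like `st.line_chart(predictions)`:

```python
# Sketch of the prediction step the web app runs on each refresh.

class NaiveModel:
    """Stand-in model: predicts 1.1x the last observed value."""
    def predict(self, features):
        return [1.1 * row["last_value"] for row in features]

def load_model():
    # stand-in for fetching the trained model from the Model Registry
    return NaiveModel()

def load_features():
    # stand-in for fetching the latest features from the Feature Store
    return [{"last_value": 100.0}, {"last_value": 250.0}]

model = load_model()
features = load_features()
predictions = model.predict(features)  # what the UI would display
```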
BOOM!
Bonus
You can create another GitHub action to automate the model training script.
Why re-train the model?
Because ML model performance degrades over time, as production data drifts away from the data the model was trained on.
The best way to mitigate this is to regularly re-train the model, like once a week.
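Beyond a fixed weekly schedule, you can also trigger retraining when performance actually drops. A minimal sketch (the 10% tolerance is an arbitrary example, not a recommendation):

```python
# Drift check sketch: retrain when recent error exceeds the error
# measured at training time by more than `tolerance`.

def should_retrain(recent_mae: float, baseline_mae: float,
                   tolerance: float = 0.10) -> bool:
    """True when recent MAE is more than (1 + tolerance) x baseline MAE."""
    return recent_mae > baseline_mae * (1 + tolerance)

verdict_drifted = should_retrain(recent_mae=12.0, baseline_mae=10.0)
verdict_stable = should_retrain(recent_mae=10.5, baseline_mae=10.0)
```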
Wanna level up in ML/MLOps?
Join my e-mail list and get one article **every Saturday morning** → datamachines.xyz/subscribe/
Every week I share real-world Data Science/Machine Learning content.
Follow me @paulabartabajo_ so you do not miss what's coming next.
Wanna help?
Like/Retweet the first tweet below to spread the wisdom.
I used to think the Transformer was the best architecture to build LLMs.
I was wrong. Let me explain.
Don't get me wrong. The Transformer is **the most** revolutionary architectural design in the deep learning space of the last 10 years.
It has
> scaled model size and training budgets,
> extended the effective sequence length our models can process and use, and
> conquered every LLM eval benchmark out there.
Wanna learn to **build ML systems**?
Here are **3 real-world examples** you can build TODAY.
**Why ML systems and not just ML models?**
Because ML models are not enough in real-world ML projects.
They don't create value until you put them to work, by building a
-> Feature pipeline
-> Training pipeline
-> Inference pipeline
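The 3-pipeline architecture in miniature (all names and the toy model are hypothetical; each function stands in for a whole pipeline):

```python
# Feature pipeline:   raw data -> features
# Training pipeline:  features + targets -> trained model
# Inference pipeline: trained model + fresh features -> predictions

def feature_pipeline(raw_values):
    # toy feature: the average of the raw observations
    return [{"avg": sum(raw_values) / len(raw_values)}]

class ScaleModel:
    """Toy model: learns a single scale factor target / feature."""
    def fit(self, features, targets):
        self.scale_ = targets[0] / features[0]["avg"]
        return self
    def predict(self, features):
        return [self.scale_ * f["avg"] for f in features]

def training_pipeline(features, targets):
    return ScaleModel().fit(features, targets)

def inference_pipeline(model, fresh_features):
    return model.predict(fresh_features)

features = feature_pipeline([10.0, 20.0, 30.0])        # avg = 20.0
model = training_pipeline(features, targets=[40.0])    # scale = 2.0
preds = inference_pipeline(model, [{"avg": 25.0}])
```

The point of the split: the feature pipeline runs hourly, the training pipeline runs weekly, and the inference pipeline runs whenever predictions are needed, each on its own schedule and compute.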