Pau Labarta Bajo
Teaching Real World ML at https://t.co/xWr8Hm8zI5 - ML Freelance engineer - Maths Olympian - Father of 1.. sorry 2 kids

Jul 7, 2023, 10 tweets

Wanna learn MLOps?

Stop reading blog posts.
Build a prediction service instead 🚀

Here is a project you can build (for free) 👩🏽‍💻👨‍💻↓↓↓

Let's build a Machine Learning service to predict the Air Quality Index (AQI) in your city in the next 3 days, using a 100% serverless stack.

You will learn a lot, AND you will build something useful for society.

Win-win 🏆🏆

These are the steps to build it ↓

Step 1: Feature generation script 🐍

1 → fetches raw weather and pollutant data from an external API like aqicn.org/city/barcelona (https://t.co/72uTTBYnqF)

2 → computes features from this raw data (aka model inputs), and targets (aka model outputs)

3 → stores these features in the *Feature Store*
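
The three steps above could be sketched roughly like this. It is a minimal, stdlib-only sketch, not the author's actual script: the raw record fields (`pm25`, `pm10`, `temperature`, `aqi`), the lag features, the one-day-ahead target, and the in-memory `FEATURE_STORE` dict standing in for a real Feature Store are all assumptions for illustration. The API call is stubbed out so the example runs offline.

```python
from datetime import date, timedelta

# Hypothetical stand-in for a real Feature Store: {ISO date: feature row}.
FEATURE_STORE = {}


def fetch_raw(day):
    """Stub for the external API call (a real script would do an HTTP request).

    Returns made-up but deterministic readings so the sketch runs offline.
    """
    seed = day.toordinal() % 7
    return {
        "pm25": 10.0 + seed,
        "pm10": 20.0 + 2 * seed,
        "temperature": 15.0 + seed,
        "aqi": 40.0 + 3 * seed,
    }


def compute_features(day, n_lags=3):
    """Build one (features, target) row for `day`.

    Features: readings from the `n_lags` previous days (model inputs).
    Target: the AQI observed on `day` itself (model output).
    """
    row = {"date": day.isoformat()}
    for lag in range(1, n_lags + 1):
        raw = fetch_raw(day - timedelta(days=lag))
        for name in ("pm25", "pm10", "temperature", "aqi"):
            row[f"{name}_lag_{lag}"] = raw[name]
    row["target_aqi"] = fetch_raw(day)["aqi"]
    return row


def run_feature_script(day):
    """Fetch, compute, and store. Rows are keyed by date, so re-runs overwrite."""
    row = compute_features(day)
    FEATURE_STORE[row["date"]] = row
    return row


if __name__ == "__main__":
    print(run_feature_script(date(2023, 7, 7)))
```

Keying rows by date keeps the script idempotent, which matters once it runs on a schedule and may occasionally be re-run for the same day.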

Step 2: Backfill historical (features, targets) ⏮️

To train a Machine Learning model later, you need enough historical data (features, targets) in your Feature Store.

Run the feature script for a range of past dates, to get enough training data.
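
A backfill is essentially a loop over past dates. The sketch below assumes the feature script is exposed as a function that takes a day and returns one row, and that the store behaves like a dict keyed by ISO date. Both are assumptions for illustration, and `fake_feature_script` is a hypothetical placeholder.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield every day from `start` to `end`, both inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def backfill(feature_script, store, start, end):
    """Run `feature_script` for each past day and save the row in `store`.

    Days already in `store` are skipped, so an interrupted backfill can be
    resumed without redoing work. Returns the number of new rows written.
    """
    written = 0
    for day in date_range(start, end):
        key = day.isoformat()
        if key in store:
            continue
        store[key] = feature_script(day)
        written += 1
    return written


if __name__ == "__main__":
    # Hypothetical feature script: a real one would call the external API.
    def fake_feature_script(day):
        return {"date": day.isoformat(), "target_aqi": 40.0 + day.day % 5}

    store = {}
    n_rows = backfill(fake_feature_script, store, date(2023, 1, 1), date(2023, 6, 30))
    print(f"backfilled {n_rows} days")
```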

Step 4: Model training script 🏋️

1 → fetches historical (features, targets) from the Feature Store.

2 → trains and evaluates the best ML model possible for this data, e.g. an XGBoost regressor (XGBRegressor).

3 → stores the trained model in the Model Registry.
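
A training script along these lines might look like the sketch below. To keep it runnable without extra dependencies, it uses a hypothetical `MeanBaseline` model instead of XGBoost, and pickling stands in for pushing to a real Model Registry. The row layout (`date`, `target_aqi`, everything else a feature) is also an assumption. The part worth copying is the time-ordered split: with time series you evaluate on the most recent rows, never on a random sample.

```python
import pickle


def time_ordered_split(rows, test_fraction=0.2):
    """Oldest rows go to training, the newest to testing (no shuffling)."""
    rows = sorted(rows, key=lambda r: r["date"])  # ISO dates sort chronologically
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


def to_xy(rows, target="target_aqi"):
    """Turn feature rows into a feature matrix X and a target vector y."""
    names = sorted(k for k in rows[0] if k not in ("date", target))
    X = [[r[name] for name in names] for r in rows]
    y = [r[target] for r in rows]
    return X, y


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class MeanBaseline:
    """Hypothetical baseline: always predicts the training-set mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def train_and_evaluate(rows, model):
    """Fit `model` on the oldest rows and report MAE on the newest ones."""
    train, test = time_ordered_split(rows)
    X_train, y_train = to_xy(train)
    X_test, y_test = to_xy(test)
    model.fit(X_train, y_train)
    return model, mean_absolute_error(y_test, model.predict(X_test))


def save_model(model, path):
    """Stand-in for a Model Registry upload: serialize the model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


if __name__ == "__main__":
    rows = [
        {"date": f"2023-01-{d:02d}", "aqi_lag_1": float(d), "target_aqi": float(d + 1)}
        for d in range(1, 31)
    ]
    model, mae = train_and_evaluate(rows, MeanBaseline())
    print(f"test MAE: {mae:.2f}")
```

Any object with `fit` and `predict` should slot in where the baseline is, including an `xgboost.XGBRegressor`. The baseline's MAE is the number a real model has to beat.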

Step 5: Automate execution of the feature script 🕰️

Create a GitHub Actions workflow to automatically run the feature script (from step 1) every hour.

GitHub Actions gives you serverless compute to run your code on a schedule. For free on public repos (private repos get a monthly quota of free minutes).

Beautiful.
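
An hourly schedule could be configured with a workflow file along these lines, saved under `.github/workflows/`. The workflow name, the script path `src/feature_pipeline.py`, the `requirements.txt` file, and the secret name are all hypothetical; adapt them to your repo.

```yaml
name: hourly-feature-pipeline

on:
  schedule:
    - cron: "0 * * * *"   # at minute 0 of every hour (UTC)
  workflow_dispatch:       # also allow manual runs from the GitHub UI

jobs:
  run-feature-script:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python src/feature_pipeline.py   # hypothetical script path
        env:
          # hypothetical secret name, defined in the repo settings
          FEATURE_STORE_API_KEY: ${{ secrets.FEATURE_STORE_API_KEY }}
```

Scheduled runs can start a few minutes late when GitHub is busy, so the feature script should not assume it runs exactly on the hour.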

Step 6: Create a web app to show model predictions 👨🏽‍💻

Streamlit is a powerful Python library to develop and deploy web data apps.

Your app

1 → loads the model from the Model Registry and the latest features from the *Feature Store*,

2 → computes model predictions and shows them in a beautiful UI.

BOOM!
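
The prediction part of such an app can be kept separate from the UI code, which makes it easy to test. The sketch below assumes the model exposes `predict` and that you already have one feature vector per forecast day; `StubModel` and the feature values are hypothetical. The category labels follow the US EPA AQI bands.

```python
from datetime import date, timedelta

# US EPA AQI bands as (upper bound, label); anything above 300 is Hazardous.
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(aqi):
    """Map a numeric AQI value to its human-readable category."""
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return "Hazardous"


def forecast_table(model, features_by_day):
    """Turn (day, feature_vector) pairs into display-ready rows."""
    days = [day for day, _ in features_by_day]
    predictions = model.predict([x for _, x in features_by_day])
    rows = []
    for day, pred in zip(days, predictions):
        aqi = round(float(pred))
        rows.append({
            "date": day.isoformat(),
            "predicted_aqi": aqi,
            "category": aqi_category(aqi),
        })
    return rows


if __name__ == "__main__":
    class StubModel:
        """Hypothetical model returning fixed predictions for the demo."""

        def predict(self, X):
            return [42.4, 101.0, 160.0]

    today = date(2023, 7, 7)
    inputs = [(today + timedelta(days=i), [float(i)]) for i in range(1, 4)]
    for row in forecast_table(StubModel(), inputs):
        print(row)
```

In the Streamlit app itself, rows like these could be handed to something like `st.table` for display.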

Bonus 🎁

You can create another GitHub action to automate the model training script.

Why re-train the model? 🤔

Because ML model performance tends to degrade over time, as new data drifts away from the data the model was trained on.
A simple way to mitigate this is to re-train the model regularly, like once a week.

Wanna level up in ML/MLOps?

Join my e-mail list and get one article 𝗘𝘃𝗲𝗿𝘆 𝗦𝗮𝘁𝘂𝗿𝗱𝗮𝘆 𝗺𝗼𝗿𝗻𝗶𝗻𝗴 ↓
datamachines.xyz/subscribe/

Every week I share real-world Data Science/Machine Learning content.

Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
Like/Retweet the first tweet below to spread the wisdom ↓↓↓
