Pau Labarta Bajo

@paulabartabajo_

Jul 7 • 10 tweets • 4 min read

Wanna learn MLOps?

Stop reading blog posts.
Build a prediction service instead 🚀

Here is a project you can build (for free) 👩🏽‍💻👨‍💻↓↓↓

Let's build a Machine Learning service that predicts the Air Quality Index (AQI) in your city for the next 3 days, using a 100% serverless stack.

You will learn a lot, AND you will build something useful for society.

Win-win 🏆🏆

These are the steps to build it ↓

Step 1: Feature generation script 🐍

1 → fetches raw weather and pollutant data from an external API like aqicn.org/city/barcelona (https://t.co/72uTTBYnqF)

2 → computes features (aka model inputs) and targets (aka model outputs) from this raw data

3 → stores these features in the *Feature Store*
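
Here is a minimal sketch of sub-step 2, turning raw hourly readings into (features, targets). The field names (`ts`, `aqi`, `temperature`), the 24-hour window and the choice of "AQI exactly 72 hours ahead" as the target are all assumptions made for illustration. The API call and the Feature Store write are left out on purpose.

```python
from statistics import mean


def build_features(readings: list[dict], horizon_hours: int = 72,
                   window_hours: int = 24) -> list[dict]:
    """Turn hourly readings (sorted by time) into feature/target rows.

    Each reading is a dict with hypothetical keys "ts", "aqi", "temperature".
    The target is the AQI observed `horizon_hours` after the feature timestamp.
    """
    rows = []
    # Start once a full window of history exists, and stop while a target
    # `horizon_hours` ahead is still available.
    for i in range(window_hours - 1, len(readings) - horizon_hours):
        window = readings[i - window_hours + 1 : i + 1]
        rows.append({
            "ts": readings[i]["ts"],
            "aqi_last": readings[i]["aqi"],
            "aqi_mean_24h": mean(r["aqi"] for r in window),
            "temperature_last": readings[i]["temperature"],
            "target_aqi": readings[i + horizon_hours]["aqi"],
        })
    return rows
```

Rows near the end of the series have no target yet, so they are skipped here. At prediction time you would compute the same features without the target.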

Step 2: Backfill historical (features, targets) ⏮️

To train a Machine Learning model later, you need enough historical data (features, targets) in your Feature Store.

Run the feature script for a range of past dates, to get enough training data.
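
A backfill can be as small as a loop over past dates. The sketch below assumes your feature script exposes a function that takes one day. `run_feature_script` is a hypothetical name, not part of any library.

```python
from datetime import date, timedelta


def date_range(start: date, end: date):
    """Yield every date from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill(start: date, end: date, run_feature_script) -> list:
    """Run the feature script once per past day.

    Failures are collected instead of aborting the whole backfill,
    so a single bad day does not cost you the rest of the history.
    """
    failed = []
    for day in date_range(start, end):
        try:
            run_feature_script(day)
        except Exception as error:
            failed.append((day, str(error)))
    return failed
```

Re-run the backfill for the dates in the returned list once you have fixed whatever broke.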

Step 4: Model training script 🏋️

1 → fetches historical (features, targets) from the Feature Store.

2 → trains and evaluates the best ML model possible for this data, e.g. an XGBoost regressor (XGBRegressor).

3 → stores the trained model in the Model Registry.
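
Before you reach for XGBoost, it helps to have an evaluation harness and a baseline to beat. The sketch below is NOT the XGBoost model from the thread: it uses two trivial baselines as stand-ins and assumes rows with the hypothetical keys `aqi_last` and `target_aqi`. The point is the time-ordered split (no shuffling, so the model never trains on the future) and the error metric.

```python
from statistics import mean


def time_split(rows: list[dict], train_fraction: float = 0.8):
    """Split time-sorted rows: the oldest part trains, the newest part tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def evaluate_baselines(rows: list[dict]) -> dict:
    """Test-set MAE of two baselines any real model should beat."""
    train, test = time_split(rows)
    y_test = [r["target_aqi"] for r in test]
    train_mean = mean(r["target_aqi"] for r in train)
    return {
        # Always predict the average target seen during training.
        "train_mean": mean_absolute_error(y_test, [train_mean] * len(test)),
        # Persistence: predict that future AQI equals the latest AQI.
        "persistence": mean_absolute_error(y_test, [r["aqi_last"] for r in test]),
    }
```

If your trained model cannot beat the persistence baseline on the test split, it is not ready for the Model Registry.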

Step 5: Automate execution of the feature script 🕰️

Create a GitHub Actions workflow to automatically run the feature script (from step 1) every hour.

GitHub Actions gives you serverless compute to run your code on a schedule. Free for public repositories.

Beautiful.
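
In the workflow file, an hourly schedule is a cron expression like `0 * * * *`. Scheduled runs can start late, so the script should not assume it runs exactly on the hour. One way to handle that, sketched below with a hypothetical helper name, is to always process the last complete hour derived from the clock.

```python
from datetime import datetime, timedelta, timezone


def last_complete_hour(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the most recent fully elapsed hour.

    Any run started between 10:00 and 10:59 gets the window 09:00-10:00,
    so a delayed or repeated run processes the same data.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


if __name__ == "__main__":
    start, end = last_complete_hour(datetime.now(timezone.utc))
    print(f"Fetching readings from {start.isoformat()} to {end.isoformat()}")
```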

Step 6: Create a web app to show model predictions 👨🏽‍💻

Streamlit is a powerful Python library to develop and deploy web data apps.

Your app

1 → loads the model from the Model Registry and the latest features from the *Feature Store*,

2 → computes model predictions and shows them in a beautiful UI.
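
Raw numbers are hard to read, so the app can label each prediction with its AQI category. The sketch below uses the US EPA breakpoints and assumes predictions arrive as (date, AQI) pairs. The function names are made up for this example, and loading the model and features is left out.

```python
from datetime import date


def aqi_category(aqi: int) -> str:
    """Map an AQI value to its US EPA category name."""
    bands = [
        (50, "Good"),
        (100, "Moderate"),
        (150, "Unhealthy for Sensitive Groups"),
        (200, "Unhealthy"),
        (300, "Very Unhealthy"),
    ]
    for upper_bound, label in bands:
        if aqi <= upper_bound:
            return label
    return "Hazardous"


def prediction_rows(predictions: list[tuple[date, float]]) -> list[dict]:
    """Turn (date, predicted AQI) pairs into display-ready rows."""
    rows = []
    for day, aqi in predictions:
        rounded = round(aqi)  # categorize the same number the user sees
        rows.append({
            "date": day.isoformat(),
            "aqi": rounded,
            "category": aqi_category(rounded),
        })
    return rows
```

In a Streamlit app, rows like these could be handed to a call such as `st.table(rows)` to render the 3-day forecast.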

BOOM!

Bonus 🎁

You can create another GitHub action to automate the model training script.

Why re-train the model? 🤔

Because ML model performance tends to degrade over time, as new data drifts away from the data the model was trained on.
The best way to mitigate this is to re-train the model regularly, e.g. once a week.
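
The re-training rule can live in a few lines of code. This sketch assumes you track the model's recent error (MAE) next to the error it had when it was trained. The function name, the 7-day default and the 20% tolerance are illustrative choices, not recommendations from the thread.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained: datetime, now: datetime,
                   recent_mae: float, baseline_mae: float,
                   max_age: timedelta = timedelta(days=7),
                   tolerance: float = 1.2) -> bool:
    """Re-train when the model is old OR its recent error has grown.

    baseline_mae is the error measured right after the last training run.
    tolerance=1.2 means "more than 20% worse than at training time".
    """
    too_old = now - last_trained >= max_age
    degraded = recent_mae > baseline_mae * tolerance
    return too_old or degraded
```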

Wanna level up in ML/MLOps?

Join my e-mail list and get one article 𝗘𝘃𝗲𝗿𝘆 𝗦𝗮𝘁𝘂𝗿𝗱𝗮𝘆 𝗺𝗼𝗿𝗻𝗶𝗻𝗴 ↓
datamachines.xyz/subscribe/

Every week I share real-world Data Science/Machine Learning content.

Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
Like/Retweet the first tweet below to spread the wisdom ↓↓↓

https://twitter.com/1408789941040058369/status/1677286924644491267

More from @paulabartabajo_

Pau Labarta Bajo

@paulabartabajo_

Jul 6

Junior data scientists stay inside Jupyter.

Senior data scientists go beyond...
... so their ML models reach production 🚀

Wanna learn how? ↓

Jupyter notebooks are the most popular environment to develop Machine Learning models.

They are the fastest way to

→ add code
→ fix code
→ re-run code

With the hope that every new run will bring better results, and hence a better model.

In real-world projects, you need to re-run notebooks hundreds (if not thousands) of times.

And keeping track of all results quickly becomes tedious.

So you slow down your pace.

And get lost in numbers.

Pau Labarta Bajo

@paulabartabajo_

Jul 6

Wanna learn enough git to be a data scientist?

A hands-on tutorial in 10 steps 👩🏽‍💻👨‍💻↓↓↓

#1 Create your project folder and cd into it

#2 Create a README file.

This is the first thing anyone visiting your repository will see.
You better have one. And you better make it pretty.

Pau Labarta Bajo

@paulabartabajo_

Jul 5

I used to spend hours setting up IAM roles, Docker registries, and EC2 instances, to build ML products

... until I discovered this ↓

You don't need to be an AWS expert to develop real-world ML apps.

There is a faster way, called 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗠𝗟

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗠𝗟?

The idea is simple:

You integrate the services you need at the code level, using each service's Python SDK.

Your code is focused on business logic that differentiates your product, so you spend 0 time on the infra you need to run it.

Pau Labarta Bajo

@paulabartabajo_

Jul 1

3 years ago I struggled to land my first freelance ML engineering contract.

Then I discovered this ↓

Building one professional real-world ML project is the best way to stand out from the crowd, and land an ML job.

Here is what I did, 𝘀𝘁𝗲𝗽-𝗯𝘆-𝘀𝘁𝗲𝗽 👩‍💻👨🏽‍💻↓

Step 1. Find a real-world problem you are interested in

Working on projects is harder than completing online courses.
But hey, no pain no gain.

It is VERY important you work on a problem you are interested in.
Otherwise, you will quit.

Pau Labarta Bajo

@paulabartabajo_

Jun 29

2 years ago I got tired of developing ML models... that never made it into production.

Then I discovered this ↓

It is best practice in Software Engineering to first build a working MVP (minimum viable product) fast. And then start iterating.

In the case of ML, an MVP is a minimal system that can

- ingest new data
- make predictions on this data, and
- publish these predictions

So, instead of trying to build the perfect model in a notebook, try to build this minimal system.

This is the recipe that will help you build and ship ML.

And not get stuck in the Jupyter-notebook-infinite-development loop.

Pau Labarta Bajo

@paulabartabajo_

Jun 29

Wanna learn with me how to

- train 🏋️
- deploy 🚀
- automate 🪝

a real-time ML system in 10 steps?

Welcome to the 𝗛𝗮𝗻𝗱𝘀-𝗼𝗻 𝗧𝗿𝗮𝗶𝗻 & 𝗗𝗲𝗽𝗹𝗼𝘆 𝗠𝗟 𝗧𝘂𝘁𝗼𝗿𝗶𝗮𝗹 🤗

Here is 𝗦𝘁𝗲𝗽 2, where we generate the training data