Pau Labarta Bajo
Freelance AI/ML @Toptal I build real-world ML products. And then help you do the same.
Apr 8 • 14 tweets • 4 min read
Wanna learn enough git to be a data scientist?

A hands-on tutorial in 10 steps 👩🏽‍💻👨‍💻↓↓↓

#1 Create your project folder and cd into it
Apr 4 • 10 tweets • 3 min read
XGBoost is one of the most effective algorithms for time-series prediction.

But, you need to prepare your data carefully.

These are the steps to transform raw data into supervised ML data for time-series prediction ↓

Example

Imagine you work at a ride-sharing app company in NYC as an ML engineer.

You want to help the operations team allocate the fleet of drivers optimally each hour of the day.

The end goal is to maximize revenue.
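
To make the "raw data into supervised ML data" idea concrete, here is a minimal sketch in plain Python. It assumes the raw data is already aggregated into one number per hour, and it uses a sliding window: the previous n_lags values become the features and the value that follows becomes the target. The function name make_supervised, the n_lags parameter and the ride counts are made up for illustration; they do not come from the thread.

```python
from typing import List, Tuple


def make_supervised(
    series: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Slide a window over the series: each feature row holds the
    previous n_lags values, and the target is the value that follows."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    features, targets = [], []
    for end in range(n_lags, len(series)):
        features.append(series[end - n_lags:end])
        targets.append(series[end])
    return features, targets


# Hypothetical hourly ride counts for one area (made-up numbers).
hourly_rides = [10, 12, 15, 11, 9, 14]
X, y = make_supervised(hourly_rides, n_lags=3)
for row, target in zip(X, y):
    print(row, "->", target)
```

Each printed row is one training example. The resulting features and targets are the kind of tabular input a model such as XGBoost expects.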
Mar 14 • 6 tweets • 2 min read
ML Project Idea 💡

Let's find your Twin Celebrity 🔎👸 ↓

Here is a full example, with source code, to learn how to build a complete ML app that finds your Twin Celebrity.

Clone the code, modify it, and deploy it!
github.com/SaladSlayer00/…
Mar 9 • 10 tweets • 3 min read
Training an ML model inside a Jupyter notebook is something every data scientist knows 🏋️

But do you know how to create a real-world ML service that makes a difference for the company you work for? 📈

If the answer is NO, this thread is for you 🤗🧵↓

So, what is the difference between model training and an ML service? 🤔

An ML service is a sequence of processing and storage steps that takes in raw data and outputs predictions that are used by the business to make smarter decisions.

Model training is just one of those steps.
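
Here is a toy sketch of that idea, assuming a service made of four steps (ingest, build features, train, predict) with a plain dict standing in for the storage layer. Every name, the data and the baseline "model" are hypothetical and chosen only to show that training is one step among several.

```python
from statistics import mean
from typing import Dict, List

# In-memory stand-in for the storage layer (hypothetical; a real service
# would use a database, a feature store or a model registry).
storage: Dict[str, object] = {}

RUSH_HOURS = (8, 9)  # assumption made for this toy example


def ingest_raw_data() -> List[dict]:
    """Step 1: pull raw data. Here it is hard-coded, made-up data."""
    return [
        {"hour": 8, "rides": 120},
        {"hour": 9, "rides": 150},
        {"hour": 10, "rides": 90},
    ]


def build_features(raw: List[dict]) -> List[dict]:
    """Step 2: turn raw records into model inputs and store them."""
    features = [
        {"is_rush_hour": r["hour"] in RUSH_HOURS, "rides": r["rides"]}
        for r in raw
    ]
    storage["features"] = features
    return features


def train_model(features: List[dict]) -> Dict[bool, float]:
    """Step 3: model training. The 'model' is a baseline that stores
    the average number of rides for rush and non-rush hours."""
    model = {}
    for flag in (True, False):
        rides = [f["rides"] for f in features if f["is_rush_hour"] == flag]
        model[flag] = mean(rides)
    storage["model"] = model
    return model


def predict(model: Dict[bool, float], hour: int) -> float:
    """Step 4: output a prediction the business can act on."""
    return model[hour in RUSH_HOURS]


raw = ingest_raw_data()
features = build_features(raw)
model = train_model(features)
prediction = predict(model, hour=9)
print(prediction)
```

Swapping the baseline for a better model only touches train_model and predict. The ingestion, feature and storage steps stay the same.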
Feb 15 • 10 tweets • 4 min read
ML Project Idea 💡

Let's predict taxi demand in NYC in the next 60 minutes 🚕↓

Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- in the next hour
- for each taxi zone (e.g. Zone 113 "Lower Manhattan")

Let's do it in 6 steps ↓
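
Before any modeling, the target has to exist: the number of rides per zone and per hour. Below is a minimal sketch of that aggregation, assuming the raw data is a list of (pickup time, zone ID) records. The records, the timestamp format and the function name rides_per_zone_and_hour are made up for illustration and are not taken from the thread.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple

# Made-up raw ride records: (pickup time, pickup zone ID).
raw_rides: List[Tuple[str, int]] = [
    ("2024-02-15 08:05:00", 113),
    ("2024-02-15 08:40:00", 113),
    ("2024-02-15 08:55:00", 79),
    ("2024-02-15 09:10:00", 113),
]


def rides_per_zone_and_hour(
    rides: List[Tuple[str, int]]
) -> Dict[Tuple[int, datetime], int]:
    """Count rides for each (zone, hour) pair by truncating every
    pickup time down to the start of its hour."""
    counts: Counter = Counter()
    for pickup_time, zone in rides:
        ts = datetime.strptime(pickup_time, "%Y-%m-%d %H:%M:%S")
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(zone, hour)] += 1
    return dict(counts)


demand = rides_per_zone_and_hour(raw_rides)
for (zone, hour), n_rides in sorted(demand.items()):
    print(zone, hour, n_rides)
```

The hourly counts for one zone form the time series that the model would be asked to forecast one hour ahead.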
Feb 14 • 7 tweets • 2 min read
Wanna build your first real-world LLM project?

Here are 4 FREE lectures to put you on the right track 🧵↓

1️⃣ Fine-tuning pipeline for open-source LLMs

In this lecture you will learn
- What is model fine-tuning?
- Why is it useful?
- When to use it?
- How to architect a fine-tuning pipeline in a real-world project

Feb 13 • 18 tweets • 4 min read
Are you a data scientist using CSV files to store your data?

What if I told you there is a better way?

Can you imagine a

-> lighter 🦋
-> faster 🏎️
-> cheaper 💸

file format to save your datasets?

Read this thread so you don't need to imagine anymore 👇🏾

Do not get me wrong. I love CSVs.

You can open them with any text editor, inspect them and share them with others.

They have become the standard file format for datasets in the AI/ML community.

However, they have a little problem...
Feb 12 • 15 tweets • 4 min read
Junior data scientists ARE TOLD the problems they need to solve.
Senior data scientists FIND the problems that need to be solved.

How?

Here is a step-by-step example ↓↓↓

In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

And you, as a data scientist, need to

1 → Understand the business problem

2 → Transform the business problem into a data science problem

3 → Solve it
Feb 8 • 10 tweets • 3 min read
3 years ago I struggled to land my first freelance ML engineering contract.

Then I discovered this ↓

Building one professional real-world ML project is the best way to stand out from the crowd, and land an ML job.

Here is what I did, step-by-step 👩‍💻👨🏽‍💻↓
Jan 17 • 4 tweets • 2 min read
XGBoost is one of the most effective algorithms for time-series prediction.

But, you need to prepare your data carefully.

Here is a Python library to help you prepare your data ↓

@joaopcnogueira, one of my students from the Real World ML Tutorial, has built ts2ml, a Python library that lets you transform

- a time series dataset, into
- a training dataset, with features and targets

Enjoy it

And give it a star ⭐ on GitHub ↓
github.com/joaopcnogueira…
Jan 11 • 15 tweets • 4 min read
Wanna learn how to deploy ML models to production?

Let's go step-by-step, with a real-world example 🧵↓

Imagine today is your first day as an ML engineer at Uber, and your task is to improve the ML service that predicts the Estimated Time of Arrival (ETA).

The ETA service gives the end users an estimate of when the driver will arrive at the pickup location.
Jan 11 • 10 tweets • 3 min read
Wanna build a Machine Learning app that solves a real-world business problem?

Here are the 6 steps to get there ↓↓↓

Step #1. Understand the business problem

Define

→ the business metric you wanna impact (e.g. user retention) and
→ the ML proxy metric you will optimize (e.g. classification accuracy).

There is nothing more frustrating than building the perfect solution to the wrong problem.
Jan 4 • 10 tweets • 4 min read
ML Project Idea 💡

Let's predict taxi demand in NYC in the next 60 minutes 🚕↓

Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- in the next hour
- for each taxi zone (e.g. Zone 113 "Lower Manhattan")

Let's do it in 6 steps ↓
Jan 3 • 7 tweets • 2 min read
Building ML products becomes way easier when you learn these 2 tips

↓↓↓

➡️ From business problem to ML problem

Every Kaggle competition starts with a clearly defined target metric you need to optimize for.

But, in real-world ML, there is no target metric waiting for you.
Jan 2 • 10 tweets • 4 min read
Wanna learn MLOps?

Stop reading blog posts.
Build a prediction service instead 🚀

Here is a project you can build (for free) 👩🏽‍💻👨‍💻↓↓↓

Let's build a Machine Learning service to predict the Air Quality Index (AQI) in your city in the next 3 days, using a 100% serverless stack.

You will learn a lot, AND you will build something useful for society.

Win-win 🏆🏆

These are the steps to build this ↓
Jan 2 • 13 tweets • 4 min read
Wanna learn enough git to be a data scientist?

A hands-on tutorial in 10 steps 👩🏽‍💻👨‍💻↓↓↓

#1 Create your project folder and cd into it
Dec 25, 2023 • 15 tweets • 4 min read
Junior data scientists ARE TOLD the problems they need to solve.
Senior data scientists FIND the problems that need to be solved.

How?

Here is a step-by-step example ↓↓↓

In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

And you, as a data scientist, need to

1 → Understand the business problem

2 → Transform the business problem into a data science problem

3 → Solve it
Dec 22, 2023 • 7 tweets • 2 min read
Looking for real-world ML project ideas? 💡

Let's build an ML system that predicts loan defaults.

With full source code 👷🏾‍♀️👷↓

Kaggle is full of top-notch tutorials on how to train an ML model.

When it comes to loan default prediction, this is probably the best

kaggle.com/code/faressaya…
Dec 21, 2023 • 11 tweets • 3 min read
Most ML courses teach you how to build the perfect ML model...
... and only then start thinking about deploying it.

And this is why most ML prototypes in real-world projects do not make it into production.

Is there a better way? 🤔

Yes, there is. Let me explain ↓

🔬 Model-first mindset

A model-first mindset is what Kaggle competitions and most online courses are about.

Your ONLY focus is to build the best possible mapping between a set of input features and a target metric.

And in real-world ML this is often not the best approach...
Dec 21, 2023 • 10 tweets • 3 min read
3 years ago I struggled to land my first freelance ML engineering contract.

Then I discovered this ↓

Building one professional real-world ML project is the best way to stand out from the crowd, and land an ML job.

Here is what I did, step-by-step 👩‍💻👨🏽‍💻↓
Dec 16, 2023 • 9 tweets • 2 min read
What is an A/B test and why do you need to master it as an ML engineer?

An A/B test is a testing strategy that helps you decide if an ML model is good enough to be used in production.

But, wait a second.

Isn't a low test error (e.g. low mean absolute error) enough to decide if the model is good or not?

Well, nope.

But why?

Because of 2 reasons