Pau Labarta Bajo
The Real-World ML guy | Learn to build real-world ML apps at https://t.co/xWr8Hm8zI5
Jul 3 • 11 tweets • 3 min read
Most of my ML model prototypes never reached production 😵‍💫

Until I changed my mindset 🧠↓↓↓

🔬 Model-first mindset

A model-first mindset is what Kaggle competitions and most online courses are about.

Your ONLY focus is to build the best possible mapping between a set of input features and a target metric.

And in real-world ML this is often not the best approach...
Jul 2 • 13 tweets • 4 min read
How do you build
> real-time ML systems ⚡
> at scale 🎛️
> without burning cash 💸?

🧵↓

The problem 🤔

Let's say you work as an ML engineer at a fintech startup, whose flagship product is a mobile app for online payments.

A critical problem you need to tackle from day 0 is the automatic detection of fraudulent transactions.
Jul 1 • 10 tweets • 4 min read
ML Project Idea 💡

Let's predict taxi demand in NYC in the next 60 minutes 🚕↓

Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- in the next hour
- for each taxi zone (e.g. Zone 113 "Lower Manhattan")

Let's do it in 6 steps ↓
Jun 25 • 6 tweets • 2 min read
ML Project Idea 💡

Let's predict flight delays 🛬 ↓

Here is a full example, with source code, to learn how to build a complete ML app that predicts flight delays for Stockholm Arlanda airport.

Clone the code, modify it, and deploy it!
github.com/SebastianoMene…
Jun 18 • 17 tweets • 5 min read
Let's build an LLM agent in Python, step-by-step ↓🧵

Why agents 🤖❓

Because Large Language Models alone are not enough to accurately solve complex tasks that require

-> External information that was not present in the training dataset used to fit the LLM parameters
or
-> Many reasoning steps
Jun 12 • 6 tweets • 2 min read
ML Project Idea 💡

Let's predict flight delays 🛬 ↓

Here is a full example, with source code, to learn how to build a complete ML app that predicts flight delays for Stockholm Arlanda airport.

Clone the code, modify it, and deploy it!
github.com/SebastianoMene…
Jun 11 • 4 tweets • 2 min read
ML Project Idea 💡

Let's predict air quality ↓

Here is a full example, with source code, to learn how to build a complete ML app that predicts air quality in different European cities.

Clone the code, modify it, and deploy it!
github.com/logicalclocks/…
Jun 8 • 9 tweets • 3 min read
Wanna learn time-series forecasting? 📈

No more reading blog posts.
It is time to forecast for real 😎

Here is a project you can build 👩‍💻🧑🏽‍💻↓

Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- per hour (e.g. tomorrow between 5 PM and 6 PM), and
- per zone (e.g. Zone 113 "Lower Manhattan")

in the following 3 days.
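A minimal sketch of how the forecasting target above (rides per hour, per zone) could be computed from raw ride records. The record layout, the timestamps, and the function name `rides_per_hour_and_zone` are assumptions made for illustration; the thread preview does not show the actual data schema.

```python
from collections import Counter
from datetime import datetime

# Made-up raw rides: (pickup timestamp, taxi zone id). Illustrative only.
raw_rides = [
    ("2024-06-08 17:05:00", 113),
    ("2024-06-08 17:40:00", 113),
    ("2024-06-08 17:59:59", 114),
    ("2024-06-08 18:10:00", 113),
]


def rides_per_hour_and_zone(rides):
    """Count rides for every (hour start, zone id) pair."""
    counts = Counter()
    for pickup_time, zone_id in rides:
        ts = datetime.strptime(pickup_time, "%Y-%m-%d %H:%M:%S")
        # Truncate the timestamp to the start of its hour.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour_start, zone_id)] += 1
    return counts


demand = rides_per_hour_and_zone(raw_rides)
```

Each key in `demand` is one (hour, zone) cell, and its count is the value a forecasting model would try to predict.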
Jun 7 • 10 tweets • 3 min read
Wanna become an NLP engineer?

Stop taking online courses.
Build something instead 🏗️

Here is a project you can build 👩🏽‍💻👨‍💻↓↓↓

Reading blog posts about multi-billion-parameter Language Models is very cool.

However, building real-world NLP products from these models is where the real business value is. And this is what companies look for in the job market.

So, here is a PRO project you can build โ†“
Jun 7 • 11 tweets • 3 min read
Tired of training lots of Machine Learning models, and not getting better results? 😵‍💫

This is how you solve this 🧠↓

A Machine Learning model is the output of a 3-step workflow where you:

1 → Fetch raw data, for example from an external database.

2 → Process the data into a tabular format, so you have N features and 1 target.

3 → Train ML models (e.g. XGBoost) and tune hyper-parameters.
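The 3-step workflow above can be sketched roughly as follows. Everything here is an assumption for illustration: the rows are hard-coded instead of fetched from a database, the column names are made up, and `MeanBaseline` is a deliberately trivial stand-in for a real model such as XGBoost.

```python
from statistics import mean
from typing import Dict, List, Tuple


def fetch_raw_data() -> List[Dict[str, str]]:
    """Step 1: stand-in for a database query (made-up rows)."""
    return [
        {"hour": "8", "is_weekend": "0", "rides": "120"},
        {"hour": "9", "is_weekend": "0", "rides": "150"},
        {"hour": "8", "is_weekend": "1", "rides": "60"},
        {"hour": "9", "is_weekend": "1", "rides": "80"},
    ]


def process(rows: List[Dict[str, str]]) -> Tuple[List[List[float]], List[float]]:
    """Step 2: turn raw rows into N features (here N=2) and 1 target."""
    features = [[float(r["hour"]), float(r["is_weekend"])] for r in rows]
    target = [float(r["rides"]) for r in rows]
    return features, target


class MeanBaseline:
    """Step 3: trivial model that always predicts the training mean."""

    def fit(self, X: List[List[float]], y: List[float]) -> "MeanBaseline":
        self.mean_ = mean(y)
        return self

    def predict(self, X: List[List[float]]) -> List[float]:
        return [self.mean_ for _ in X]


rows = fetch_raw_data()
X, y = process(rows)
model = MeanBaseline().fit(X, y)
predictions = model.predict(X)
```

Swapping `MeanBaseline` for a stronger model only changes step 3; steps 1 and 2 decide what data that model gets to learn from.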
Jun 6 • 14 tweets • 4 min read
The one skill every professional data scientist must have? 🤔

Don't look for it in online courses; it's not there.

Read the thread below and find out ↓↓↓

In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

And you, as a data scientist, need to

1 → Understand the business problem

2 → Transform the business problem into a data science problem

3 → Solve it
May 17 • 16 tweets • 4 min read
Let's build an LLM agent in Python, step-by-step ↓🧵

Why agents 🤖❓

Because Large Language Models alone are not enough to accurately solve complex tasks that require

-> External information that was not present in the training dataset used to fit the LLM parameters
or
-> Many reasoning steps
May 15 • 12 tweets • 3 min read
How do you test your ML model before deploying it?

3 strategies to help you 🧠 ↓

A better offline metric does NOT mean a better model, because

→ An offline metric (e.g. test ROC AUC) is *just* a proxy for the actual business metric you care about (e.g. money lost in fraudulent transactions)

→ The ML model is just a small bit of the whole ML system in production
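A toy illustration of the first point, with made-up numbers: the transactions, both sets of model outputs, and the helper names are hypothetical, and plain accuracy is used as a simpler stand-in for an offline metric like test ROC AUC.

```python
from typing import List, Tuple

# Made-up transactions: (amount in dollars, is_fraud).
transactions: List[Tuple[float, bool]] = [
    (20, False), (35, False), (15, True),
    (5000, True), (50, False), (10, True),
]

# Hypothetical fraud flags produced by two models for the same transactions.
flags_model_a = [False, False, True, False, False, True]
flags_model_b = [True, False, False, True, False, False]


def accuracy(flags: List[bool]) -> float:
    """Offline metric: share of transactions classified correctly."""
    correct = sum(flag == is_fraud for flag, (_, is_fraud) in zip(flags, transactions))
    return correct / len(transactions)


def money_lost(flags: List[bool]) -> float:
    """Business metric: total amount of fraud that was not flagged."""
    return sum(
        amount
        for flag, (amount, is_fraud) in zip(flags, transactions)
        if is_fraud and not flag
    )


accuracy_a, lost_a = accuracy(flags_model_a), money_lost(flags_model_a)
accuracy_b, lost_b = accuracy(flags_model_b), money_lost(flags_model_b)
```

In this example model A wins on accuracy, yet it misses the single large fraudulent payment, so it loses far more money than model B.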
May 9 • 9 tweets • 3 min read
3 years ago I struggled to build ML products.

Then I discovered this ↓

Unless you are a researcher in academia, and your goal is to publish a paper, you cannot just focus on the ML model you wanna train.

You need to think further down the line and think of the business problem you are trying to solve.

This is the "product-first" mindset.
May 8 • 13 tweets • 3 min read
Let's build an AI Coding assistant with Llama3 ↓🧵🦙

Step 1. Download llama3 with Ollama 🦙

Ollama is an open-source tool for running Large Language Models locally. You can download it for free here:

ollama.com/download
Apr 29 • 13 tweets • 3 min read
Let's build an AI Coding assistant with Llama3 ↓🧵🦙

Step 1. Download llama3 with Ollama 🦙

Ollama is an open-source tool for running Large Language Models locally. You can download it for free here:

ollama.com/download
Apr 8 • 14 tweets • 4 min read
Wanna learn enough git to be a data scientist?

A hands-on tutorial in 10 steps 👩🏽‍💻👨‍💻↓↓↓

#1 Create your project folder and cd into it
Apr 4 • 10 tweets • 3 min read
XGBoost is one of the most effective algorithms for time-series prediction.

But you need to prepare your data carefully.

These are the steps to transform raw data into supervised ML data for time-series prediction ↓

Example

Imagine you work at a ride-sharing app company in NYC as an ML engineer.

You want to help the operations team allocate the fleet of drivers optimally each hour of the day.

The end goal is to maximize revenue.
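One common way to turn a raw time series into supervised ML data is a sliding window of lagged values. The sketch below assumes that approach; the preview does not show the author's exact steps, and `make_supervised`, `n_lags`, and the ride counts are illustrative names and numbers.

```python
from typing import List, Tuple


def make_supervised(
    series: List[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate time series into (features, target) rows.

    Each feature row holds the previous `n_lags` values, and the target
    is the value that immediately follows them.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(series[i - n_lags:i])
        targets.append(series[i])
    return features, targets


# Made-up hourly ride counts for one area.
hourly_rides = [10, 12, 15, 11, 9, 14]
X, y = make_supervised(hourly_rides, n_lags=3)
# X -> [[10, 12, 15], [12, 15, 11], [15, 11, 9]]
# y -> [11, 9, 14]
```

The resulting `X` and `y` have the tabular shape that a model like XGBoost expects: one row per prediction time, one column per lag.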
Mar 14 • 6 tweets • 2 min read
ML Project Idea 💡

Let's find your Twin Celebrity 🔎👸 ↓

Here is a full example, with source code, to learn how to build a complete ML app that finds your Twin Celebrity.

Clone the code, modify it, and deploy it!
github.com/SaladSlayer00/…
Mar 9 • 10 tweets • 3 min read
Training an ML model inside a Jupyter notebook is something every data scientist knows 🏋️

But do you know how to create a real-world ML service that makes a difference for the company you work for? 📈

If the answer is NO, this thread is for you 🤗🧵↓

So, what is the difference between model training and ML service? 🤔

An ML service is a sequence of processing and storage steps that takes in raw data and outputs predictions that are used by the business to make smarter decisions.

Model training is just one of those steps.
Feb 15 • 10 tweets • 4 min read
ML Project Idea 💡

Let's predict taxi demand in NYC in the next 60 minutes 🚕↓

Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- in the next hour
- for each taxi zone (e.g. Zone 113 "Lower Manhattan")

Let's do it in 6 steps ↓