Pau Labarta Bajo
Citizen of the World who teaches AI that works | @liquidai | Maths Olympian | Father of 1… sorry 2 | Opinions are my own
Apr 21 6 tweets 3 min read
Advice for AI engineers 💡

Fine-tuning is easy.
Preparing the data for fine-tuning is not.

Here's an example to help you ⬇️

𝗪𝗵𝗮𝘁'𝘀 𝘁𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺?
Fine-tuning is a solved problem.
However, generating the data you need for the fine-tune is not.

For example, if you want to fine-tune an LFM model for a vision task, like LFM2.5-VL-450M to detect wildfire risk factors, you can use a library like leap-finetune. Easy peasy.

However, you first need a good dataset for that. Oops!

This is what I want to talk about here.
github.com/Liquid4All/lea…

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗮 𝗴𝗼𝗼𝗱 𝗱𝗮𝘁𝗮𝘀𝗲𝘁?
A good dataset fulfils 3 criteria:

- It is 𝗮𝗰𝗰𝘂𝗿𝗮𝘁𝗲, meaning the output that corresponds to each input is correct.
- It is 𝗱𝗶𝘃𝗲𝗿𝘀𝗲. Your fine-tuned model needs to see the whole input distribution at training time. Otherwise, it will fail when you deploy to production.
- It has 𝗻𝗼 𝘁𝗿𝗮𝗶𝗻/𝘁𝗲𝘀𝘁 𝗰𝗼𝗻𝘁𝗮𝗺𝗶𝗻𝗮𝘁𝗶𝗼𝗻. There cannot be 2 very similar examples where one belongs to the training split and the other belongs to the test split. Otherwise, your model will just memorize the example and inflate your evaluation metrics. Again, when you deploy to production your model performance will drop drop drop.
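
One common way to guard against train/test contamination is to split by group instead of by example, so that all near-identical examples (same camera, same scene, same original photo) land in the same split. Here is a minimal sketch in Python. The `group_id` field, the file names, the labels and the 20% test ratio are all made up for illustration; this is not something leap-finetune prescribes.

```python
import hashlib


def assign_split(group_id: str, test_ratio: float = 0.2) -> str:
    """Deterministically map a whole group to 'train' or 'test'."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    # Turn the first 8 hex characters into a number in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "test" if bucket < test_ratio else "train"


def split_dataset(examples, test_ratio: float = 0.2):
    """Send every example to the split chosen for its group."""
    train, test = [], []
    for example in examples:
        split = assign_split(example["group_id"], test_ratio)
        (test if split == "test" else train).append(example)
    return train, test


# Hypothetical examples: two frames per camera look almost the same,
# so they share a group_id and must never be separated.
examples = [
    {"group_id": f"camera_{i:02d}", "image": f"cam{i:02d}_frame{j}.jpg", "label": "dry_vegetation"}
    for i in range(10)
    for j in range(2)
]

train, test = split_dataset(examples)
train_groups = {example["group_id"] for example in train}
test_groups = {example["group_id"] for example in test}
```

Because the split depends only on the group key, no group can appear on both sides, and re-running the script always produces the same split. How you define a group (by source, by timestamp window, by embedding similarity) is the part that needs real thought for your own data.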

Now the question is...
Apr 6 11 tweets 3 min read
Everyone is telling you how to fine-tune a model.
Nobody is telling you how to generate the data you need for that.

Let me help you 🧵↓

In the last couple of weeks, I have published two articles that teach you step-by-step how to build a home assistant that lets you control your house in plain English:

"Hey, close the front door"

And voilà. The front door is closed.
Feb 17 7 tweets 2 min read
Advice for AI engineers 💡

Voice assistants do not need cloud models.
A small local model does the job.

Faster. Cheaper.

Here's an example ↓
docs.liquid.ai/examples/lapto…

𝗪𝗵𝗮𝘁'𝘀 𝘁𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺?

Most devs default to API calls, like GPT-5, Gemini 3 and friends.

And this works great when you want to impress your boss, or your next round of investors.

However, it does not move the needle in most businesses.

Why?
Dec 1, 2025 7 tweets 2 min read
I just built a voice assistant that runs speech-to-speech AI with a 1.5B parameter model.

Here's how ↓🧵

Most people think "audio model = transcription tool"

But the latest wave of audio models, like LFM2-Audio-1.5B by @liquidai, is way more than that:

huggingface.co/LiquidAI/LFM2-…
Aug 19, 2025 16 tweets 3 min read
I used to think the Transformer was the best architecture to build LLMs.
I was wrong. Let me explain ⬇️

Don't get me wrong. The Transformer is 𝘁𝗵𝗲 𝗺𝗼𝘀𝘁 revolutionary architectural design in the deep learning space of the last 10 years.
May 6, 2025 19 tweets 4 min read
Crash course on Kubernetes for ML Engineers
Hands-on in 9 steps ↓

Kubernetes is one of the hard skills you constantly find in job descriptions for ML engineers.

Yet, it is one of the tools most ML engineers are scared of.

Let me help you be less scared of Kubernetes, by deploying your first Python app.
Feb 22, 2025 20 tweets 4 min read
Crash course on 𝗞𝘂𝗯𝗲𝗿𝗻𝗲𝘁𝗲𝘀 for ML Engineers
Hands-on in 9 steps ↓

Kubernetes is one of the hard skills you constantly find in job descriptions for ML engineers.

Yet, it is one of the tools most ML engineers are scared of.

Let me help you be less scared of Kubernetes, by deploying your first Python app.
Jan 15, 2025 5 tweets 2 min read
Wanna learn to 𝗯𝘂𝗶𝗹𝗱 𝗠𝗟 𝘀𝘆𝘀𝘁𝗲𝗺𝘀?

Here are 𝟯 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝗲𝘅𝗮𝗺𝗽𝗹𝗲𝘀 you can build TODAY 👩🏽‍💻👨‍💻↓

𝗪𝗵𝘆 𝗠𝗟 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 𝗮𝗻𝗱 𝗻𝗼𝘁 𝗷𝘂𝘀𝘁 𝗠𝗟 𝗺𝗼𝗱𝗲𝗹𝘀?
Because ML models are not enough in real-world ML projects.
Until you put them to work by building a

-> Feature pipeline
-> Training pipeline
-> Inference pipeline

they produce 0 business value.
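
To make the three pipelines concrete, here is a toy sketch in Python where each pipeline is a plain function and the output of one feeds the next. Everything in it is an assumption for illustration: the data is made up, the "model" is just an average, and the `feature_store` and `model_registry` variables stand in for real infrastructure.

```python
from statistics import mean


def feature_pipeline(raw_events):
    """Turn raw events into (feature, target) rows."""
    return [{"x": event["value"], "y": event["next_value"]} for event in raw_events]


def training_pipeline(rows):
    """Fit a toy model: the average change from x to y seen in training."""
    avg_delta = mean(row["y"] - row["x"] for row in rows)
    return {"avg_delta": avg_delta}


def inference_pipeline(model, x):
    """Predict the next value for a fresh input using the trained model."""
    return x + model["avg_delta"]


# Made-up raw data.
raw_events = [
    {"value": 10.0, "next_value": 12.0},
    {"value": 20.0, "next_value": 21.0},
    {"value": 30.0, "next_value": 33.0},
]

feature_store = feature_pipeline(raw_events)      # stand-in for a feature store
model_registry = training_pipeline(feature_store)  # stand-in for a model registry
prediction = inference_pipeline(model_registry, 40.0)
print(prediction)
```

The point of the split is that each function can run on its own schedule: features can refresh every hour, training once a week, and inference on every request.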
Jan 5, 2025 5 tweets 2 min read
ML Project Idea 💡

Let's predict air quality in Poland 💨🇵🇱↓

In this repository, you can find the complete source code of an ML app that

→ predicts air quality (as measured by the PM10 metric) 💨
→ in Poland 🇵🇱
→ for the next 7️⃣ days

Click on this link to see the code ↓
github.com/erno98/ID2223/…
Jan 5, 2025 10 tweets 4 min read
ML Project Idea 💡

Let's predict taxi demand in NYC in the next 60 minutes 🚕↓

Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- in the next hour
- for each taxi zone (e.g. Zone 113 "Lower Manhattan")

Let's do it in 6 steps ↓
Dec 16, 2024 4 tweets 2 min read
ML Project Idea 💡

Let's predict air quality ↓

Here is a full example, with source code, to learn how to build a complete ML app that predicts air quality in different European cities.

Clone the code, modify it, and deploy it!
github.com/logicalclocks/…
Dec 15, 2024 18 tweets 4 min read
Are you a data scientist using CSV files to store your data?

What if I told you there is a better way?

Can you imagine a

-> lighter 🦋
-> faster 🏎️
-> cheaper 💸

file format to save your datasets?

Read this thread so you don't need to imagine anymore 👇🏾

Do not get me wrong. I love CSVs.

You can open them with any text editor, inspect them and share them with others.

They have become the standard file format for datasets in the AI/ML community.

However, they have a little problem...
Dec 11, 2024 9 tweets 3 min read
3 years ago I struggled to build ML products.

Then I discovered this ↓

Unless you are a researcher in academia, and your goal is to publish a paper, you cannot just focus on the ML model you wanna train.

You need to think further down the line and think of the business problem you are trying to solve.

This is the "product-first" mindset.
Oct 22, 2024 15 tweets 4 min read
Let's design an ML system to predict crypto prices, step-by-step ↓🧵

The problem

We want to build a real-time API that serves short-term predictions on crypto prices.

For example
To predict the price of Ethereum (ETH) in the next 10 seconds.
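
To see what "the price in the next 10 seconds" means as a training target, here is a minimal sketch in Python. It assumes one price observation per second, and the prices themselves are made up; the `make_training_rows` helper is a hypothetical name, not part of any library.

```python
def make_training_rows(prices, horizon=10):
    """Pair each price with the price observed `horizon` steps later."""
    rows = []
    for t in range(len(prices) - horizon):
        rows.append({"price_now": prices[t], "price_future": prices[t + horizon]})
    return rows


# Made-up ETH prices, one per second.
prices = [3000.0 + i for i in range(15)]

rows = make_training_rows(prices, horizon=10)
print(len(rows), rows[0])
```

The last `horizon` observations have no known future price yet, so they produce no training rows. In a real-time system, those are exactly the points you run inference on.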
Oct 20, 2024 7 tweets 2 min read
One skill every ML engineer has to master ↓

𝗠𝗟 𝗦𝘆𝘀𝘁𝗲𝗺 𝗱𝗲𝘀𝗶𝗴𝗻

Yes. And do you know why?

Because good ML system design hasn't changed at all in the last 5 years.

And it won't.
Oct 13, 2024 15 tweets 3 min read
Every aspiring data scientist I talk to is overwhelmed by the colossal amount of online courses to choose from 🤯

My solution to this problem ↓

Learning is about connecting the dots.

However, it feels like there are too many dots to connect when learning data science.

Too many courses...
Too many blog posts...
Too many technologies...

Solution: You need to change the way you learn.
Sep 22, 2024 8 tweets 2 min read
Time-series are used everywhere

→ At Uber to optimize fleet efficiency
→ At Amazon to forecast inventory levels
→ At every hedge fund to project asset prices.

Still, there is a lack of ML engineers who can build real-world time-series products.

So here is your chance ↓

Let's build a *complete* ML service that forecasts taxi rides in NYC, similar to what Uber does to forecast demand.

The 3 ingredients we need are:

- a dataset
- a Python library to build a good predictive model
- a deployment strategy

For example ↓
Sep 18, 2024 10 tweets 3 min read
3 years ago I struggled to land my first freelance ML engineering contract.

Then I discovered this ↓

Building one professional real-world ML project is the best way to stand out from the crowd, and land an ML job.

Here is what I did, 𝘀𝘁𝗲𝗽-𝗯𝘆-𝘀𝘁𝗲𝗽 👩‍💻👨🏽‍💻↓
Sep 18, 2024 12 tweets 3 min read
Let's build an AI Coding assistant with Llama3 ↓🧵🦙

Step 1. Download llama3 with Ollama 🦙

Ollama is an open-source tool for running Large Language Models locally. You can download it for free from here.

ollama.com/download
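
Once Ollama is installed and the model is pulled (`ollama pull llama3`), Ollama serves a local HTTP API, by default on port 11434. Here is a minimal sketch in Python that talks to it using only the standard library. The helper names `build_request` and `ask_llama3` are made up for this example, and the call only works while the Ollama server is running on your machine.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build the POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_llama3(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        return json.loads(response.read())["response"]


# Usage (requires a running Ollama server with llama3 pulled):
# print(ask_llama3("Write a Python function that reverses a string."))
```

Setting `"stream": False` asks for the whole answer in one JSON object, which keeps the example short. For a coding assistant you would likely stream tokens instead, so the user sees output as it is generated.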
Aug 13, 2024 15 tweets 4 min read
Let's design an ML system to predict crypto prices, step-by-step ↓🧵

The problem

We want to build a real-time API that serves short-term predictions on crypto prices.

For example
To predict the price of Ethereum (ETH) in the next 10 seconds.
Aug 1, 2024 15 tweets 4 min read
Let's build a real-time ML system to predict short-term prices
↓↓↓🧵

The problem

We want to build a real-time API that serves short-term predictions on crypto prices.

For example
To predict the price of Ethereum (ETH) in the next 10 seconds.