Pau Labarta Bajo

@paulabartabajo_

Aug 2 • 11 tweets • 3 min read

3 years ago I struggled to land my first freelance ML engineering contract.

Then I discovered this ↓

Building one professional real-world ML project is the best way to stand out from the crowd, and land an ML job.

Here is what I did, 𝘀𝘁𝗲𝗽-𝗯𝘆-𝘀𝘁𝗲𝗽 👩‍💻👨🏽‍💻↓

Step 1. Find a real-world problem you are interested in

Working on projects is harder than completing online courses.
But hey, no pain no gain.

It is VERY important you work on a problem you are interested in.
Otherwise, you will quit.

Step 2. Find a data source

Preferably a live API. If not possible, pick a static dataset from Kaggle.

Here is a superb repo with a list of public APIs you can use
github.com/public-apis/pu…
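
A minimal sketch of pulling records from a live JSON API, using only the Python standard library. The endpoint URL and the "ts" and "value" field names are hypothetical placeholders, not something taken from the thread; swap in whatever your chosen API returns.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: replace it with an API you picked from the list above.
API_URL = "https://example.com/api/prices"


def parse_records(payload: str) -> list[dict]:
    """Turn a raw JSON response into a list of flat records."""
    data = json.loads(payload)
    # Assumption: the API returns a JSON array of objects with "ts" and "value".
    return [{"ts": row["ts"], "value": float(row["value"])} for row in data]


def fetch_records(url: str = API_URL, timeout: float = 10.0) -> list[dict]:
    """Download the latest data and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_records(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download means you can test it on a saved response without hitting the network.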

Step 3. Build a simple ML model

Do not try to build THE PERFECT model, and only then move to the next phase.

Because this leads you to a never-ending Jupyter-notebook-development-cycle, and you get lost.

Start with basic features and a basic model.

And move to the next step.
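
As a rough illustration of "basic features and a basic model", here is a sketch with one feature, a straight-line fit, and a dummy baseline to compare against. The numbers are made up, and the model choice is only an assumption about what "simple" could mean for your problem.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, y_bar - slope * x_bar


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy data (made up for illustration): one basic feature, one target.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_test, y_test = [5.0, 6.0], [10.1, 11.9]

slope, intercept = fit_line(x_train, y_train)
model_preds = [slope * x + intercept for x in x_test]
# Dummy baseline: always predict the mean of the training target.
baseline_preds = [mean(y_train)] * len(y_test)

model_mae = mean_absolute_error(y_test, model_preds)
baseline_mae = mean_absolute_error(y_test, baseline_preds)
```

If the simple model beats the dummy baseline on held-out data, that is enough to move on.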

Step 4. Build a Minimum Viable Product

A Jupyter notebook is not enough to prove your solution might work.
You need to go one step further and build a minimal working system.

I recommend you follow the 3-pipeline design ↓
datamachines.xyz/2023/03/27/rea…
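
The thread does not spell out the three pipelines, so the sketch below assumes they are a feature pipeline, a training pipeline, and an inference pipeline. The in-memory "feature store" and "model registry", and the toy model, are hypothetical stand-ins for real services.

```python
from statistics import mean

# In-memory stand-ins (hypothetical) for a feature store and a model registry.
feature_store: list[dict] = []
model_registry: dict = {}


def feature_pipeline(raw_values: list[float]) -> None:
    """Turn raw data into (feature, target) rows and save them."""
    for previous, current in zip(raw_values, raw_values[1:]):
        feature_store.append({"lag_1": previous, "target": current})


def training_pipeline() -> None:
    """Read features, fit a very simple model, and register it."""
    # Toy model: next value = last value + average historical change.
    avg_change = mean(row["target"] - row["lag_1"] for row in feature_store)
    model_registry["latest"] = {"avg_change": avg_change}


def inference_pipeline(last_value: float) -> float:
    """Load the registered model and produce a prediction."""
    model = model_registry["latest"]
    return last_value + model["avg_change"]


feature_pipeline([10.0, 12.0, 14.0, 16.0])
training_pipeline()
prediction = inference_pipeline(last_value=16.0)
```

Each function only talks to the shared storage, so the three pieces can later run as separate jobs on different schedules.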

Step 5. Start iterating on the model

Once the system works, start improving it by
- increasing training data size
- increasing the number of features
- trying a more complex ML model
- optimizing model hyper-parameters
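
The last item can be sketched as a small search loop: try each candidate value, score it on held-out data, and keep the best. The moving-average forecaster and the series below are made-up stand-ins for your own model and data.

```python
from statistics import mean


def moving_average_forecast(history: list[float], window: int) -> float:
    """Predict the next value as the mean of the last `window` values."""
    return mean(history[-window:])


def validation_error(series: list[float], window: int, n_val: int) -> float:
    """Mean absolute error over the last `n_val` points, one step ahead."""
    errors = []
    for i in range(len(series) - n_val, len(series)):
        prediction = moving_average_forecast(series[:i], window)
        errors.append(abs(series[i] - prediction))
    return mean(errors)


# Made-up series with an upward trend, for illustration only.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Try each candidate hyper-parameter and keep the one with the lowest error.
results = {w: validation_error(series, w, n_val=4) for w in (1, 2, 3, 5)}
best_window = min(results, key=results.get)
```

Change one thing at a time and re-run the same evaluation, so you know which change actually helped.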

Step 6. Push your code to a public GitHub repo and write a beautiful README

The README file is the first thing your future employer will see.

Explain the problem you wanted to solve, and the solution you built.

Here is an example
github.com/Paulescu/bytew…

Ready to take your ML career to the next level?

Join the Real-World ML Tutorial + Community and build a complete ML app, from A to Z.

Because THIS IS what companies look for.

Use the discount code "NINJA" to get a 20% discount and LIFETIME access
realworldmachinelearning.carrd.co

https://twitter.com/1408789941040058369/status/1686649366793109504

Do you like tweets like this?
→ Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
→ Like/Retweet the first tweet below to spread the wisdom ↓↓↓

https://twitter.com/1408789941040058369/status/1686649366793109504

Wanna level up in ML/MLOps?

Join my e-mail list and get one article 𝗘𝘃𝗲𝗿𝘆 𝗦𝗮𝘁𝘂𝗿𝗱𝗮𝘆 𝗺𝗼𝗿𝗻𝗶𝗻𝗴 ↓
datamachines.xyz/subscribe/

More from @paulabartabajo_

Pau Labarta Bajo

@paulabartabajo_

Jul 31

How to solve Machine Learning problems in the real world

3 practical tips to make your ML life easier 🧵↓

𝗖𝗼𝗻𝘁𝗲𝘅𝘁

Online courses and Kaggle-style competitions are great resources to learn the fundamentals of ML.

However, the daily job of a machine learning engineer requires an 𝗮𝗱𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗹𝗮𝘆𝗲𝗿 𝗼𝗳 𝘀𝗸𝗶𝗹𝗹𝘀 that you won’t master there.

Here are the 𝘁𝗼𝗽 𝟯 most recurring hidden problems I have faced in my ML life, and my tips for you to deal with them.

Read 18 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 31

I used to waste hours trying to package my Python code...

... until I discovered this tool

Say hello to your new friend, 𝗣𝗼𝗲𝘁𝗿𝘆 👋

Packaging your Python code for easy distribution and reproducibility has never been easier.

Here is how to do it in 4 steps

1️⃣ 𝗗𝗼𝘄𝗻𝗹𝗼𝗮𝗱 𝗮𝗻𝗱 𝗶𝗻𝘀𝘁𝗮𝗹𝗹 𝗣𝗼𝗲𝘁𝗿𝘆

$ curl -sSL https://install.python-poetry.org | python3 -

Read 7 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 27

What is 𝗺𝗼𝗱𝗲𝗹 𝗿𝗲-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 and how to implement it?

Hands-on, in 3 steps🧵↓

𝗧𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺

No matter how good your predictive Machine Learning model is today, it will eventually expire.

Why?

Because a predictive ML model is essentially a mapping between

→ a set of features (aka inputs)
→ a target (aka output) → what you want to predict

And the thing is, the relationship (aka correlation) between the features and the target can change a lot over time.

This is especially true in problems like recommender systems, or fraud detection.
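
A toy sketch of that drift, with made-up numbers: the target used to be 2x the feature, and later becomes 3x. A model fitted on the old data keeps predicting the old relationship until it is re-trained on recent data.

```python
def fit_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope for a model of the form y = slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mae(slope: float, xs: list[float], ys: list[float]) -> float:
    """Mean absolute error of the model y = slope * x."""
    return sum(abs(y - slope * x) for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
old_targets = [2.0 * x for x in xs]  # relationship when the model was trained
new_targets = [3.0 * x for x in xs]  # relationship after it drifted

stale_model = fit_slope(xs, old_targets)
retrained_model = fit_slope(xs, new_targets)

# Both models are scored on the most recent data.
stale_error = mae(stale_model, xs, new_targets)
fresh_error = mae(retrained_model, xs, new_targets)
```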

Read 11 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 26

XGBoost is one of the most effective algorithms for time-series prediction.

But, you need to prepare your data carefully.

Here is a Python library to help you prepare your data ↓

@joaopcnogueira, one of my students from the Real World ML Tutorial, has built 𝘁𝘀𝟮𝗺𝗹, a Python library that lets you transform

- a time series dataset, into
- a training dataset, with features and targets
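
To make the idea concrete, here is a plain-Python sliding-window sketch. It is not the ts2ml API (check the repo for that); the function name and parameters are made up for illustration.

```python
def make_training_set(series: list[float], n_lags: int, horizon: int = 1):
    """Slide a window over the series.

    Each row of features holds `n_lags` consecutive past values, and the
    matching target is the value `horizon` steps after the window ends.
    """
    features, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        features.append(series[end - n_lags:end])
        targets.append(series[end + horizon - 1])
    return features, targets


X, y = make_training_set([10, 20, 30, 40, 50], n_lags=3)
# X holds the windows [10, 20, 30] and [20, 30, 40]; y holds 40 and 50.
```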

Enjoy it

And give it a star ⭐ on GitHub ↓
github.com/joaopcnogueira…

Wanna build your first real-world ML system?

Join the Real-World ML Tutorial + Community and get LIFETIME ACCESS to

→ 3 hours of video lectures 🎬
→ Full source code 👨‍💻
→ Discord private community 👨‍👩‍👦

Use code "NINJA" at checkout for a 20% discount

realworldmachinelearning.carrd.co

Read 4 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 26

3 reasons why your XGBoost model does not work

And 3 ways to solve them
↓↓↓

1️⃣ You are overfitting the training data

This is common in highly non-stationary problems, like cryptocurrency price prediction.

Solution. Use cross-validation and hyper-parameter tuning to adjust the model's bias-variance trade-off and get good out-of-sample metrics.
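
For time-ordered data, one common way to cross-validate is an expanding window, where every fold trains on the past and tests on what comes next. The sketch below is an assumption about how you might split your data, not necessarily the exact scheme meant above; the function name is made up.

```python
def expanding_window_splits(n_samples: int, n_splits: int, test_size: int):
    """Yield (train_indices, test_indices), always testing on later data."""
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for this many splits")
        yield list(range(test_start)), list(range(test_start, test_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
# The first fold trains on indices 0-3 and tests on 4-5.
# The last fold trains on indices 0-7 and tests on 8-9.
```

Fit the model on each train split with a given set of hyper-parameters, score it on the matching test split, and compare the average score across folds.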

2️⃣ You are missing an essential feature in your dataset

You need more/better features to increase the signal-to-noise ratio in your data.

Solution. Pull more raw features and generate better ones through feature engineering.

Read 5 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 25

Advice for ML beginners💡

GitHub Actions gives you *free* compute that makes your life easier.

Here are 3 use cases for ML projects ↓

➡️ Continuous Integration and Deployment (CI/CD)

Machine Learning is software engineering. As such, it is crucial you automate:

→ code updates (aka integration), and
→ code releases to your production environment (aka deployment)

➡️ Batch feature pipelines

This is a program that runs on a cron-like schedule, fetches raw data from a data source (e.g. a data warehouse), computes ML features, and saves them to a storage service (e.g. a feature store).

Feature pipelines are present in every ML system.
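
A sketch of what one scheduled run could look like. The data, the feature, and the CSV file used as "storage" are hypothetical stand-ins; in a real project the first function would query your data source and the last one would write to your feature store.

```python
import csv
from pathlib import Path
from statistics import mean


def fetch_raw_data() -> list[dict]:
    """Stand-in for a query against a data warehouse (hypothetical data)."""
    return [
        {"hour": 0, "rides": 12},
        {"hour": 1, "rides": 8},
        {"hour": 2, "rides": 10},
    ]


def compute_features(rows: list[dict]) -> list[dict]:
    """Add a running average of `rides` up to and including each hour."""
    features = []
    for i, row in enumerate(rows):
        history = [r["rides"] for r in rows[: i + 1]]
        features.append({"hour": row["hour"], "rides_avg": mean(history)})
    return features


def save_features(features: list[dict], path: Path) -> None:
    """Stand-in for a write to a feature store: here, a CSV file."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["hour", "rides_avg"])
        writer.writeheader()
        writer.writerows(features)


def run_pipeline(path: Path) -> None:
    """One scheduled run: fetch, transform, store."""
    save_features(compute_features(fetch_raw_data()), path)
```

A scheduler would then call `run_pipeline(Path("features.csv"))` at each interval.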

Read 6 tweets
