Pau Labarta Bajo
Aug 2 · 11 tweets · 3 min read
3 years ago I struggled to land my first freelance ML engineering contract.

Then I discovered this ↓
Building one professional real-world ML project is the best way to stand out from the crowd and land an ML job.

Here is what I did, 𝘀𝘁𝗲𝗽-𝗯𝘆-𝘀𝘁𝗲𝗽 👩‍💻👨🏽‍💻↓
Step 1. Find a real-world problem you are interested in

Working on projects is harder than completing online courses.
But hey, no pain no gain.

It is VERY important you work on a problem you are interested in.
Otherwise, you will quit.
Step 2. Find a data source

Preferably a live API. If not possible, pick a static dataset from Kaggle.

Here is a superb repo with a list of public APIs you can use:
github.com/public-apis/pu…
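A minimal sketch of pulling records from a live JSON API using only the Python standard library. The endpoint URL and the record fields are hypothetical placeholders; swap in an API from the list above. Parsing is kept separate from fetching so it can be tested without a network connection.

```python
import json
import urllib.request
from typing import Any

# Hypothetical endpoint: replace with a real API from the public-apis list.
API_URL = "https://example.com/api/observations"


def parse_records(payload: bytes) -> list[dict[str, Any]]:
    """Decode a JSON array of objects into a list of plain dicts."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return [dict(item) for item in data]


def fetch_records(url: str = API_URL, timeout: float = 10.0) -> list[dict[str, Any]]:
    """Download the latest records from a live JSON API."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_records(response.read())


if __name__ == "__main__":
    # Offline example payload with made-up fields "ts" and "value".
    sample = b'[{"ts": 1, "value": 10.5}, {"ts": 2, "value": 11.0}]'
    print(parse_records(sample))
```

Running `fetch_records()` on a schedule is what turns a static dataset into a live data source.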
Step 3. Build a simple ML model

Do not try to build THE PERFECT model before moving to the next phase.

That leads to a never-ending Jupyter notebook development cycle, and you get lost.

Start with basic features and a basic model.

And move to the next step.
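For a time-series problem, a "basic model" can be a naive baseline. A minimal standard-library sketch (toy data made up for illustration) that scores two baselines on a holdout with mean absolute error (MAE):

```python
from statistics import mean


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def naive_last_value(series: list[float], n_test: int) -> list[float]:
    """Predict each test point with the value observed just before it."""
    start = len(series) - n_test
    return [series[i - 1] for i in range(start, len(series))]


def train_mean(series: list[float], n_test: int) -> list[float]:
    """Predict every test point with the mean of the training portion."""
    train_avg = mean(series[: len(series) - n_test])
    return [train_avg] * n_test


if __name__ == "__main__":
    series = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 13.5, 14.0]  # toy data
    n_test = 3
    y_test = series[-n_test:]
    for name, preds in [
        ("last value", naive_last_value(series, n_test)),
        ("train mean", train_mean(series, n_test)),
    ]:
        print(f"{name}: MAE = {mean_absolute_error(y_test, preds):.3f}")
```

Any fancier model you build later has to beat these numbers to justify its complexity.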
Step 4. Build a Minimum Viable Product

A Jupyter notebook is not enough to prove your solution might work.
You need to go one step further and build a minimal working system.

I recommend you follow the 3-pipeline design ↓
datamachines.xyz/2023/03/27/rea…
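A toy sketch of the 3-pipeline idea, assuming the three pipelines are a feature pipeline, a training pipeline and an inference pipeline that only communicate through shared storage. The in-memory dicts are hypothetical stand-ins for a real feature store and model registry, and the "model" is deliberately trivial.

```python
from statistics import mean

# Hypothetical in-memory stand-ins for a feature store and a model registry.
FEATURE_STORE: dict[str, list[float]] = {}
MODEL_REGISTRY: dict[str, float] = {}


def feature_pipeline(raw_prices: list[float]) -> None:
    """Turn raw data into features (period returns) and save them."""
    returns = [b / a - 1 for a, b in zip(raw_prices, raw_prices[1:])]
    FEATURE_STORE["returns"] = returns


def training_pipeline() -> None:
    """Read features, fit a trivial model (the mean return), and register it."""
    MODEL_REGISTRY["mean_return"] = mean(FEATURE_STORE["returns"])


def inference_pipeline(last_price: float) -> float:
    """Load the registered model and predict the next price."""
    return last_price * (1 + MODEL_REGISTRY["mean_return"])


if __name__ == "__main__":
    prices = [100.0, 102.0, 101.0, 103.0]  # made-up raw data
    feature_pipeline(prices)
    training_pipeline()
    print(f"next price approx {inference_pipeline(prices[-1]):.2f}")
```

Because each pipeline only reads from and writes to storage, you can schedule, rewrite or scale each one independently.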
Step 5. Start iterating on the model

Once the system works, start improving it by
- increasing training data size
- increasing the number of features
- trying a more complex ML model
- optimizing model hyper-parameters
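As an illustration of the last point, a minimal grid search over one hyper-parameter: the window size of a moving-average forecaster, chosen by MAE on a validation split. Standard library only; the forecaster is a stand-in for whatever model you are actually tuning.

```python
from statistics import mean


def moving_average_forecast(series: list[float], window: int, start: int) -> list[float]:
    """Predict series[i] (for i >= start) as the mean of the `window` values before it."""
    return [mean(series[i - window : i]) for i in range(start, len(series))]


def tune_window(series: list[float], n_val: int, candidates: list[int]) -> tuple[int, float]:
    """Grid-search the window size with the lowest validation MAE."""
    start = len(series) - n_val  # first index of the validation set
    y_val = series[start:]
    best_window, best_mae = -1, float("inf")
    for window in candidates:
        if window < 1 or window > start:  # needs 1..start values of history
            continue
        preds = moving_average_forecast(series, window, start)
        mae = mean(abs(y - p) for y, p in zip(y_val, preds))
        if mae < best_mae:
            best_window, best_mae = window, mae
    if best_window == -1:
        raise ValueError("no valid window size among the candidates")
    return best_window, best_mae


if __name__ == "__main__":
    series = [10.0, 10.4, 10.1, 10.8, 11.0, 10.7, 11.3, 11.6, 11.2, 11.9]  # toy data
    print(tune_window(series, n_val=3, candidates=[1, 2, 3, 5]))
```

Libraries such as scikit-learn package the same idea ready-made (e.g. `GridSearchCV`), but the loop above is all it really is.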
Step 6. Push your code to a public GitHub repo and write a beautiful README

The README file is the first thing your future employer will see.

Explain the problem you wanted to solve, and the solution you built.

Here is an example:
github.com/Paulescu/bytew…
Ready to take your ML career to the next level?

Join the Real-World ML Tutorial + Community and build a complete ML app, from A to Z.

Because THIS IS what companies look for.

Use the discount code "NINJA" to get a 20% discount and LIFETIME access
realworldmachinelearning.carrd.co
Do you like tweets like this?
→ Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
→ Like/Retweet the first tweet below to spread the wisdom ↓↓↓
Wanna level up in ML/MLOps?

Join my e-mail list and get one article 𝗘𝘃𝗲𝗿𝘆 𝗦𝗮𝘁𝘂𝗿𝗱𝗮𝘆 𝗺𝗼𝗿𝗻𝗶𝗻𝗴 ↓
datamachines.xyz/subscribe/

More from @paulabartabajo_

Jul 31
How to solve Machine Learning problems in the real world

3 practical tips to make your ML life easier 🧵↓
𝗖𝗼𝗻𝘁𝗲𝘅𝘁

Online courses and Kaggle-style competitions are great resources to learn the fundamentals of ML.

However, the daily job of a machine learning engineer requires an 𝗮𝗱𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗹𝗮𝘆𝗲𝗿 𝗼𝗳 𝘀𝗸𝗶𝗹𝗹𝘀 that you won’t master there.
Here are the 𝘁𝗼𝗽 𝟯 most recurring hidden problems I have faced in my ML life, and my tips for you to deal with them.
Jul 31
I used to waste hours trying to package my Python code...

... until I discovered this tool
Say hello to your new friend, 𝗣𝗼𝗲𝘁𝗿𝘆 👋

Packaging your Python code for easy distribution and reproducibility has never been easier.

Here is how to do it in 4 steps
1️⃣ 𝗗𝗼𝘄𝗻𝗹𝗼𝗮𝗱 𝗮𝗻𝗱 𝗶𝗻𝘀𝘁𝗮𝗹𝗹 𝗣𝗼𝗲𝘁𝗿𝘆

$ curl -sSL https://install.python-poetry.org | python3 -
Jul 27
What is 𝗺𝗼𝗱𝗲𝗹 𝗿𝗲-𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 and how to implement it?

Hands-on, in 3 steps 🧵↓
𝗧𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺

No matter how good your predictive Machine Learning model is today, it will eventually expire.

Why?

Because a predictive ML model is essentially a mapping between

→ a set of features (aka inputs)
→ a target (aka output), which is what you want to predict
And the thing is, the relationship (aka correlation) between the features and the target can change a lot over time.

This is especially true in problems like recommender systems or fraud detection.
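One simple way to see why re-training helps, sketched with the standard library: a one-feature linear model, synthetic data whose feature/target relationship changes halfway through, and a re-training strategy that re-fits on a recent window only. The sliding-window strategy and the function names are illustrative assumptions, not necessarily the 3 steps of the original thread.

```python
def fit_slope(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def retrain_on_recent(xs: list[float], ys: list[float], window: int) -> float:
    """Re-fit the model using only the most recent `window` observations."""
    return fit_slope(xs[-window:], ys[-window:])


if __name__ == "__main__":
    # Synthetic data (assumption): the target goes from 2*x to 3*x halfway through.
    xs = [float(i) for i in range(1, 21)]
    ys = [2 * x if i < 10 else 3 * x for i, x in enumerate(xs)]
    static_w = fit_slope(xs[:10], ys[:10])         # trained once, never updated
    fresh_w = retrain_on_recent(xs, ys, window=5)  # re-trained on recent data
    x_new, y_new = 21.0, 63.0
    print("static model error:", abs(y_new - static_w * x_new))
    print("re-trained model error:", abs(y_new - fresh_w * x_new))
```

In a real system the same idea runs as a scheduled training job on fresh data from the feature store.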
Jul 26
XGBoost is one of the most effective algorithms for time-series prediction.

But you need to prepare your data carefully.

Here is a Python library to help you prepare your data ↓
@joaopcnogueira, one of my students from the Real-World ML Tutorial, has built 𝘁𝘀𝟮𝗺𝗹, a Python library that lets you transform

- a time series dataset, into
- a training dataset, with features and targets
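The thread does not show the ts2ml API, so this is not its interface. It is a hypothetical, standard-library sketch of the underlying idea: slide a window over the series so the previous `n_lags` values become the features and the value `horizon` steps later becomes the target. The name `make_supervised` and its parameters are my own.

```python
def make_supervised(
    series: list[float], n_lags: int, horizon: int = 1
) -> tuple[list[list[float]], list[float]]:
    """Turn a time series into (features, targets) rows for a tabular model."""
    features, targets = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        features.append(series[end - n_lags : end])  # the last n_lags values
        targets.append(series[end + horizon - 1])     # the value to predict
    return features, targets


if __name__ == "__main__":
    X, y = make_supervised([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
    for row, target in zip(X, y):
        print(row, "->", target)
```

The resulting rows are the kind of tabular input a model like XGBoost expects.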

Enjoy it

And give it a star ⭐ on GitHub ↓
github.com/joaopcnogueira…
Wanna build your first real-world ML system?

Join the Real-World ML Tutorial + Community and get LIFETIME ACCESS to

→ 3 hours of video lectures 🎬
→ Full source code 👨‍💻
→ Discord private community 👨‍👩‍👦

Use code "NINJA" at checkout for a 20% discount

realworldmachinelearning.carrd.co
Jul 26
3 reasons why your XGBoost model does not work

And 3 ways to solve them
↓↓↓
1️⃣ You are overfitting the training data

This is common in highly non-stationary problems, like cryptocurrency price prediction.

Solution: use cross-validation and hyper-parameter tuning to adjust the model's bias-variance trade-off and get good out-of-sample metrics.
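In non-stationary problems a shuffled k-fold split leaks future information into training. A minimal sketch of walk-forward (expanding-window) splits instead, standard library only; scikit-learn's `TimeSeriesSplit` implements a similar scheme. The function name and parameters are illustrative.

```python
from collections.abc import Iterator


def walk_forward_splits(
    n_samples: int, n_splits: int, test_size: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs where training always ends right
    before the test fold, so the model never sees the future."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(test_start))  # expanding window of the past
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(n_samples=10, n_splits=3, test_size=2):
        print(f"train on 0..{train_idx[-1]}, evaluate on {test_idx}")
```

For each candidate hyper-parameter set, train on every `train_idx`, score on the matching `test_idx`, and keep the set with the best average out-of-sample metric.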
2️⃣ You miss an essential feature in your dataset

You need more/better features to increase the signal-to-noise ratio in your data.

Solution: pull more raw features and generate better ones through feature engineering.
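A small illustration of feature engineering on raw prices, standard library only. The four features (last return, rolling mean, distance to the rolling mean, return volatility) are common illustrative choices, not a recommended set, and every feature at step t only uses prices up to t to avoid look-ahead.

```python
from statistics import mean, pstdev


def engineer_features(prices: list[float], window: int = 3) -> list[dict[str, float]]:
    """Build one feature row per time step t that has `window` returns of history."""
    rows = []
    for t in range(window, len(prices)):
        recent = prices[t - window + 1 : t + 1]  # last `window` prices, ending at t
        returns = [b / a - 1 for a, b in zip(prices[t - window : t], recent)]
        rolling_mean = mean(recent)
        rows.append(
            {
                "last_return": returns[-1],                        # momentum
                "rolling_mean": rolling_mean,                      # trend level
                "distance_to_mean": prices[t] / rolling_mean - 1,  # mean reversion
                "volatility": pstdev(returns),                     # recent risk
            }
        )
    return rows


if __name__ == "__main__":
    prices = [100.0, 101.5, 100.8, 102.2, 103.0, 102.1]  # made-up raw data
    for row in engineer_features(prices):
        print({k: round(v, 4) for k, v in row.items()})
```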
Jul 25
Advice for ML beginners💡

GitHub Actions give you *free* compute that makes your life easier.

Here are 3 use cases for ML projects ↓
➡️ Continuous Integration and Deployment (CI/CD)

Machine Learning is software engineering. As such, it is crucial you automate:

→ code updates (aka integration), and
→ code releases to your production environment (aka deployment)
➡️ Batch feature pipelines

This is a program that runs on a cron-like schedule, fetches raw data from a data source (e.g. a data warehouse), computes ML features, and saves them to a storage service (e.g. a feature store).

Feature pipelines are present in every ML system.