Pau Labarta Bajo

@paulabartabajo_

Oct 23 • 10 tweets • 3 min read

Love using Jupyter notebooks, but after a while, they look like a total mess? 😵‍💫

What if I told you there is a quick, simple, and efficient way to make them tidy and shiny?

These 3 tips will help you keep your notebooks clean and boost your productivity 🚀↓

Jupyter notebooks are the most popular environment to develop Machine Learning models.

They are the fastest way to
→ add code
→ fix code
→ re-run code

for your Machine Learning project.

However, they quickly turn into a mess...

... unless you follow these 3 tips.

Tip #1. Encapsulate common code as functions.

If you do not encapsulate your code, you are doomed to duplicate it.

And code duplication is both a productivity killer and an endless source of bugs.

The solution:
→ Define functionality ONCE.
→ Call it as many times as you need.
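
A minimal sketch of this tip, using a hypothetical helper `missing_ratio` and made-up columns (neither comes from the thread). Instead of pasting the same "count the missing values" logic into several cells, the logic is written once and called wherever it is needed:

```python
def missing_ratio(values):
    """Return the fraction of entries that are None (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


# Hypothetical columns, standing in for data loaded in different cells
ages = [34, None, 29, None]
incomes = [52_000, 48_000, None, 61_000]

# Defined ONCE above, called as many times as needed
ages_missing = missing_ratio(ages)
incomes_missing = missing_ratio(incomes)

print(ages_missing)     # 0.5
print(incomes_missing)  # 0.25
```

If the definition of "missing" changes later, there is a single place to fix it.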

Tip #2. Extract common functions into a separate src/ folder.

Often you have the same function defined in several notebooks, which is, again, code duplication.

To solve this, create a source code folder (aka src/) at the same level as your notebooks...

...and extract your functions as code in .py files.

You can group these functions into separate .py files, depending on their main functionality:

→ Plotting
→ Data transformation
→ Model training
→ Utils

Your project structure will look like this ↓
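
As a rough sketch of such a layout, assuming the notebooks sit in the project root right next to src/. All file names here (analysis.ipynb, training.ipynb, plot.py, data.py, train.py, utils.py) are hypothetical, and the empty `__init__.py` is an optional assumption that marks src/ as a regular package. The script builds the skeleton in a temporary folder and prints it:

```python
import tempfile
from pathlib import Path

# Throwaway project root so the sketch does not touch real files
root = Path(tempfile.mkdtemp()) / "my-ml-project"

# Hypothetical file names: one .py file per group of functions
files = [
    "analysis.ipynb",
    "training.ipynb",
    "src/__init__.py",
    "src/plot.py",    # plotting
    "src/data.py",    # data transformation
    "src/train.py",   # model training
    "src/utils.py",   # utils
]
for rel in files:
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


def tree(directory, indent=""):
    """List a directory recursively, folders first, as indented lines."""
    lines = []
    entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name))
    for entry in entries:
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"{indent}{entry.name}{suffix}")
        if entry.is_dir():
            lines.extend(tree(entry, indent + "    "))
    return lines


layout = tree(root)
print("\n".join(layout))
```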

Tip #3. Add `autoreload` magic to your Jupyter notebook.

To use your functions inside Jupyter, you need to import them.

For example:
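
A minimal sketch of such an import, assuming a hypothetical helper `celsius_to_fahrenheit` that lives in src/utils.py. The setup lines only create a throwaway project so the snippet runs anywhere; in a real notebook next to src/ you would need just the final import:

```python
import sys
import tempfile
from pathlib import Path

# --- Setup: stand-in for an existing project with a src/ folder ---
project_root = Path(tempfile.mkdtemp())
(project_root / "src").mkdir()
(project_root / "src" / "__init__.py").write_text("")
(project_root / "src" / "utils.py").write_text(
    "def celsius_to_fahrenheit(c):\n"
    "    return c * 9 / 5 + 32\n"
)

# A notebook's own folder is normally importable already;
# in this standalone script the project root is added by hand.
sys.path.insert(0, str(project_root))
# Drop any previously cached 'src' so the demo is repeatable
for name in ("src", "src.utils"):
    sys.modules.pop(name, None)

# --- What you would actually write in the notebook ---
from src.utils import celsius_to_fahrenheit

boiling_f = celsius_to_fahrenheit(100)
print(boiling_f)  # 212.0
```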

By default, Jupyter caches all imports: each module is loaded only once and stays in memory until you restart the kernel.

So when you update the .py files in src/, the changes are not picked up by Jupyter.

To solve this, you just add these 2 lines at the beginning of your notebook.
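
Assuming the 2 lines are the standard IPython autoreload magics, they appear as comments at the top of the sketch below, since magics are not plain Python. The rest reproduces the caching problem outside Jupyter with a hypothetical module src/greetings.py and fixes it by hand with `importlib.reload`, which is roughly what autoreload does for you before each cell runs:

```python
# In a notebook, the first cell would hold the two magics:
#   %load_ext autoreload
#   %autoreload 2
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway project with a hypothetical module src/greetings.py
project_root = Path(tempfile.mkdtemp())
src_dir = project_root / "src"
src_dir.mkdir()
(src_dir / "__init__.py").write_text("")
module_file = src_dir / "greetings.py"
module_file.write_text("def greet():\n    return 'hello'\n")

sys.path.insert(0, str(project_root))
# Drop any previously cached copies so the demo is repeatable
for name in ("src", "src.greetings"):
    sys.modules.pop(name, None)

import src.greetings as greetings
before = greetings.greet()

# Edit the .py file on disk, as you would while developing
module_file.write_text("def greet():\n    return 'hello, updated'\n")

# A second import is served from the in-memory cache: the edit is ignored
import src.greetings as greetings
stale = greetings.greet()

# Reloading re-executes the file, so the edit is finally picked up
importlib.reload(greetings)
fresh = greetings.greet()

print(before)  # hello
print(stale)   # hello
print(fresh)   # hello, updated
```

With the magics in place, you skip the manual reload: you edit the .py file, re-run the cell, and get the new behavior.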

To sum up:

→ Reduce code duplication by encapsulating common functionality into functions and extracting them into .py files under src/.

→ Add the autoreload magic to your notebook to keep it in sync with your src/ code.

And Voila!

Wanna get more tweets like this?
→ Follow me @paulabartabajo_

Wanna help me spread the word?
→ Like/Retweet the first tweet below ↓↓↓

https://twitter.com/1408789941040058369/status/1716515138927259926

Join 2k members to the 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗠𝗟 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆 🤗

A Discord community of ML builders 👩‍💻👨🏽‍💻 focused on building 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝗠𝗟 𝗮𝗽𝗽𝘀
↓↓↓
serverless-ml.carrd.co
