Pau Labarta Bajo
Oct 23 • 10 tweets • 3 min read
Love using Jupyter notebooks, but find that after a while they look like a total mess? 😵‍💫

What if I told you there is a quick, simple, and efficient way to make them tidy and shiny?

These 3 tips will help you keep your notebooks clean and boost your productivity 🚀↓

Jupyter notebooks are one of the most popular environments for developing Machine Learning models.

They are the fastest way to
→ add code
→ fix code
→ re-run code

in your Machine Learning project.

However, they quickly turn into a mess...

... unless you follow these 3 tips.

Tip #1: Encapsulate common code as functions.

If you do not encapsulate your code, you are doomed to duplicate it.

And code duplication is both a productivity killer and an endless source of bugs.

The solution:
→ Define functionality ONCE.
→ Call it as many times as you need.
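
For instance, suppose several cells clean the column names of different datasets with the same copy-pasted line. Here is a minimal sketch of the fix. The helper `normalize_column_names` is hypothetical, and it works on plain lists of strings so it runs without pandas:

```python
def normalize_column_names(columns: list[str]) -> list[str]:
    """Strip whitespace, lowercase, and replace spaces with underscores.

    Hypothetical helper: defined ONCE, then reused wherever it is needed.
    """
    return [name.strip().lower().replace(" ", "_") for name in columns]


# Call it as many times as you need, e.g. on the train and test columns
train_columns = normalize_column_names(["Customer ID", " Loan Amount "])
test_columns = normalize_column_names(["Customer ID", "Loan Amount"])

print(train_columns)  # ['customer_id', 'loan_amount']
print(train_columns == test_columns)  # True
```

If the cleaning rule changes, you edit one function instead of hunting down every copy. With pandas, the same helper could be applied as `df.columns = normalize_column_names(list(df.columns))`.
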
Tip #2: Extract common functions into a separate src/ folder.

Often you end up with the same function defined in several notebooks, which is, again, code duplication.

To solve this, create a source code folder (aka src/) at the same level as your notebooks...
...and move your functions into .py files inside it.

You can group these functions into separate .py files, depending on their main functionality:

→ Plotting
→ Data transformation
→ Model training
→ Utils

Your project structure will look like this ↓
[Image: project structure]
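
The original image is not reproduced here. The sketch below shows what one of the grouped modules could look like. It assumes the notebooks sit in the same folder as src/, and the file name, layout and function names are hypothetical:

```python
# src/data_transformation.py  (hypothetical module)
#
# Assumed project layout:
#   my_project/
#   ├── exploration.ipynb       <- notebooks at the same level as src/
#   ├── training.ipynb
#   └── src/
#       ├── __init__.py         <- marks src/ as a regular Python package
#       ├── plotting.py
#       ├── data_transformation.py
#       ├── model_training.py
#       └── utils.py
"""Data transformation helpers shared by all notebooks."""

from typing import Optional


def fill_missing(values: list[Optional[float]], default: float = 0.0) -> list[float]:
    """Replace missing (None) entries with a default value."""
    return [default if v is None else v for v in values]


def split_by_cutoff(
    rows: list[dict], key: str, cutoff: str
) -> tuple[list[dict], list[dict]]:
    """Split rows into (before cutoff, on/after cutoff) using a sortable key.

    ISO-formatted date strings such as "2022-10-23" sort correctly as text.
    """
    before = [row for row in rows if row[key] < cutoff]
    after = [row for row in rows if row[key] >= cutoff]
    return before, after
```

A notebook next to src/ could then import these helpers with `from src.data_transformation import fill_missing`.
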
Tip #3: Add `autoreload` magic to your Jupyter notebook.

To use your functions inside Jupyter, you need to import them.

For example:
[Image: importing functions from src/ into a notebook]

By default, Python (and therefore Jupyter) caches every module you import and only loads it once, unless you restart the kernel.

So when you update the .py files in src/, the changes are not picked up by Jupyter.
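
To see this caching in action outside a notebook, here is a self-contained sketch. It creates a throwaway project with a hypothetical `src/utils.py` and imports it the same way a notebook would. It then edits the file and compares a second `import` with an explicit `importlib.reload`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    # Throwaway project: <project_dir>/src/__init__.py and src/utils.py
    src_dir = Path(project_dir) / "src"
    src_dir.mkdir()
    (src_dir / "__init__.py").write_text("")
    utils_file = src_dir / "utils.py"
    utils_file.write_text('def version():\n    return "v1"\n')

    # A kernel started in the project folder already has it on sys.path;
    # here we add it by hand.
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()

    from src import utils

    first = utils.version()

    # Edit the file, as you would in your editor. The new source has a
    # different size, so Python will not reuse the old cached bytecode.
    utils_file.write_text('def version():\n    return "version 2"\n')

    # A second import is a no-op: Python hands back the module object
    # already stored in sys.modules, still holding the old code.
    from src import utils

    cached = utils.version()

    # An explicit reload re-executes src/utils.py and picks up the edit.
    utils = importlib.reload(utils)
    reloaded = utils.version()

    sys.path.remove(project_dir)

print(first, cached, reloaded)  # v1 v1 version 2
```

IPython's autoreload extension essentially runs this reload step for you before each cell. There is one difference worth knowing. `importlib.reload` only refreshes the module object, so a function pulled in earlier with `from src.utils import version` still points at the old code. Autoreload, by contrast, also tries to upgrade such references in place.
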

To solve this, just add these 2 lines at the beginning of your notebook ↓
[Image: the 2 autoreload lines]
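
The image is not reproduced here. The usual pair of lines from IPython's autoreload extension is `%load_ext autoreload` followed by `%autoreload 2`, where mode 2 reloads all modules (except excluded ones) before executing each cell. You might prefer to switch it on from plain Python code, for example in a shared setup helper. The following sketch does that with IPython's `run_line_magic`, and the function name `enable_autoreload` is hypothetical:

```python
def enable_autoreload() -> bool:
    """Programmatic equivalent of the two magic lines
        %load_ext autoreload
        %autoreload 2
    Returns True if autoreload was enabled, False outside IPython/Jupyter.
    """
    try:
        from IPython import get_ipython
    except ImportError:  # IPython is not installed
        return False
    shell = get_ipython()  # None when not running inside IPython/Jupyter
    if shell is None:
        return False
    shell.run_line_magic("load_ext", "autoreload")
    # Mode 2: reload all modules (except excluded ones) before running each cell
    shell.run_line_magic("autoreload", "2")
    return True


enabled = enable_autoreload()
```
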
To sum up:

→ Reduce code duplication by encapsulating common functionality in functions and extracting them into separate .py files under src/.

→ Add the autoreload magic to your notebook to keep it in sync with your src/ code.

And voilà!

Wanna get more tweets like this?
→ Follow me @paulabartabajo_

Wanna help me spread the word?
→ Like/Retweet the first tweet below ↓↓↓

Join 2k members in the 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗠𝗟 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆 🤗

A Discord community of ML builders 👩‍💻👨🏽‍💻 focused on building 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝗠𝗟 𝗮𝗽𝗽𝘀
↓↓↓
serverless-ml.carrd.co

