Pau Labarta Bajo
Jul 25 • 6 tweets • 2 min read
Advice for ML beginners 💡

GitHub Actions gives you *free* compute (within its free tier) that makes your life easier.

Here are 3 use cases for ML projects ↓
➡️ Continuous Integration and Deployment (CI/CD)

Machine Learning is software engineering. As such, it is crucial you automate:

→ code updates (aka integration), and
→ code releases to your production environment (aka deployment)
➡️ Batch feature pipelines

This is a program that runs on a cron-like schedule, fetches raw data from a data source (e.g. a data warehouse), computes ML features, and saves them to a storage service (e.g. a feature store).

Feature pipelines are present in every ML system.
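
Here is a minimal sketch of those three steps (fetch, compute, save) in plain Python. Everything in it is an assumption made for illustration: `fetch_raw_data` fakes a data warehouse query, the "feature store" is a dict, and the rolling mean is just an example feature. In a real project a scheduler (for example a GitHub Actions cron trigger) would call `run_feature_pipeline` on each run.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def fetch_raw_data(start, end):
    """Hypothetical stand-in for a data warehouse query: one row per hour."""
    rows = []
    ts = start
    while ts < end:
        rows.append({"ts": ts, "value": float(ts.hour)})
        ts += timedelta(hours=1)
    return rows


def compute_features(rows, window=3):
    """Compute a rolling mean over the last `window` rows."""
    features = []
    for i in range(window - 1, len(rows)):
        recent = [r["value"] for r in rows[i - window + 1 : i + 1]]
        features.append({"ts": rows[i]["ts"], "rolling_mean": mean(recent)})
    return features


def save_features(features, store):
    """Upsert by timestamp, so re-running the same window does not duplicate rows."""
    for f in features:
        store[f["ts"]] = f


def run_feature_pipeline(now, store, lookback_hours=6):
    """One scheduled run: fetch the last few complete hours, featurize, save."""
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=lookback_hours)
    rows = fetch_raw_data(start, end)
    save_features(compute_features(rows), store)
    return store


# The dict plays the role of the feature store; the date is arbitrary.
feature_store = {}
run_feature_pipeline(datetime(2023, 7, 25, 12, 30, tzinfo=timezone.utc), feature_store)
```

Writing by key (here the timestamp) is a deliberate choice: scheduled jobs get retried, and an idempotent write keeps retries harmless.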
➡️ Inference pipelines

Batch scoring is one of the most popular ways to generate fresh predictions from an ML model.

A batch inference pipeline fetches recent features and a model artifact, generates predictions, and saves them in a storage layer.
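
A batch inference pipeline can be sketched in a few lines. This is only an illustration under stated assumptions: the "model artifact" is a hypothetical JSON blob of linear weights (a real one would be a serialized trained model), the features are hard-coded, and the storage layer is a dict.

```python
import json


def load_model(artifact_json):
    """Deserialize a hypothetical model artifact (linear weights stored as JSON)."""
    params = json.loads(artifact_json)

    def predict(features):
        weighted = sum(params["weights"][name] * value for name, value in features.items())
        return params["bias"] + weighted

    return predict


def run_inference_pipeline(feature_rows, artifact_json, prediction_store):
    """Load the model, score every feature row, save predictions keyed by row id."""
    model = load_model(artifact_json)
    for row in feature_rows:
        prediction_store[row["id"]] = model(row["features"])
    return prediction_store


# Made-up artifact and features, for illustration only.
artifact = json.dumps({"bias": 1.0, "weights": {"x1": 2.0, "x2": -1.0}})
recent_features = [
    {"id": "a", "features": {"x1": 1.0, "x2": 3.0}},
    {"id": "b", "features": {"x1": 0.5, "x2": 0.0}},
]
predictions = run_inference_pipeline(recent_features, artifact, {})
```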
Wanna become a real-world ML engineer?

Join the Serverless ML Community ↓↓↓
serverless-ml.carrd.co
Wanna get more tweets like this?
→ Follow me @paulabartabajo_

Wanna help me spread the word?
→ Like/Retweet the first tweet below ↓↓↓

More from @paulabartabajo_

Jul 26
XGBoost is one of the most effective algorithms for time-series prediction.

But, you need to prepare your data carefully.

Here is a Python library to help you prepare your data ↓
@joaopcnogueira, one of my students from the Real World ML Tutorial, has built ts2ml, a Python library that lets you transform

- a time series dataset, into
- a training dataset, with features and targets
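
To make the idea concrete, here is a sliding-window sketch in plain Python. It is not the ts2ml API (the function name and parameters below are hypothetical); it only shows the kind of transformation described above: each training row takes `n_features` consecutive values as features and a later value as the target.

```python
def to_training_rows(series, n_features, horizon=1):
    """Slide a window over the series and return (features, target) pairs.

    features: `n_features` consecutive values
    target:   the value `horizon` steps after the end of the window
    """
    rows = []
    last_start = len(series) - n_features - horizon
    for start in range(last_start + 1):
        features = series[start : start + n_features]
        target = series[start + n_features + horizon - 1]
        rows.append((features, target))
    return rows


series = [10, 11, 12, 13, 14, 15]
rows = to_training_rows(series, n_features=3)
# rows == [([10, 11, 12], 13), ([11, 12, 13], 14), ([12, 13, 14], 15)]
```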

Enjoy it

And give it a star ⭐ on GitHub ↓
github.com/joaopcnogueira…
Wanna build your first real-world ML system?

Join the Real-World ML Tutorial + Community and get LIFETIME ACCESS to

→ 3 hours of video lectures 🎬
→ Full source code 👨‍💻
→ Discord private community 👨‍👩‍👦

Use code "NINJA" at checkout for a 20% discount

realworldmachinelearning.carrd.co
Read 4 tweets
Jul 26
3 reasons why your XGBoost model does not work

And 3 ways to solve them
↓↓↓
1️⃣ You are overfitting the training data

This is common in highly non-stationary problems, like cryptocurrency price prediction.

Solution. Use cross-validation and hyper-parameter tuning to adjust the model's bias-variance trade-off and get good out-of-sample metrics.
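
Here is a rough sketch of cross-validation plus hyper-parameter tuning on time-ordered data. Two assumptions that are mine, not the thread's: the folds are expanding-window splits (each validation fold comes strictly after its training data, a common choice for time series), and the "model" is a toy moving-average forecaster whose `window` is the hyper-parameter. With XGBoost you would fit the model inside the loop instead.

```python
from statistics import mean


def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx); every test fold lies after its training data."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        yield list(range(test_start)), list(range(test_start, test_start + test_size))


def cv_score(series, window, splits):
    """Mean absolute out-of-sample error of a moving-average forecast."""
    fold_errors = []
    for train_idx, test_idx in splits:
        history = [series[i] for i in train_idx]
        forecast = mean(history[-window:])  # toy model: average of the last `window` values
        fold_errors.append(mean([abs(series[i] - forecast) for i in test_idx]))
    return mean(fold_errors)


series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
splits = list(expanding_window_splits(n_samples=len(series), n_splits=3, test_size=2))

# Hyper-parameter tuning: keep the candidate with the lowest out-of-sample error.
candidates = [1, 3]
scores = {w: cv_score(series, w, splits) for w in candidates}
best_window = min(scores, key=scores.get)
```

Shuffled k-fold would leak future values into training here, which is why the splits respect time order.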
2️⃣ You are missing an essential feature in your dataset

You need more/better features to increase the signal-to-noise ratio in your data.

Solution. Pull more raw features and generate better ones through feature engineering.
Read 5 tweets
Jul 19
Wanna train more ML models for less money? 💸

3 tips to optimize your ML budget 🧠↓
To build a Machine Learning product you need to spend money on 3 types of services:

→ Computing, like CPUs and GPUs so you can train and deploy your models.
→ Orchestration, to kick off the 3 pipelines of your system
→ Storage, to save features, models, and experiment runs
And the thing is, not all these services cost you the same.

→ Orchestration and storage are not expensive 💸
→ Computing, on the other hand, can get very expensive 💸💸💸💸💸
Read 11 tweets
Jul 18
The most effective thing you can do to land an ML job is to

- pick a problem you care about
- build an ML solution, and
- release it to the public.

Here is an example to inspire you 🤗↓
The most effective way to learn and showcase your ML skills is to build a *complete ML project* and publish

→ the source code on GitHub, and
→ a public working app

Here is an example ↓
Predicting NBA game results with Machine Learning 🏀

@curiovana has built a complete ML app that

- fetches the list of upcoming NBA matches
- generates useful predictive features
- predicts these games' outcomes.

How did he do it?
Read 8 tweets
Jul 17
Wanna learn enough git to be a data scientist?

A hands-on tutorial in 10 steps 👩🏽‍💻👨‍💻↓↓↓
#1 Create your project folder and cd into it
#2 Create a README file.

This is the first thing anyone visiting your repository will see.
You better have one. And you better make it pretty.
Read 14 tweets
Jul 10
Looking for effective ways to learn MLOps?

Forget theory and get your hands on a real-world problem 🧠

Here is a project you can build (for free) using Python 👩🏽‍💻👨‍💻↓↓↓
Let's build an ML service to predict the price of Ethereum (ETH) over the next hour, using Python 🐍 and serverless tools.

You will learn a lot, AND you might even make some money 💰

These are the steps to build this system ↓
Step 1: Feature generation script 🐍

1 → fetches raw data on actual ETH/USD trades from the Kraken API

2 → engineers new features from the raw data (aka model inputs), and targets (aka model outputs)

3 → stores these features in the *Feature Store*

https://t.co/5hwgydEGFj docs.kraken.com/rest/
Read 11 tweets
