Pau Labarta Bajo
May 5 • 15 tweets • 4 min read
Every aspiring data scientist I talk to is overwhelmed by the colossal number of online courses to choose from 🤯

My solution to this problem ↓
Learning is about connecting the dots.

However, it feels like there are too many dots to connect when learning data science.

Too many courses...
Too many blog posts...
Too many technologies...

Solution: You need to change the way you learn.
As a professional data scientist, you are expected to be a problem-solver for the company or institution you work for.

You need to be good at building data science products that solve business problems.

And for that, you don't need to be an expert in Python, for example.
Instead, you need to know *enough Python* to build your solution.

And to know how much Python is *enough*, you must start your learning from the end goal.

Do not learn, and then start.
Start, and then learn.
Pick a project you are interested in.

For example, if you are into computer vision, you can set yourself this goal:

"I want to build a REST API that does face recognition".
Starting from the end goal puts your mind in the "problem-solving" mode.

So you start asking yourself the right questions.

And you start googling.
#question 1: "Is there a public dataset with human faces I can use to build my model?"

And you happen to find a blog post with a list of relevant datasets of human faces. You investigate a bit further and find the one that seems the best for your use case.

Boom.
#question 2: "What ML model is good for face recognition?"

And you find 5 tutorials on YouTube that cover the model-building phase.

Two minutes later, you pick the one that uses PyTorch, because you are more interested in PyTorch than TensorFlow.

Boom.
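If you are curious what "face recognition" boils down to once the model is trained: many face-recognition models (FaceNet-style networks, for example) map each face image to an embedding vector, and recognizing someone means finding the closest known embedding. Here is a minimal, standard-library-only sketch of that matching step. It assumes the embeddings already come out of the PyTorch model from the tutorial; `identify`, the toy gallery, and the 0.8 similarity threshold are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(
    query: Sequence[float],
    gallery: Dict[str, List[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the gallery name closest to `query`, or None if no match clears `threshold`.

    Assumes `query` and the gallery vectors come from the same (hypothetical)
    trained face-embedding model. The 0.8 threshold is an illustrative placeholder.
    """
    best_name: Optional[str] = None
    best_score = -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real models output e.g. 128 or 512 dimensions.
    gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.95]}
    print(identify([0.85, 0.15, 0.05], gallery))  # alice
    print(identify([0.5, -0.8, 0.1], gallery))    # None (no confident match)
```

Cosine similarity is a common way to compare face embeddings, but the threshold trades false matches against missed ones, so in a real project it would be tuned on held-out data.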
#question 3: "How can I build a REST API?"

And you discover Flask, an easy-to-use Python web framework designed just for that.

You find a quick-start tutorial that shares a basic skeleton for a REST API in Flask.

Boom.
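A skeleton along those lines might look like the sketch below. It assumes Flask 2.x and a client that sends a JSON body carrying a base64-encoded image; the `/predict` and `/health` routes, the payload shape, and `predict_identity` are hypothetical placeholders (the model call is stubbed out), not something the thread specifies.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_identity(image_b64: str) -> str:
    """Hypothetical placeholder for the trained face-recognition model.

    A real version would decode the image, run the PyTorch model,
    and return the matched person's name.
    """
    return "unknown"


@app.route("/health", methods=["GET"])
def health():
    # Cheap endpoint to check that the service is up.
    return jsonify(status="ok")


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"image": "<base64-encoded image>"}.
    # silent=True makes get_json return None instead of raising on bad input.
    payload = request.get_json(silent=True)
    if not payload or "image" not in payload:
        return jsonify(error="JSON body with an 'image' field is required"), 400
    return jsonify(identity=predict_identity(payload["image"]))
```

Saved as `app.py`, it can be run locally with `flask --app app run` (Flask 2.2+). For a hosting platform like Heroku, a production server such as gunicorn (`gunicorn app:app`) is the usual choice over Flask's built-in development server.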
#question 4: "How do I make my API accessible to anyone on the Internet?"

And you discover a few of the most popular ways to deploy ML APIs, from beginner-friendly options like Streamlit to more advanced solutions like AWS API Gateway.

So you pick Heroku, a good middle ground.
To build this project, you will need *just enough Python* to:

→ train your ML model
→ build a REST API

And you don't need to become a Python expert to do these things.
When you learn by building projects, you get 2 things:

1 → You become a problem-solver. And this is what separates a senior data scientist from a junior one.

2 → You build a portfolio. Every project you make is another valuable asset you will have when looking for jobs.
In conclusion,

→ You learn by solving a specific problem.

→ When you start from the end goal, you *think*, and you learn just enough of each technology to build your solution.

→ As a bonus, you build a portfolio that will greatly help you when looking for jobs.
Wanna learn to build production-grade ML products?
→ 𝗙𝗼𝗹𝗹𝗼𝘄 𝗺𝗲 @paulabartabajo_
→ Join my email list: datamachines.xyz/subscribe/

Wanna help?
→ Like/Retweet the first tweet below to spread the wisdom ↓↓↓
Ready to take your ML career to the next level?

Learn with me how to build a complete ML app in the Real-World ML Tutorial and Community, because THIS IS what companies look for.

Use the discount code "NINJA" to get a 20% discount and LIFETIME access.
realworldmachinelearning.carrd.co
