Pau Labarta Bajo
Dec 1 • 14 tweets • 3 min read
Wanna become a professional data scientist? 👩🏾‍🔬👨‍🔬

One that feels
- knowledgeable 🧠
- confident 😎
- and ready to charge well for what she knows? 💰

Here is what you should do (spoiler alert, it is hard, but worth it) ↓↓↓
The internet is flooded with Data Science/ML content:

→ blog posts
→ newsletters
→ Twitter threads
→ Arxiv papers
→ ...

And the thing is, reading all that is not gonna get you a job.

You need to get your hands dirty ⛏️👷🏾‍♂️👷🏻‍♀️
Real learning in data science (like in life) happens when you

→ face a specific problem
→ struggle to solve it, and
→ eventually solve it.

I call this the "problem-struggle-solution cycle".

This is how you learn everything in life.
And data science is not an exception.
When you skip this cycle you feel impostor syndrome.

That is, you feel deep inside that you are not ready to become a

→ professional
→ full-time
→ and well paid

data scientist.

Impostor syndrome implies negative self-talk. And no amount of reading can cure you of this.
Is there another way to learn? 🤔

Yes, there is.

I call it "Pick-a-real-world-problem-you-care-about-build-a-data-science-solution-and-explain-it-to-others".

Also known as "Project-based learning"
For #example: let's say you wanna become an NLP engineer.

Stop reading blog posts about multi-billion-parameter Language Models.

Instead, use NLP techniques (embeddings and similarities) to build this "app that recommends what ML paper to read" ↓
Starting from the end goal puts your mind in the "problem-solving" mode.

So you start asking yourself the right questions...
Question → is there any public dataset I can use?

And you find this one
kaggle.com/datasets/spsay…
Question → How do I transform the raw text into numerical features?

And you discover different "embedding" methods:

1 → TF-IDF vectorizer from scikit-learn (classic 🏛️)
2 → GloVe word embeddings (modern)
3 → Sentence Transformers using HuggingFace's library (SOTA 😎)
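
To get a feel for the first option, here is a minimal sketch of what a TF-IDF vectorizer does, hand-rolled in plain Python rather than taken from scikit-learn. The paper titles, the whitespace tokenizer, and the function names are all made up for illustration, and the smoothed IDF used here is just one common variant.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real project would use a proper tokenizer.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency: rare terms get a larger weight.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Term count in this document, scaled by how rare the term is overall.
        vectors.append([counts[term] * idf[term] for term in vocab])
    return vocab, vectors


# Hypothetical paper titles, invented for this example.
papers = [
    "attention is all you need",
    "deep residual learning for image recognition",
    "image recognition with attention",
]
vocab, vectors = tfidf_vectors(papers)
```

Each paper becomes a list of numbers with one slot per vocabulary word. Words that appear in few papers (like "need") get a higher weight than words shared across papers (like "attention").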
Question → How do I match the user query with the paper to recommend?

And you discover cosine similarity, something you probably learned in high school and suddenly re-discover 😉
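
As a rough sketch of the matching step: cosine similarity is the dot product of two vectors divided by the product of their lengths, and the recommendation is simply the paper whose vector scores highest against the query. The 3-dimensional vectors, the titles, and the `recommend` helper below are invented for illustration; real embeddings would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # An all-zero vector has no direction; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_vec, paper_vecs, titles, top_k=1):
    # Score every paper against the query and keep the top_k most similar titles.
    scored = [(cosine_similarity(query_vec, vec), title)
              for vec, title in zip(paper_vecs, titles)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


# Toy embeddings, made up for this example.
titles = ["Paper A", "Paper B", "Paper C"]
paper_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
query_vec = [0.0, 2.0, 2.0]

best = recommend(query_vec, paper_vecs, titles, top_k=2)
```

Because only the direction of the vectors matters, a query that is a scaled-up copy of a paper's vector still counts as a perfect match.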
Question → How do I make my app public?

And you finally build your first app with Streamlit.

Boom.
When you learn by building projects, you get 2 things:

1 → Become a problem-solver and cure your impostor syndrome.

2 → Build a portfolio. Every project you make is another valuable asset you will have when looking for jobs.
Building a real-world project is hard, but the effort is worth it.

This is why I am preparing a hands-on tutorial to help you build your OWN Machine Learning project.

Join my e-mail list to be notified when the tutorial series is out ↓
datamachines.xyz/subscribe/
Every week I share real-world Data Science/Machine Learning content.

Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
Like/Retweet the first tweet below to spread the wisdom ↓↓↓

More from @paulabartabajo_

Nov 29
Training an ML model inside a Jupyter notebook is something every data scientist knows 🏋️

But do you know how to create a real-world ML service that makes a difference for the company you work for? 📈

If the answer is NO, this thread is for you 🤗🧵↓
So, what is the difference between model training and ML service? 🤔

An ML service is a sequence of processing and storage steps that takes in raw data and outputs predictions that are used by the business to make smarter decisions.

Model training is just one of those steps.
And while the model you've trained in a Jupyter notebook IS important, you need to build the rest of the system to make it work.

How do you do that?

2 solutions ↓
Read 9 tweets
Nov 24
Machine Learning students try more complex ML models when they wanna improve their results.

So they miss the elephant in the room 🐘 ↓↓↓
A Machine Learning model is like a cake, with 2 main ingredients:

→ a dataset 📊
→ an ML algorithm, for example, XGBoost 🤖

And the thing is, no matter what algorithm you choose, the resulting ML model can only be as good as the dataset you use to train it.
The problem is that in online courses, and ML competitions, you work with a fixed dataset that someone has generated for you.

In real-world projects, there is no dataset waiting for you. Instead, you need to create it.

And this is the most critical step in the whole project.
Read 11 tweets
Nov 23
Wanna learn time-series forecasting? 📈

No more reading blog posts.
It is time to forecast for real 😎

Here is a project you can build 👩‍💻🧑🏽‍💻↓
Business problem 💼

Let's create a predictive model to forecast the number of taxi rides that will happen in Manhattan (New York City)

- per hour (e.g. tomorrow between 5 PM and 6 PM), and
- per zone (e.g. Zone 113 "Lower Manhattan")

in the following 3 days.
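
To make the "per hour, per zone" target concrete, here is a small sketch of how raw ride records could be turned into the counts the model would learn to forecast. The ride timestamps, the zone IDs other than 113, and the `hourly_zone_counts` helper are made up for illustration; they are not taken from any real dataset.

```python
from collections import Counter
from datetime import datetime


def hourly_zone_counts(rides):
    """Count rides per (zone_id, hour_start) from (pickup_time, zone_id) pairs."""
    counts = Counter()
    for pickup_time, zone_id in rides:
        # Truncate the timestamp to the start of its hour.
        hour_start = pickup_time.replace(minute=0, second=0, microsecond=0)
        counts[(zone_id, hour_start)] += 1
    return counts


# Invented ride records: (pickup time, pickup zone).
rides = [
    (datetime(2022, 11, 23, 17, 5), 113),
    (datetime(2022, 11, 23, 17, 48), 113),
    (datetime(2022, 11, 23, 18, 2), 113),
    (datetime(2022, 11, 23, 17, 30), 79),
]
counts = hourly_zone_counts(rides)
```

Each `(zone, hour)` count is one point of the time series, and the forecasting task is to predict these counts for the hours that have not happened yet.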
This model can help the operations team of the NYC taxi Commission optimize the distribution of the taxi fleet, in real-time, and maximize revenue.

Here are the steps to build this project ↓
Read 9 tweets
Nov 21
Wanna land a data science job?

Build ONE project that solves ONE real-world problem 🚀

3 steps to find the right problem for you 🤗↓
Step #1. Make a list of N (1 < N < 5) problems that GENUINELY interest you ❤️

Examples:
1. Air quality prediction
2. Earthquake prediction
3. Cryptocurrency prediction

Building a project is hard. You will go through ups and downs.

Better pick something that interests you.
Step #2. Check data availability 📊

Without a solid dataset, there is not much Machine Learning you can do.

Hence, start with a pool of ideas but then pick the one for which you can find a GOOD dataset, for example, in Kaggle.
kaggle.com/datasets
Read 6 tweets
Nov 15
Wanna learn MLOps?

Stop reading blog posts.
Build a prediction service instead 🚀

Here is a project you can build (for free) 👩🏽‍💻👨‍💻↓↓↓
Let's build a Machine Learning service to predict the Air Quality Index (AQI) in your city in the next 3 days, using a 100% serverless stack.

You will learn a lot, AND you will build something useful for society.

Win-win 🏆🏆

These are the steps to build this ↓
Step 1: Feature generation script 🐍

1 → fetches raw weather and pollutant data from an external API like aqicn.org

2 → computes features from this raw data (aka model inputs), and targets (aka model outputs)

3 → stores these features in the *Feature Store*
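
A minimal sketch of step 2, under simplifying assumptions: the raw data is reduced to a single daily AQI series (no weather inputs), features are the previous 3 days, and targets are the next 3 days. The numbers, the `build_training_rows` helper, and the plain Python list standing in for the feature store are all hypothetical. No real API is called here.

```python
def build_training_rows(daily_aqi, n_lags=3, horizon=3):
    """Turn a daily AQI series into rows of lagged features and future targets."""
    rows = []
    # Only keep positions with n_lags days behind them and horizon days ahead.
    for i in range(n_lags, len(daily_aqi) - horizon + 1):
        rows.append({
            "features": daily_aqi[i - n_lags:i],   # model inputs: the last n_lags days
            "targets": daily_aqi[i:i + horizon],   # model outputs: the next horizon days
        })
    return rows


# Invented daily AQI readings, oldest first.
daily_aqi = [42, 55, 61, 48, 39, 70, 65, 58]

# Stand-in for the Feature Store: just a list kept in memory.
feature_store = build_training_rows(daily_aqi)
```

Each row pairs what the model would see at prediction time with what it should have predicted, which is the shape a training job can consume directly.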
Read 10 tweets
Nov 8
Junior data scientists are told the problems they need to solve 👶

Senior data scientists find the problems that need to be solved 😎

Here is how they do it ↓↓↓
#Example: Imagine today is your first day at Spotify.

You are now part of the data science team responsible for maximizing paid subscriptions and driving revenue.

The key question is:

"How can you increase subscription revenues with data science?" 🤔
#1 Junior data scientist's idea 💡

Spotify is famous for its smart recommendations, and you completed a course on recommender systems with advanced Deep Learning techniques.

So you think:

"Let's build a better recommendation engine and increase subscriptions"

Wait a second...
Read 11 tweets
