Pau Labarta Bajo

Nov 8 • 11 tweets • 3 min read

Junior data scientists are told the problems they need to solve 👶

Senior data scientists find the problems that need to be solved 😎

Here is how they do it ↓↓↓

#Example: Imagine today is your first day at Spotify.

You are now part of the data science team responsible for maximizing paid subscriptions and driving revenue.

The key question is:

"How can you increase subscription revenues with data science?" 🤔

#1 Junior data scientist's idea 💡

Spotify is famous for its smart recommendations, and you completed a course on recommender systems with advanced Deep Learning techniques.

So you think:

"Let's build a better recommendation engine and increase subscriptions"

Wait a second...

... before you start typing any code:

→ Are you sure this is the best problem to focus on?
Many great data scientists have worked on this problem in the company before.

→ What are the chances you build a better model?
Honestly, pretty slim.

You will fail because you focused too much on the solution (recommender systems) and not the actual problem (subscription revenue).

You have great technical knowledge, but you chose the wrong problem.

This is what a senior data scientist would do instead ↓

#2: Senior data scientist's idea 💡💡💡

Let's take a user-centric approach, and ask the question:

"Why does a user decide to start a paid subscription?"

Two factors drive that decision:
→ Quality: How good is the app? 👌
→ Price: How much does it cost? 💸

Improving the quality of the app is doable, but HARD.

Experimenting to find the optimal price point, by contrast, is much easier.

So, what about identifying a segment with historically low conversion, and running an A/B test with different price discounts?

It is a solution with low technical complexity and a higher chance of succeeding.

Which makes it a better problem to work on 🧠
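
To make the A/B test idea concrete, here is a minimal sketch of how the outcome of a price-discount experiment could be evaluated. It is not from the thread: the two-proportion z-test is one common choice among several, and every number below (users, conversions, prices) is made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results for one low-conversion segment (illustrative numbers).
control = {"users": 10_000, "conversions": 200, "price": 9.99}    # full price
treatment = {"users": 10_000, "conversions": 260, "price": 7.99}  # discounted

z, p_value = two_proportion_z_test(
    control["conversions"], control["users"],
    treatment["conversions"], treatment["users"],
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")

# A discount can lift conversion and still lower revenue, so compare
# revenue per exposed user as well as the conversion rate.
for name, arm in (("control", control), ("treatment", treatment)):
    revenue_per_user = arm["conversions"] / arm["users"] * arm["price"]
    print(f"{name}: revenue per user = {revenue_per_user:.4f}")
```

The revenue check matters because the team's goal is subscription revenue, not conversion alone.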

In conclusion,

→ Focus on finding the right problem to solve. This is the most critical part of the whole project (aka pick your battles wisely).

→ First the WHAT (to solve), then the HOW (to solve it). Not the other way around.

Wanna become a freelance data scientist?

Join my e-mail list and get my eBook "How to become a freelance data scientist", for FREE ↓
freelance-data-science.carrd.co

@paulabartabajo_

Every week I share real-world Data Science/Machine Learning content.

Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
Like/Retweet the first tweet below to spread the wisdom ↓↓↓

https://twitter.com/paulabartabajo_/status/1589981092844445696


More from @paulabartabajo_

Pau Labarta Bajo

@paulabartabajo_

Nov 3

The one skill every professional data scientist must have? 🤔

Don't look for it in online courses, it's not there.

Read the thread below and find out ↓↓↓

In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

And you, as a data scientist, need to

1 → Understand the business problem

2 → Transform the business problem into a data science problem

3 → Solve it

To transform a business problem into *the right* data science problem you

1 → ask questions and gather business context.
2 → explore the data to find clues.

There is nothing more frustrating than building a great data science solution... to the wrong business problem.

Read 14 tweets

Pau Labarta Bajo

@paulabartabajo_

Nov 1

Tired of training lots of Machine Learning models, and not getting better results? 😵‍💫

This is how you solve this 🧠↓

A Machine Learning model is the output of a 3-step workflow where you:

1 → Fetch raw data, for example from an external database.

2 → Process the data into a tabular format, so you have N features and 1 target.

3 → Train ML models (e.g. XGBoost) and tune hyper-parameters.
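
The three steps above can be sketched as three small functions. This is an illustration under loud assumptions: the in-memory CSV stands in for an external database, the column names are hypothetical, and a one-split regression tree (a decision stump) replaces XGBoost so the example runs with the standard library only.

```python
import csv
import io

# Hypothetical raw data; a real project would query a database instead.
RAW_CSV = """user_id,minutes_per_day,playlists,monthly_spend
1,10,1,0.0
2,15,2,0.0
3,60,5,9.99
4,75,4,9.99
5,20,1,0.0
6,90,8,9.99
"""


def fetch_raw_data():
    """Step 1: fetch raw records (here, parsed from an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(RAW_CSV)))


def build_table(rows):
    """Step 2: turn raw records into N numeric features and 1 target."""
    features = [[float(r["minutes_per_day"]), float(r["playlists"])] for r in rows]
    target = [float(r["monthly_spend"]) for r in rows]
    return features, target


def train_stump(features, target):
    """Step 3: fit a decision stump; assumes at least one feature varies."""
    def mean(values):
        return sum(values) / len(values)

    best = None  # (squared error, feature index, threshold, left mean, right mean)
    for j in range(len(features[0])):
        # Skip the largest value so the right-hand side is never empty.
        for threshold in sorted({row[j] for row in features})[:-1]:
            left = [t for row, t in zip(features, target) if row[j] <= threshold]
            right = [t for row, t in zip(features, target) if row[j] > threshold]
            left_mean, right_mean = mean(left), mean(right)
            error = (sum((t - left_mean) ** 2 for t in left)
                     + sum((t - right_mean) ** 2 for t in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    _, j, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[j] <= threshold else right_mean


features, target = build_table(fetch_raw_data())
model = train_stump(features, target)
print(model([80.0, 6.0]))
```

Keeping the steps in separate functions makes it easier to see which one to revisit when the model underperforms.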

If your ML model does not work, you have at least 1 of these 2 problems:

1 → The model is too simple to capture the patterns in the training data, and you need a more powerful model (step 3).

2 → The training data has no patterns, so no model will work (steps 1 and 2).

Read 11 tweets

Pau Labarta Bajo

@paulabartabajo_

Oct 27

Overwhelmed by the massive amount of Data Science courses to choose from? 🤯

There is a better way to learn data science 🧠
And to land a job 💼

Here it is ↓

Stop taking courses.

No more passive reading.
No more "easy" paths that do not make you stand out from the crowd.

Instead, focus on BUILDING something you care about.

For example, if you are into Natural Language Processing, you can set yourself this goal:

"I want to build an AI tool to write comedy".

The world definitely needs more laughs 😄

Read 11 tweets

Pau Labarta Bajo

@paulabartabajo_

Oct 25

Wanna become an NLP engineer?

Stop taking online courses.
Build something instead 🏗️

Here is a project you can build 👩🏽‍💻👨‍💻↓↓↓

Reading blog posts about multi-billion-parameter Language Models is very cool.

However, building real-world NLP products from these models is where the real business value is. And this is what companies look for in the job market.

So, here is a PRO project you can build ↓

"An app that recommends what ML paper to read"

Imagine an app where you can describe what paper you are interested in reading today. For example

🧑‍🔬: "I want a paper about Transformers in Computer Vision"

Stop imagining. Instead, build this system ↓ in 4 steps

Read 10 tweets

Pau Labarta Bajo

@paulabartabajo_

Oct 18

Wanna learn enough git to be a data scientist?

A hands-on tutorial in 10 steps 👩🏽‍💻👨‍💻↓↓↓

#1 Create your project folder and cd into it

#2 Create a readme.md file.
This is the first thing anyone visiting your repository will see.
You better have one. And you better make it pretty.

Read 14 tweets

Pau Labarta Bajo

@paulabartabajo_

Oct 5

Love using Jupyter notebooks, but after a while, they look like a total mess? 😵‍💫

What if I told you there is a quick, simple, and efficient way to make them tidy and shiny?

These 3 tips will help you keep your notebooks clean and boost your productivity 🚀↓

Jupyter notebooks are the most popular environment to develop Machine Learning models.

They are the fastest way to
→ add code
→ fix code
→ re-run code

for your Machine Learning project.

However, they quickly turn into a mess...

... unless you follow these 3 tips.

Tip #1. Encapsulate common code as functions.

If you do not encapsulate your code, you are doomed to duplicate it.

And code duplication is both a productivity killer and an endless source of bugs.

The solution:
→ Define functionality ONCE.
→ Call it as many times as you need
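
A minimal sketch of Tip #1, with a hypothetical helper name and made-up data: instead of copy-pasting the same summary code into several notebook cells, define it once and call it wherever it is needed.

```python
def summarize(values, label):
    """Return a one-line summary of a numeric column."""
    mean = sum(values) / len(values)
    return f"{label}: n={len(values)}, min={min(values)}, max={max(values)}, mean={mean:.2f}"


# Illustrative data; in a notebook these would be columns of your dataset.
train_ages = [23, 35, 41, 29]
test_ages = [31, 27, 45]

# Same logic, defined once, reused in as many cells as needed.
print(summarize(train_ages, "train"))
print(summarize(test_ages, "test"))
```

If the summary ever needs to change, for example to add a median, there is only one place to edit.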

Read 11 tweets
