The one skill every professional data scientist must have? 🤔

Don't look for it in online courses. It's not there.

Read the thread below and find out ↓↓↓
In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

And you, as a data scientist, need to

1 → Understand the business problem

2 → Transform the business problem into a data science problem

3 → Solve it
To transform a business problem into *the right* data science problem you:

1 → ask questions and gather business context.
2 → explore the data to find clues.

There is nothing more frustrating than building a great data science solution... to the wrong business problem.
#Example: Let's say you work at Uber as a data scientist 🧑🏽‍🔬

And your product lead tells you:

👩‍💼: "We want to decrease user churn by 5% this quarter"

How do you go about solving this?...
There are different reasons why a user would stop using Uber.

For example:

1 → "Lyft offers better prices" 💸 → pricing problem

2 → "Car waiting times are too long" 🕑 → supply problem

3 → "The Android version of the app is very slow" 🐢 → app performance problem
You build this list by asking the rest of the team the right questions.

You need to understand the user's experience using the app, from HER point of view.
There is often no single reason behind churn, but a combination of them.

The question is: which one should you focus on? 🤔

This is when you pull out your great data science skills and EXPLORE THE DATA 🔎
You explore the data to understand how plausible each of the above explanations is.

The output from this analysis is a single hypothesis you should consider further.
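
A minimal sketch of what this exploration could look like, assuming you have one row per user with a churn flag and one signal per hypothesis. The records, the field names (`price_sensitive`, `avg_wait_min`, `platform`) and the 6-minute wait cutoff are all made up for illustration. The churn-rate gap between segments is only a rough first signal for which explanation deserves a closer look. It does not prove cause.

```python
from collections import defaultdict

# Hypothetical toy records: one row per user, one signal per hypothesis.
USERS = [
    {"platform": "android", "avg_wait_min": 3, "price_sensitive": False, "churned": True},
    {"platform": "android", "avg_wait_min": 4, "price_sensitive": False, "churned": True},
    {"platform": "android", "avg_wait_min": 9, "price_sensitive": True, "churned": True},
    {"platform": "android", "avg_wait_min": 3, "price_sensitive": False, "churned": False},
    {"platform": "ios", "avg_wait_min": 10, "price_sensitive": True, "churned": False},
    {"platform": "ios", "avg_wait_min": 4, "price_sensitive": False, "churned": False},
    {"platform": "ios", "avg_wait_min": 8, "price_sensitive": False, "churned": True},
    {"platform": "ios", "avg_wait_min": 3, "price_sensitive": True, "churned": False},
]

# Each hypothesis maps a user to the segment that hypothesis cares about.
HYPOTHESES = {
    "pricing": lambda u: u["price_sensitive"],
    "supply": lambda u: u["avg_wait_min"] > 6,  # assumed cutoff for a "long" wait
    "app performance": lambda u: u["platform"],
}


def churn_rate_by(rows, segment_of):
    """Return {segment: churn rate} for the segments produced by segment_of."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        segment = segment_of(row)
        totals[segment] += 1
        churned[segment] += row["churned"]  # True counts as 1
    return {segment: churned[segment] / totals[segment] for segment in totals}


def rank_hypotheses(rows, hypotheses):
    """Sort hypotheses by the churn-rate gap between their highest and lowest segment."""
    gaps = {}
    for name, segment_of in hypotheses.items():
        rates = churn_rate_by(rows, segment_of)
        gaps[name] = max(rates.values()) - min(rates.values())
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


ranked = rank_hypotheses(USERS, HYPOTHESES)
for name, gap in ranked:
    print(f"{name}: churn gap {gap:.2f}")
```

On this toy data the platform split shows the largest gap, so "app performance" would be the hypothesis to dig into first.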

Depending on the hypothesis, you will solve the data science problem differently.

For example...
#Example 1: "Lyft offers better prices" 💸 (pricing problem)

Solution: Detect the segment of users who are likely to churn (using an ML model) and send personalized discounts via push notifications.

A/B test the discount to maximize impact without cannibalizing revenue.

Boom.
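
One way to read the result of the A/B test from Example 1, sketched with a standard two-proportion z-test. The user counts below are invented, and the test only compares retention between the control group and the discount group. Checking that the discount does not cannibalize revenue would need its own comparison (for example, revenue per user in each group).

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions (B minus A)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment on at-risk users:
# control got no discount, treatment got the personalized discount.
retained_control, n_control = 700, 1000
retained_treatment, n_treatment = 760, 1000

z, p_value = two_proportion_z_test(
    retained_control, n_control, retained_treatment, n_treatment
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the retention lift is unlikely to be noise. Whether the lift pays for the discount is a separate business question.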
#Example 2: "Car waiting times are too long" 🕑 (supply problem)

Solution: Identify the locations and times where supply is low, and offer a price incentive for drivers to cover these slots.

A good dashboard does the job, and an A/B test finds the optimal incentive.

Boom.
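
A rough sketch of the "where and when is supply low" step from Example 2. It assumes average rider waiting time per (zone, hour) slot is a reasonable proxy for low supply. The request log, the zone names and the 8-minute threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ride requests: (zone, hour of day, minutes the rider waited).
REQUESTS = [
    ("airport", 8, 12), ("airport", 8, 15), ("airport", 14, 4),
    ("downtown", 8, 5), ("downtown", 8, 6),
    ("downtown", 23, 11), ("downtown", 23, 14),
    ("suburbs", 14, 7),
]


def low_supply_slots(requests, max_wait_min):
    """Return (slot, average wait) pairs above the threshold, worst slot first."""
    waits = defaultdict(list)
    for zone, hour, wait_min in requests:
        waits[(zone, hour)].append(wait_min)
    averages = {slot: mean(values) for slot, values in waits.items()}
    flagged = [(slot, avg) for slot, avg in averages.items() if avg > max_wait_min]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


slots = low_supply_slots(REQUESTS, max_wait_min=8)
for (zone, hour), avg_wait in slots:
    print(f"{zone} at {hour}:00 -> average wait {avg_wait:.1f} min")
```

The flagged slots are the candidates for a driver incentive, and the size of that incentive is what the A/B test would tune.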
#Example 3: "The Android version of the app is very slow" 🐢 (App performance problem)

Solution: Go to the frontend devs and show them the breakdown of memory usage by app version to convince them they should release a new version of the app with better performance.

Boom.
To sum up,

→ Translating a business problem into *the right* data science problem is (more than) half the job.

→ Ask the right questions, list possible solutions, and explore the data to narrow down the list to one.

→ Solve this one data science problem.
Every week I share real-world Data Science/Machine Learning content.

Follow me @paulabartabajo_ so you do not miss what's coming next.

Wanna help?
Like/Retweet the first tweet below to spread the wisdom
↓↓↓
