Pau Labarta Bajo
Aug 30 • 15 tweets • 4 min read
There is one skill every professional data scientist must have that no online course talks about 🤔

↓
Every aspiring data scientist I talk to thinks their job starts when someone else gives them

→ a dataset, and
→ a clearly defined metric to optimize for, e.g. accuracy

They are wrong.

Things are slightly more complex in the real world.
In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

The data scientist's job is to translate a business problem into the *right* data science problem.

Then solve it.
To translate a business problem into *the right* data science problem you do 2 things:

1 → ask questions
2 → explore the data to find clues.

There is nothing more frustrating than building a great data science solution to the wrong business problem.
Imagine you are a data scientist 🧑🏽‍🔬 at Uber.

And your product lead tells you:

πŸ‘©β€πŸ’Ό: "We want to decrease user churn by 5% this quarter"
There are different reasons why a user would stop using Uber.

For example:

→ "Lyft is offering better prices for that geo" (pricing problem)
→ "Car waiting times are too long" (supply problem)
→ "The Android version of the app is very slow" (client-app performance problem)
You build this list ↑ by asking the rest of the team the right questions.

You need to understand the user's experience using the app, from HER point of view.
Typically there is no single reason behind churn, but a combination of a few of these.

The question is: which one should you focus on?

This is when you pull out your great data science skills and EXPLORE THE DATA 🔎
You explore the data to understand how plausible each of the above explanations is.

The output from this analysis is a single hypothesis you should consider further.
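
One rough way to do this first pass: for each explanation, compare the churn rate of users who were exposed to the problem against everyone else, then rank the explanations by that gap. This is a minimal sketch, not a definitive method. The field names, the 8-minute wait cut-off, and the toy users are all made up for illustration, and a gap only shows correlation, so treat the top result as the hypothesis to dig into next.

```python
from dataclasses import dataclass


@dataclass
class User:
    churned: bool
    saw_competitor_discount: bool  # hypothetical signal for the pricing explanation
    avg_wait_minutes: float        # hypothetical signal for the supply explanation
    platform: str                  # hypothetical signal for the app performance explanation


# Toy data, invented for this example only.
users = [
    User(True, False, 12.0, "android"),
    User(True, False, 11.0, "ios"),
    User(True, True, 10.0, "android"),
    User(False, True, 3.0, "android"),
    User(False, False, 4.0, "ios"),
    User(False, True, 5.0, "ios"),
    User(False, False, 2.0, "android"),
    User(True, False, 9.5, "ios"),
]


def churn_rate(group):
    """Share of users in the group who churned (0.0 for an empty group)."""
    return sum(u.churned for u in group) / len(group) if group else 0.0


def churn_gap(all_users, is_exposed):
    """Churn rate of exposed users minus churn rate of everyone else."""
    exposed = [u for u in all_users if is_exposed(u)]
    rest = [u for u in all_users if not is_exposed(u)]
    return churn_rate(exposed) - churn_rate(rest)


# One rule per explanation; the thresholds are assumptions.
hypotheses = {
    "pricing": lambda u: u.saw_competitor_discount,
    "supply": lambda u: u.avg_wait_minutes > 8,
    "android_performance": lambda u: u.platform == "android",
}

gaps = {name: churn_gap(users, rule) for name, rule in hypotheses.items()}
ranking = sorted(gaps, key=gaps.get, reverse=True)

for name in ranking:
    print(f"{name}: churn gap = {gaps[name]:+.2f}")
```

In this toy dataset the supply explanation comes out on top, so that would be the one to consider further.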

Depending on the hypothesis, you will solve the data science problem differently.

For example...
#Example 1: "Lyft is offering better prices for that geo" (pricing problem)

Solution: Detect the segment of users who are likely to churn (possibly using an ML Model) and send personalized discounts via push notifications.
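
A minimal sketch of the targeting step, assuming you already have a churn probability per user (for instance from a classifier, which is not shown here). The user ids, probabilities, threshold, and discount tiers are invented for illustration.

```python
# Hypothetical churn probabilities, e.g. the output of a trained model.
churn_probability = {
    "user_a": 0.82,
    "user_b": 0.35,
    "user_c": 0.64,
    "user_d": 0.10,
}

CHURN_THRESHOLD = 0.6  # assumed cut-off for "likely to churn"


def discount_percent(probability):
    """Made-up rule: the higher the churn risk, the bigger the discount."""
    return 20 if probability >= 0.8 else 10


# Users above the threshold, mapped to the discount they would receive.
campaign = {
    user: discount_percent(probability)
    for user, probability in churn_probability.items()
    if probability >= CHURN_THRESHOLD
}

for user, discount in campaign.items():
    print(f"push to {user}: {discount}% off your next ride")
```

The threshold and the discount sizes are business decisions, so they are worth agreeing on with the product lead before anything is sent.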
#Example 2: "Car waiting times are too long" (supply problem)

Solution: Identify the location and time where supply is too low, and offer a price incentive for drivers to cover these slots.
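
A minimal sketch of how the low-supply slots could be found: add up ride requests and available drivers per (zone, hour) and flag the slots where requests per driver exceed a limit. The zones, numbers, and the limit of 4 requests per driver are assumptions for illustration.

```python
from collections import defaultdict

# (zone, hour_of_day, ride_requests, available_drivers), toy numbers.
observations = [
    ("downtown", 8, 120, 30),
    ("downtown", 8, 100, 20),
    ("downtown", 14, 40, 25),
    ("airport", 23, 90, 10),
    ("airport", 14, 30, 15),
]

MAX_REQUESTS_PER_DRIVER = 4.0  # assumed limit before waiting times get too long

# Sum requests and drivers for every (zone, hour) slot.
totals = defaultdict(lambda: [0, 0])
for zone, hour, requests, drivers in observations:
    totals[(zone, hour)][0] += requests
    totals[(zone, hour)][1] += drivers

# A slot is undersupplied if it has no drivers or too many requests per driver.
undersupplied = sorted(
    slot
    for slot, (requests, drivers) in totals.items()
    if drivers == 0 or requests / drivers > MAX_REQUESTS_PER_DRIVER
)

for zone, hour in undersupplied:
    print(f"offer a driver incentive in {zone} at {hour}:00")
```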
#Example 3: "The Android version of the app is very slow" (client-app performance problem)

Solution: Go to the frontend devs, show them the breakdown of user churn by app version, and convince them they should release a new version of the app with better performance.
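
A minimal sketch of that breakdown: count users and churned users per (platform, app version) and compute the churn rate for each group. The platforms, version numbers, and users are toy data invented for this example.

```python
from collections import Counter

# (platform, app_version, churned), toy data.
users = [
    ("android", "5.1", True),
    ("android", "5.1", True),
    ("android", "5.1", False),
    ("android", "5.2", False),
    ("android", "5.2", False),
    ("android", "5.2", True),
    ("android", "5.2", False),
    ("ios", "5.2", False),
    ("ios", "5.2", True),
    ("ios", "5.2", False),
    ("ios", "5.2", False),
]

total = Counter()
churned = Counter()
for platform, version, has_churned in users:
    total[(platform, version)] += 1
    churned[(platform, version)] += has_churned  # True counts as 1

# Churn rate per (platform, version) group.
breakdown = {key: churned[key] / total[key] for key in total}

# Print the groups from highest to lowest churn, with the group size.
for key, rate in sorted(breakdown.items(), key=lambda item: item[1], reverse=True):
    platform, version = key
    print(f"{platform:8} {version:4} churn={rate:.0%} (n={total[key]})")
```

Showing the group size next to each rate helps, because a high churn rate on a handful of users will not convince anyone.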
In conclusion,

→ Translating business problems into *the right* data science problem is what separates a senior from a junior data scientist.

→ Ask the right questions, list the possible explanations, and explore the data to narrow down the list to one.

→ Solve this one problem.
Wanna get more real-world ML content?

Subscribe to my newsletter and get my eBook for FREE

"How to become a freelance data scientist"

which has specific advice to help you become a freelance data scientist

↓↓↓
datamachines.xyz/subscribe/
That's all for today folks.

I hope you find this content useful for your path 🥾⛰️

Wanna connect? ↓
Follow me @paulabartabajo_

Wanna help?
Like/Retweet the first tweet below to spread the wisdom
↓↓↓
