
Pau Labarta Bajo

Sep 1 • 15 tweets • 4 min read

Every aspiring data scientist I talk to is overwhelmed by the colossal number of online courses to choose from 🤯

My solution to this problem ↓

Learning is about connecting the dots.

However, it feels like there are too many dots to connect when learning data science.

Too many courses...
Too many blog posts...
Too many technologies...

Solution: You need to change the way you learn.

As a professional data scientist, you are expected to be a problem-solver for the company or institution you work for.

You need to be good at building data science products that solve business problems.

And for that, you don't need to be an expert in Python, for example.

Instead, you need to know *enough Python* to build your solution.

And to know how much Python is *enough*, you must start your learning from the end goal.

Do not learn, and then start.
Start, and then learn.

Pick a project you are interested in.

For example, if you are into computer vision, you can set yourself this goal:

"I want to build a REST API that does face recognition".

Starting from the end goal puts your mind in the "problem-solving" mode.

So you start asking yourself the right questions.

And you start googling.

#question 1: "Is there a public dataset with human faces I can use to build my model?"

And you happen to find a blog post with a list of relevant datasets of human faces. You investigate a bit further and find the one that seems the best for your use case.

Boom.

#question 2: "What ML model is good for face recognition?"

And you find 5 tutorials on YouTube that cover the model-building phase.

Two minutes later, you pick the one that uses PyTorch, because you are more interested in PyTorch than TensorFlow.

Boom.
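A common way to turn a trained model into a face-recognition system is to have it map each face image to an embedding vector and then compare embeddings. The sketch below covers only that matching step, in plain Python; the embeddings, the names, and the 0.8 threshold are made-up assumptions, and the model that produces the embeddings (for example, the one from your PyTorch tutorial) is not shown.

```python
import math
from typing import Dict, List, Optional

# Hypothetical threshold: real systems tune this on a validation set.
MATCH_THRESHOLD = 0.8


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identify(
    query: List[float],
    known_faces: Dict[str, List[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Optional[str]:
    """Return the name of the most similar known face, or None if nothing clears the threshold."""
    best_name, best_score = None, -1.0
    for name, embedding in known_faces.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With `known = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 1.0, 0.2]}`, calling `identify([0.85, 0.15, 0.05], known)` matches "alice", while a vector pointing somewhere else entirely returns `None` instead of forcing a wrong match.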

#question 3: "How can I build a REST API?"

And you discover Flask, an easy-to-use Python web framework that is well suited for exactly that.

You find a quick-start tutorial that shares a basic skeleton for a REST API in Flask.

Boom.
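A minimal sketch of such a skeleton, assuming Flask is installed (`pip install flask`). The `/health` and `/predict` routes, the `image` form field, and `run_model` are hypothetical names; `run_model` is only a stub marking where your trained model would plug in.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def run_model(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for the trained model.

    Replace this with real preprocessing + inference (e.g. your PyTorch model).
    Here it only reports how many bytes it received.
    """
    return {"label": "unknown", "bytes_received": len(image_bytes)}


@app.route("/health", methods=["GET"])
def health():
    # Cheap endpoint for checking that the service is up.
    return jsonify(status="ok")


@app.route("/predict", methods=["POST"])
def predict():
    # Expect the image as a multipart/form-data upload under the field "image".
    uploaded = request.files.get("image")
    if uploaded is None:
        return jsonify(error="missing 'image' file field"), 400
    return jsonify(run_model(uploaded.read()))


if __name__ == "__main__":
    # Flask's built-in server is for development only.
    app.run(debug=True)
```

Once it runs locally, `curl -F "image=@face.jpg" http://127.0.0.1:5000/predict` should return the stub's JSON; swapping the stub for real inference does not change the API shape.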

#question 4: "How do I make my API accessible to anyone on the Internet?"

And you discover a few of the most popular ways to deploy ML APIs, from beginner-friendly options like Streamlit to more advanced solutions like AWS API Gateway.

So you pick Heroku, a good middle ground.
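One detail that trips people up when moving from a laptop to Heroku: the platform tells a web process which port to listen on through the `PORT` environment variable, so a hard-coded port will not work. A small helper, assuming a local fallback of 5000 (Flask's usual development port); `get_port` and `DEFAULT_PORT` are hypothetical names.

```python
import os

# Assumed fallback for local runs, where PORT is usually not set.
DEFAULT_PORT = 5000


def get_port(env=None) -> int:
    """Return the port to listen on: $PORT if the platform sets it, else DEFAULT_PORT."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    return int(raw) if raw.strip() else DEFAULT_PORT
```

In a Flask app this would be used as `app.run(host="0.0.0.0", port=get_port())`. On Heroku you would more typically declare a `Procfile` such as `web: gunicorn app:app` and let a production server handle the binding.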

To build this project, you will need *just enough Python* to:

→ train your ML model
→ build a REST API

And you don't need to become a Python expert to do these things.

When you learn by building projects, you get 2 things:

1 → You become a problem-solver. And this is what separates a senior from a junior data scientist.

2 → You build a portfolio. Every project you make is another valuable asset you will have when looking for jobs.

In conclusion,

→ You learn by solving a specific problem.

→ When you start from the end goal, you *think*, and you learn just enough of each technology to build your solution.

→ As a bonus, you build a portfolio that will greatly help you when looking for jobs.

Wanna get more real-world ML content?

Subscribe to my newsletter and get my eBook for FREE:

"How to become a freelance data scientist"

It has specific, practical advice to help you go freelance.

↓↓↓
datamachines.xyz/subscribe/

@paulabartabajo_

That's all for today folks.

I hope you find this content useful for your path 🥾⛰️

Wanna connect? ↓
Follow me @paulabartabajo_

Wanna help?
Like/Retweet the first tweet below to spread the wisdom
↓↓↓

https://twitter.com/paulabartabajo_/status/1565316061523677185


More from @paulabartabajo_

Pau Labarta Bajo

@paulabartabajo_

Aug 30

There is one skill every professional data scientist must have that no online course talks about 🤔

↓

Every aspiring data scientist I talk to thinks their job starts when someone else gives them

→ a dataset, and
→ a clearly defined metric to optimize for, e.g. accuracy

They are wrong.

Things are slightly more complex in the real world.

In the real world, data science projects start from a business problem.

They are born to move a key business metric (KPI).

The data scientist's job is to translate a business problem into the *right* data science problem.

Then solve it.

Read 15 tweets

Pau Labarta Bajo

@paulabartabajo_

Aug 24

Wanna become a freelance data scientist? 😎

5 tips to help you become one ↓

#Tip 1: Start small

Clients don't look for an all-in-one data scientist, but for someone who can solve their SPECIFIC problems.

Identify the things you are already an expert in, e.g.

→ Dashboarding with Tableau, or
→ ML for computer vision, or
→ Web scraping

Apply only for projects in these areas.

#Tip 2: Build a Minimum Viable Portfolio

Clients want to see real work you have done in the past. They want to see solid proof you can deliver.

Build a small public/private portfolio that focuses on your strengths (from #Tip 1 above).

Read 8 tweets

Pau Labarta Bajo

@paulabartabajo_

Aug 23

Most data scientists focus on algorithms.

So they fail.

Data science = code + DATA

You write code to process and understand the data.

However, if the data is bad, there is nothing that will help you.

Garbage in. Garbage out.

You can play with the code as much as you want.

But if the data is not

→ large enough
→ complete enough
→ good enough

... you will fail.
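Before touching the model, you can check the data against those three criteria. The sketch below is one possible mapping, and the mapping itself is an assumption: "large enough" as a minimum row count, "complete enough" as the share of missing values per column, and "good enough" narrowed to a single proxy (exact duplicate rows). `check_dataset`, `DataReport`, and the default thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataReport:
    n_rows: int
    # Column name -> fraction of rows where that value is missing.
    missing_fraction: dict = field(default_factory=dict)
    # Human-readable warnings; an empty list means no check failed.
    problems: list = field(default_factory=list)


def check_dataset(rows, required_columns, min_rows=1000, max_missing=0.05):
    """Flag data that is not large, complete, or clean enough.

    rows: list of dicts with hashable values, e.g. from csv.DictReader.
    min_rows and max_missing are hypothetical defaults; tune them per project.
    """
    report = DataReport(n_rows=len(rows))
    n = report.n_rows

    # "Large enough": too few examples to learn from.
    if n < min_rows:
        report.problems.append(f"only {n} rows (< {min_rows})")

    # "Complete enough": share of missing values per required column.
    for col in required_columns:
        n_missing = sum(1 for r in rows if r.get(col) in (None, ""))
        frac = n_missing / n if n else 1.0
        report.missing_fraction[col] = frac
        if frac > max_missing:
            report.problems.append(f"column '{col}' is {frac:.0%} missing")

    # "Good enough" (one narrow proxy): exact duplicate rows.
    n_dupes = n - len({tuple(sorted(r.items())) for r in rows})
    if n_dupes:
        report.problems.append(f"{n_dupes} duplicate row(s)")

    return report
```

Real projects add domain-specific checks (valid ranges, label noise, leakage), but even this much catches the "garbage in" cases before you spend days tuning code.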

Read 6 tweets

Pau Labarta Bajo

@paulabartabajo_

Aug 22

When I talk to aspiring data scientists, they usually face 1 of 3 challenges:

#1 They do not know where to start.

There are tons of educational resources online that cover all kinds of data science topics. And this can be overwhelming.

Solution: Pick a topic you are interested in, and search for a GitHub repo or Kaggle notebook that builds a solution around it.

e.g. "Sentiment analysis of Tweets"

Start by reading and copying someone else's work.

Copy from the best and start narrow. Then expand.

Read 8 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 27

Are you a data scientist trying to get a job? 💼

What if I told you there is a better way than completing yet another online course? 🤯

Let me share with you the 4-step process I follow to find new freelance projects in the Data Science world 👇🏾

Data science is a hot field. There is a huge demand for this role...
... and increased supply as well.

It is getting harder to land a data science job. The market is getting crowded, and competition is increasing.

The question is: how can I stand out from the crowd? 🤔

Most data scientists follow a passive approach to learning.

They read papers, blogs, and Twitter threads and collect online course certificates. They sometimes implement something.

If you wanna differentiate yourself, you need to play this game differently.

Read 11 tweets

Pau Labarta Bajo

@paulabartabajo_

Jul 21

Are you a data scientist using CSV files to store your data?

What if I told you there is a better way?

Can you imagine a

-> lighter 🦋
-> faster 🏎️
-> cheaper 💸

file format to save your datasets?

Read this thread so you don't need to imagine anymore 👇🏾

Do not get me wrong. I love CSVs.

You can open them with any text editor, inspect them and share them with others.

They have become the standard file format for datasets in the AI/ML community.

However, they have a little problem...

CSV files are stored as a list of rows (aka row-oriented), which causes 2 problems:

- they are slow to query --> even a SQL-style query that needs a single column has to read and parse every row.

- they are difficult to store efficiently --> values are saved as plain text, without types or compression, so CSV files take a lot of disk space.
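To make "row-oriented" concrete, here is a toy comparison using only the standard library: the same small table parsed row by row, then rearranged into one list per column. The data and the helper names (`read_rows`, `to_columns`) are made up for illustration. Columnar file formats (Apache Parquet and Apache ORC are common examples) apply the column idea on disk, which is also what lets them compress each column on its own.

```python
import csv
import io

CSV_TEXT = """city,temperature,humidity
Madrid,31.5,20
Oslo,18.0,65
Lima,22.3,80
"""


def read_rows(text):
    """Row-oriented: the parser reads and splits every field of every record."""
    return list(csv.DictReader(io.StringIO(text)))


def to_columns(rows):
    """Column-oriented: one list per column, so a query can read only what it needs."""
    columns = {name: [] for name in rows[0]} if rows else {}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


rows = read_rows(CSV_TEXT)
# Row layout: to average one column, every field of every row was already parsed.
avg_from_rows = sum(float(r["temperature"]) for r in rows) / len(rows)

columns = to_columns(rows)
# Column layout: the same query scans a single list and ignores "city" and "humidity".
temps = [float(t) for t in columns["temperature"]]
avg_from_columns = sum(temps) / len(temps)
```

Both averages are identical; the difference is how much data has to be read to get there, which grows with the number of columns you do not need.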

Is there an alternative to CSVs?

Yes!

Read 17 tweets
