Pau Labarta Bajo
Oct 27 • 11 tweets • 5 min read
Overwhelmed by the massive number of Data Science courses to choose from? 🤯

There is a better way to learn data science 🧠
And to land a job 💼

Here it is ↓
Stop taking courses.

No more passive reading.
No more "easy" paths that do not make you stand out from the crowd.

Instead, focus on BUILDING something you care about.
For example, if you are into Natural Language Processing, you can set yourself this goal:

"I want to build an AI tool to write comedy".

The world definitely needs more laughs 😄
Starting from the end goal

aka "let's write a writing tool for comedy"

puts your mind in a "problem-solving" mode 🧠

So you start asking yourself the right questions.
#question 1: How can I generate text using a Machine Learning model?

And you discover the fantastic zoo of models at Hugging Face, and their Python library `transformers`.

Boom.

huggingface.co/tasks/text-gen…
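
To make this concrete, here is a minimal sketch of text generation with the `pipeline` API from `transformers`. The model name `gpt2` and the helper names `generate_text` and `extract_continuation` are my own picks for illustration, not something taken from the page above. The first call downloads the model weights, so it needs an internet connection and a bit of patience.

```python
def extract_continuation(prompt, outputs):
    """Return only the newly generated text from pipeline-style outputs.

    By default, a text-generation pipeline returns the prompt followed by
    the continuation, so the prompt is stripped off when it is present.
    """
    full_text = outputs[0]["generated_text"]
    if full_text.startswith(prompt):
        return full_text[len(prompt):].strip()
    return full_text.strip()


def generate_text(prompt, model_name="gpt2", max_new_tokens=40):
    """Generate a continuation for `prompt` with a pretrained language model."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    # do_sample=True gives varied output instead of the single most likely text.
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return extract_continuation(prompt, outputs)
```

Calling `generate_text("Why did the data scientist cross the road?")` should give you a different continuation on every run. It will be fluent, but probably not funny yet.
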
#question 2: "These models generate text, but not comedy. How can I make my model produce funny text?"

And you discover one of the most useful techniques in modern AI: fine-tuning.

For example, this hands-on tutorial goes straight to the point:
colab.research.google.com/github/hugging…
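
If you want a feel for what fine-tuning looks like in code, here is a rough sketch using the `Trainer` API from `transformers` together with the `datasets` library. It is not the code from the tutorial above. The model name `distilgpt2`, the output folder `comedy-model`, the function names and the hyperparameter values are all assumptions I picked to keep the example small, and `texts` is assumed to be a plain Python list of strings.

```python
def default_training_config(output_dir, epochs=1, batch_size=8):
    """Hyperparameters passed to TrainingArguments (illustrative, not tuned)."""
    return {
        "output_dir": output_dir,
        "num_train_epochs": epochs,
        "per_device_train_batch_size": batch_size,
    }


def fine_tune(texts, model_name="distilgpt2", output_dir="comedy-model", max_length=128):
    """Fine-tune a causal language model on a list of strings and save it."""
    # Heavy dependencies are imported lazily (transformers, datasets, torch).
    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # GPT-2 style tokenizers ship without a padding token, so reuse end-of-text.
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def tokenize(batch):
        # Append the end-of-text token so the model learns where a text stops.
        with_eos = [text + tokenizer.eos_token for text in batch["text"]]
        return tokenizer(with_eos, truncation=True, max_length=max_length)

    dataset = Dataset.from_dict({"text": texts})
    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    # mlm=False selects plain next-token prediction (causal language modeling).
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    args = TrainingArguments(**default_training_config(output_dir))
    trainer = Trainer(
        model=model, args=args, train_dataset=tokenized, data_collator=collator
    )
    trainer.train()

    # Save model and tokenizer together so the folder can be loaded later.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return output_dir
```

A small model keeps this runnable on a free GPU notebook. Swap in a bigger one once the whole loop works end to end.
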
#question 3: "The tutorial is very cool, but to use it I need to bring my own text data of jokes. Where can I find that?"

And you discover this open-source dataset of English jokes put together by @taivopungas 🙏
github.com/taivop/joke-da…
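
Here is a small sketch of how you could turn those jokes into training text. I am assuming a JSON file that holds a list of records with `title`, `body` and `score` fields (for example a file named `reddit_jokes.json`). Please check the repository README for the actual file names and fields. The function names and the filtering thresholds are mine.

```python
import json


def load_jokes(path):
    """Load a JSON file that contains a list of joke records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def joke_to_text(record):
    """Join the optional title (setup) and the body (punchline) into one string."""
    title = (record.get("title") or "").strip()
    body = (record.get("body") or "").strip()
    return f"{title}\n{body}".strip()


def build_training_texts(records, min_score=0, max_chars=500):
    """Keep non-empty, reasonably short jokes with at least `min_score` votes."""
    texts = []
    for record in records:
        # Records without a score field are treated as having score 0.
        if record.get("score", 0) < min_score:
            continue
        text = joke_to_text(record)
        if text and len(text) <= max_chars:
            texts.append(text)
    return texts
```

For example, `build_training_texts(load_jokes("reddit_jokes.json"), min_score=10)` would give you a list of strings ready for fine-tuning, under the assumptions above.
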

Boom. Boom. Boom.
You've got the 3 main ingredients to build your product:

→ A powerful model to generate general text in Python.
→ A hands-on tutorial to fine-tune it for your goal.
→ And the dataset of jokes you need for the fine-tuning.

And now the last step...
#question 4: "How do I make my tool accessible to the world?"

You have at least a couple of options here:

→ A serverless deployment in the Cloud (like AWS Lambda).
→ A completely managed solution like Streamlit Cloud, which is FREE by the way.

BOOM!
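
For the Streamlit route, a minimal app could look roughly like this. It is a sketch, not a production setup: the folder name `comedy-model`, the widget labels and the helper names are hypothetical, and it assumes a recent Streamlit release that provides `st.cache_resource`.

```python
def trim_to_last_sentence(text):
    """Cut generated text at the last sentence-ending mark, if there is one."""
    cut = max(text.rfind(mark) for mark in ".!?")
    return text[: cut + 1] if cut != -1 else text


def load_generator(model_dir="comedy-model"):
    """Load a text-generation pipeline from a local folder or a Hub model id."""
    from transformers import pipeline

    return pipeline("text-generation", model=model_dir)


def main():
    """Render the app: one text box, one button, one generated joke."""
    import streamlit as st

    st.title("Comedy writing assistant")
    # cache_resource keeps the loaded model in memory across script reruns.
    generator = st.cache_resource(load_generator)()

    prompt = st.text_input("Start your joke", "Why did the data scientist")
    if st.button("Make it funny") and prompt:
        outputs = generator(prompt, max_new_tokens=40, do_sample=True)
        st.write(trim_to_last_sentence(outputs[0]["generated_text"]))


# When saved as app.py and launched with `streamlit run app.py`,
# uncomment the next line so the app is rendered:
# main()
```

Trimming at the last full sentence is a cheap trick to hide the cut-off ending you get when generation stops at a token limit.
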
Building a TOP project is THE way to land a Data Science job.

This is how I help my mentees

@fdmatoz
@curiovana
@HakeemAmir
@Niston_94

break into professional Data Science.

Wanna build ONE project to change your life?

Apply ↓
datamachines.xyz/data-science-m…
Wanna break into professional Data Science?
→ Follow me @paulabartabajo_

Wanna help?
Like/Retweet the first tweet below to spread the wisdom 🙏
