Day 13 #31DaysofML

⚖️ How to deal with imbalanced datasets?⚖️
Most real-world datasets are not perfectly balanced. If 90% of your dataset belongs to one class & only 10% to the other, how do you stop your model from simply predicting the majority class every time and still looking 90% accurate?
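Why a 90:10 split is a trap, as a minimal sketch: the labels below are made-up toy data, and the "model" is a stand-in that always answers with the majority class. Accuracy looks fine while the minority class is never detected.

```python
# Hypothetical toy labels: 90 majority-class (0) and 10 minority-class (1) examples.
labels = [0] * 90 + [1] * 10

# A stand-in "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

# Fraction of all examples predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: what fraction of the true 1s were caught?
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
minority_recall = true_positives / labels.count(1)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
# prints: accuracy=0.90, minority recall=0.00
```

This is why it helps to look at per-class metrics such as recall, and not only at overall accuracy, when the classes are imbalanced.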

🧵 👇
🐱🐱🐱🐱🐱🐱🐱🐱🐱🐶 (90:10)
💳 💳 💳 💳 💳 💳 💳 💳 💳 ⚠️ (90:10)
There can be many reasons for imbalanced data. The first step is to check whether you can collect more data, especially for the minority class. If you're already working with all the data that's available, these 👇 techniques can help

#31DaysofML 2/7
Here are 3 techniques for addressing data imbalance. You can use just one of these or all of them together:
⚖️ Downsampling
⚖️ Upsampling
⚖️ Weighted classes

#31DaysofML 3/7
📌 Downsampling 📌
This technique removes a random subset of the majority class from your data
Original dataset: 🐱🐱🐱🐱🐱🐱🐱🐱🐱🐱 🐶
Downsampling: 🐱🐱🐱🐱 🐶
Note: this only works if you start with a big enough dataset, so that dropping majority examples doesn't throw away information your model needs
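A minimal sketch of random downsampling using only the Python standard library. The `downsample` helper, the `img_*` names and the choice of keeping 4 majority examples are all hypothetical, picked to mirror the 🐱/🐶 picture above.

```python
import random
from collections import Counter


def downsample(examples, labels, majority_label, keep, seed=0):
    """Keep every non-majority example plus a random `keep` of the majority class."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    other_idx = [i for i, y in enumerate(labels) if y != majority_label]
    # Sample majority indices without replacement, then restore original order.
    kept = sorted(other_idx + rng.sample(majority_idx, keep))
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 10 cats and 1 dog.
examples = [f"img_{i}" for i in range(11)]
labels = ["cat"] * 10 + ["dog"]

down_x, down_y = downsample(examples, labels, "cat", keep=4)
print(Counter(down_y))
```

Only the training split should be downsampled; evaluation data is usually left at its real-world class ratio so the metrics stay honest.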

#31DaysofML 4/7
📌 Upsampling 📌
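In this technique you generate new examples for your minority class that sit close to the existing minority examples in feature space, for instance by duplicating them or by interpolating between them (the idea behind SMOTE)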
Original dataset: ⚽️⚽️⚽️⚽️⚽️⚽️⚽️ 🏀
Upsampling: ⚽️⚽️⚽️⚽️⚽️⚽️⚽️🏀🏀🏀🏀🏀
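A rough sketch of the interpolation idea behind SMOTE, standard library only. This is a simplification and not the reference algorithm from the paper: `smote_like`, the value `k=2` and the three toy points are assumptions for illustration, and it needs at least two minority examples to interpolate between.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points, each on the line segment between a
    minority point and one of its `k` nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k closest other minority points (Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # how far to move from base towards neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two numeric features per example.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like(minority, n_new=4)
print(len(minority) + len(new_points), "minority examples after upsampling")
```

Because every synthetic point lies between two real minority examples, it stays in the same region of feature space instead of being an exact copy.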

#31DaysofML 5/7
📌 Weighted classes 📌
In this technique you tell your model to give more weight to examples from the minority class, so mistakes on them cost more during training
Original dataset: 🍎🍎🍎🍎🍎🍎🍎 🍋
Weighted classes: pay more attention to the 🍋 minority class
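One common way to pick the weights, sketched with the standard library: weight each class by the inverse of its frequency, `n_samples / (n_classes * class_count)`. This is the same heuristic scikit-learn documents for `class_weight="balanced"`; the `balanced_class_weights` helper and the 🍎/🍋 labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy data: 7 apples and 1 lemon.
labels = ["apple"] * 7 + ["lemon"]
weights = balanced_class_weights(labels)

# Per-example weights, in the same order as `labels`.
sample_weights = [weights[y] for y in labels]
print(weights)
```

With these weights the single 🍋 counts as much in total as the seven 🍎 combined. Many training APIs accept either a per-class mapping or a per-example list like `sample_weights`; check your framework's docs for the exact argument name.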

#31DaysofML 6/7
This tip comes from my ML mentor @SRobTweets, who recently co-authored the book Machine Learning Design Patterns with @lak_gcp & Michael Munn

Here's the link to the book 👉 amzn.to/3d63GO4
SMOTE paper 👉 arxiv.org/abs/1106.1813

#31DaysofML 7/7

