🚀 If you are starting with machine learning / deep learning and get a new dataset to work on, whether on Kaggle, in the real world, or just for fun, there are a few things you must always take care of to squeeze the most out of your model and make it awesome: ⬇️⬇️⬇️
1/6
🔹 Look at the data carefully. Do EDA.
🔹 Look at the targets. See how they are distributed and what kind of problem this is.
🔹 Choose the right metric to evaluate your models.
2/6
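As a rough illustration of "look at the targets", the sketch below counts how often each class appears and reports how lopsided the split is. The function name and the toy labels are made up for this example. A heavily skewed result is usually a hint that plain accuracy may mislead and that a metric such as AUC or F1 deserves a look.

```python
from collections import Counter


def summarize_targets(targets):
    """Return the share of each class and the majority/minority count ratio."""
    counts = Counter(targets)
    total = sum(counts.values())
    shares = {label: count / total for label, count in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio


# Hypothetical binary targets: 8 negatives and 2 positives.
targets = [0] * 8 + [1] * 2
shares, ratio = summarize_targets(targets)
print(shares)  # share of each class
print(ratio)   # how many majority rows exist per minority row
```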
🔹 Split the data into folds. You can use them for cross-validation or for hold-out validation.
🔹 Build a first basic model. This is going to be your baseline.
🔹 Now try to improve on the baseline by adding new features.
3/6
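Here is one possible way to sketch the fold split and a first baseline in plain Python. It is not the thread author's code: the helper names, the toy targets, and the choice of a majority-class baseline are all assumptions for illustration. In practice a library splitter such as scikit-learn's `StratifiedKFold` would normally do this job.

```python
import random
from collections import Counter, defaultdict


def assign_stratified_folds(targets, n_folds=5, seed=42):
    """Give every row a fold id while keeping class shares similar per fold."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(targets):
        rows_by_class[label].append(row)
    folds = [None] * len(targets)
    for rows in rows_by_class.values():
        rng.shuffle(rows)
        # Deal the rows of each class round-robin across the folds.
        for position, row in enumerate(rows):
            folds[row] = position % n_folds
    return folds


def majority_baseline_accuracy(targets, folds, n_folds=5):
    """Score a baseline that always predicts the training folds' majority class."""
    scores = []
    for fold in range(n_folds):
        train = [t for t, f in zip(targets, folds) if f != fold]
        valid = [t for t, f in zip(targets, folds) if f == fold]
        majority = Counter(train).most_common(1)[0][0]
        scores.append(sum(t == majority for t in valid) / len(valid))
    return sum(scores) / len(scores)


# Hypothetical imbalanced targets: 80 negatives and 20 positives.
targets = [0] * 80 + [1] * 20
folds = assign_stratified_folds(targets, n_folds=5)
baseline = majority_baseline_accuracy(targets, folds, n_folds=5)
print(baseline)  # any real model should beat this number
```

Storing the fold id per row means every later model can be scored on exactly the same splits, which keeps comparisons against the baseline fair.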
🔹 To add new features, go back to the data. Look at the EDA. That's why it's quite important.
🔹 When you think you have reached a limit with feature engineering, try different models.
🔹 Keep a log of all the scores, features and models.
4/6
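Keeping a log can be as simple as appending one row per experiment to a CSV file. The sketch below is a minimal version under assumed column names (`experiment`, `model`, `features`, `cv_score`); none of them come from the thread. It also assumes a higher score is better.

```python
import csv
from pathlib import Path

# Hypothetical log columns; adapt them to whatever you track.
LOG_FIELDS = ["experiment", "model", "features", "cv_score"]


def log_experiment(log_path, experiment, model, features, cv_score):
    """Append one experiment to a CSV log, writing the header on first use."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "experiment": experiment,
            "model": model,
            "features": "|".join(features),
            "cv_score": cv_score,
        })


def best_experiment(log_path):
    """Return the logged row with the highest cv_score (higher is better here)."""
    with Path(log_path).open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    return max(rows, key=lambda row: float(row["cv_score"]))
```

A call such as `log_experiment("experiments.csv", "exp-01", "baseline", ["age"], 0.71)` (made-up values) after every run is enough to answer "what did I try and how did it score?" later.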
🔹 When you think you have reached a limit with different models, try feature selection.
🔹 Done with feature engineering and feature selection? You will realize that by following the steps above, you have also chosen a few best models that work well with your data.
5/6
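The thread does not say which feature selection method to use. One common option is greedy forward selection, sketched below. `score_fn` stands in for "train on this feature subset and return the cross-validated score"; the toy scoring function and feature names are invented purely so the example runs.

```python
def forward_selection(features, score_fn, min_gain=1e-6):
    """Greedily add the feature that improves score_fn the most."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        candidates = [(score_fn(selected + [f]), f) for f in remaining]
        score, feature = max(candidates)
        if score - best_score <= min_gain:
            break  # no remaining feature helps enough
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy stand-in for a real cross-validated score (an assumption):
# useful features add to the score, every feature costs a small penalty.
USEFUL = {"age": 0.10, "income": 0.05}


def toy_cv_score(feature_subset):
    gain = sum(USEFUL.get(f, 0.0) for f in feature_subset)
    return 0.5 + gain - 0.01 * len(feature_subset)


selected, score = forward_selection(["age", "income", "row_id"], toy_cv_score)
print(selected, score)
```

Each step retrains once per remaining feature, so this gets expensive with many features or slow models.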
🔹 Now it's time to do hyperparameter optimization and squeeze the last few drops from your best models.
🔹 Wrap everything in Docker so that it's reproducible.
🔹 Build a simple API endpoint or a fully-fledged web application to serve your model and brag about it ;)
6/6
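For the hyperparameter optimization step, random search is one simple strategy (the thread does not name a specific one). The sketch below assumes `objective` trains a model with the given parameters and returns a score where higher is better. The search space and the toy objective are hypothetical.

```python
import random


def random_search(objective, search_space, n_trials=50, seed=0):
    """Sample parameter sets at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space and a toy objective that peaks at
# max_depth=5, learning_rate=0.1. A real objective would run cross-validation.
search_space = {"max_depth": [3, 5, 7, 9], "learning_rate": [0.01, 0.1, 0.3]}


def toy_objective(params):
    return -(abs(params["max_depth"] - 5) + abs(params["learning_rate"] - 0.1))


best_params, best_score = random_search(toy_objective, search_space, n_trials=50)
print(best_params, best_score)
```

Fixing the seed makes the search repeatable, which fits the reproducibility point above.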

More from @abhi1thakur

8 Feb
Not surprised that none of the nay-sayers were able to respond. That's what happens when you start accusing and abusing someone without understanding the context. Here are some solutions in this thread 🔽🔽🔽 1/7
Here is a solution using pandas. Time taken: 191.77s
2/7
Here is another solution using pandas. Time taken: 188.21s
3/7
8 Feb
So, people who called me names, here is a test for you. You need to use Python.

- You have 100k CSVs in a folder.
- Read all files in the folder
- Combine them in a single CSV
- Save the combined file for feature engineering using pandas
- All files share the same header
1/4
Where do I find 100k CSVs in a folder? Well, in many scenarios and real-life situations. I have made it easy for you: github.com/abhishekkrthak…

Those who called me names must use pandas.
Those who are willing to learn, scroll below.

2/4
Using pandas, in a simple way, took 120 seconds to do this for me. Using pure Python took 5.5 seconds, using PyPy took 3.8 seconds. That's why it's important to learn the basics too.

After that, I'll use pandas for feature engineering. You don't need a bazooka to kill a fly.

3/4
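The pure-Python solution itself is not included in this excerpt. As one possible approach (an assumption, not the author's code), the sketch below streams every file line by line, writes the shared header once, and never builds a DataFrame. The function name and file layout are hypothetical.

```python
from pathlib import Path


def combine_csvs(input_dir, output_path):
    """Concatenate CSVs that share one header, keeping the header only once."""
    output_path = Path(output_path)
    header_written = False
    n_files = 0
    with output_path.open("w", newline="") as out:
        for path in sorted(Path(input_dir).glob("*.csv")):
            if path.resolve() == output_path.resolve():
                continue  # skip the output file if it lives in the same folder
            with path.open(newline="") as src:
                header = src.readline()
                if not header:
                    continue  # empty file, nothing to copy
                if not header_written:
                    out.write(header if header.endswith("\n") else header + "\n")
                    header_written = True
                for line in src:
                    # Make sure a file without a trailing newline does not
                    # glue its last row to the next file's first row.
                    out.write(line if line.endswith("\n") else line + "\n")
            n_files += 1
    return n_files
```

Because rows are copied as raw text, nothing is parsed or type-inferred, which is where most of the time goes when each file is loaded with pandas first.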
7 Feb
Stacking in machine learning 🔽 1/4
- Divide the training data into folds.
- Train a bunch of models: M1, M2, ..., Mn.
- Create full training predictions (using out of fold training) and test predictions using all these models. 2/4
- Till here it is Level 1 (L1).
- Use the fold predictions from these models as features to another model.
This is now a Level 2 (L2) model.
- Use the same folds as before to train this L2 model. 3/4
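The training side of these steps can be sketched in plain Python as follows. The two tiny L1 "models" (a mean predictor and a one-parameter slope fit) and the toy data are assumptions chosen so the example runs without any library. Test-set predictions are left out to keep it short.

```python
def out_of_fold_predictions(X, y, folds, fit, predict):
    """Predict every training row with a model that never saw that row."""
    oof = [None] * len(X)
    for fold in sorted(set(folds)):
        train_idx = [i for i, f in enumerate(folds) if f != fold]
        valid_idx = [i for i, f in enumerate(folds) if f == fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in valid_idx])
        for i, p in zip(valid_idx, preds):
            oof[i] = p
    return oof


# Toy model 1: always predict the mean of the training targets.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(mean, X_valid):
    return [mean] * len(X_valid)


# Toy model 2: least-squares slope through the origin on the row average.
def fit_scale(X_train, y_train):
    avg = [sum(x) / len(x) for x in X_train]
    return sum(a * t for a, t in zip(avg, y_train)) / sum(a * a for a in avg)


def predict_scale(slope, X_valid):
    return [slope * sum(x) / len(x) for x in X_valid]


# Hypothetical data with precomputed fold ids.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
folds = [0, 1, 2, 0, 1, 2]

# Level 1: one column of out-of-fold predictions per model.
level1 = [(fit_mean, predict_mean), (fit_scale, predict_scale)]
oof_columns = [out_of_fold_predictions(X, y, folds, f, p) for f, p in level1]

# Level 2: the L1 predictions become the features, and the same folds are reused.
l2_features = [list(row) for row in zip(*oof_columns)]
l2_oof = out_of_fold_predictions(l2_features, y, folds, fit_scale, predict_scale)
```

Reusing the same folds at both levels means no L2 training row is built from a prediction made by a model that saw that row's target.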
7 Feb
Have you had, or are you having, trouble organizing your machine learning projects? This thread should give you some idea of how to arrange machine learning / deep learning projects. See the folder structure: 1/6 🔽
input/: This folder consists of all the input files and data for your machine learning project. If you are working on NLP projects, you can keep your embeddings here. If you are working on image projects, all images go to a subfolder inside this folder. 2/6
src/: We will keep all the python scripts associated with the project here. If I talk about a python script, i.e. any *.py file, it is stored in the src folder. 3/6
11 Jan
In this thread, I will tell you how to learn Python for data science in 1 hour 👇 1/N
It's not possible 2/N
It's not possible 3/N
11 Jan
In this thread, I will show you how to train a deep learning based sentiment classification model using BERT. First, you need the IMDb dataset; you can grab it from kaggle.com/lakshmi25npath…. Save this as imdb.csv. 👇
1/5
Now start coding!
Import everything required 2/5
Create the dataset class: 3/5
